Unicorne: Managing AWS Cost Optimisation in the Gen AI Era

Generative AI has changed how organisations manage AWS spending, demanding new approaches to resource-level analysis

Despite cloud computing’s maturation, most enterprise workloads have yet to move. According to AWS, 75% of enterprise workloads remain on premises, and 70% of Fortune 500 companies still run software written more than two decades ago.

Two concerns drive this hesitancy: unpredictable costs and security governance. Organisations that have spent decades managing physical infrastructure understand their capital expenditure models. The shift to operational expenditure, where costs can scale unexpectedly, represents a loss of control that leadership teams find uncomfortable.

When data centres sit within your building, security boundaries feel tangible. Moving to the cloud requires trusting a shared responsibility model that can feel abstract.

Yet these concerns obscure a bigger risk: competitive stagnation.

The case for AWS migration

The data tells a clear story. Modernising on AWS enables streamlined operational practices that lead to measurable results: infrastructure costs drop by an average of 20% through elastic scaling and elimination of over-provisioned capacity, and administrator productivity increases by 66% as teams automate routine maintenance and patching.

Time-to-market improves dramatically. New features reach production 43% faster through rapid provisioning and deployment. Staff can redirect 29% more of their focus toward innovation rather than maintaining infrastructure. Security incidents decrease by 45% through AWS’s shared responsibility model and continuous updates.

But these benefits only materialise with proper cost governance. Without disciplined FinOps practices, the elasticity that makes cloud powerful transforms cost savings into budget overruns.

The cost crisis

AWS releases thousands of updates annually, each carrying cost implications that even experienced teams struggle to track. Database engines reaching end-of-support and triggering RDS Extended Support, S3 storage class mismatches as usage patterns shift, and increased application logging or metrics queries can seem individually minor but are collectively significant.

Modern applications run across distributed architectures where everything connects to everything else, and costs interact in non-linear ways. Load balancers, auto-scaling, cross-region transfers and detailed logging create interdependencies where optimising one component inadvertently increases costs elsewhere.

“With thousands of updates each year, even seasoned teams can’t keep up,” says Éric Pinet, CEO of Unicorne, a company that specialises in AWS transformations. Before founding Unicorne in 2018, Éric spent eight years managing development teams of up to 125 people. “Add to this the complexity of interdependencies and costs become impossible to predict without the right tools.”

Today, the company’s SaaS offering, Stable, constantly monitors enterprise AWS infrastructure from Lambda functions to ElastiCache clusters, providing real-time, smart alerts and savings recommendations.

A four-level framework for optimisation

Éric’s team developed a framework over years of managing infrastructure across multiple organisations. The approach prevents teams from jumping to complex architectural changes whilst overlooking easy wins that deliver immediate impact.

Level 1: Quick wins (days)

These require no downtime, architectural changes or code deployments. Reserved Instances, Savings Plans and Spot Instances can reduce costs by 30–70% for consistent workloads, while unused resource cleanup identifies orphaned snapshots, unattached EBS volumes, forgotten AMIs and idle load balancers. Storage class optimisation moves rarely accessed S3 data to cheaper tiers, and region cost arbitrage takes advantage of regional price differences for non-latency-sensitive workloads.
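As a rough illustration of unused resource cleanup, the sketch below filters a list of EBS volumes for ones that are not attached to anything. It assumes each entry follows the shape of a volume in the EC2 DescribeVolumes response (`VolumeId`, `State`, `Size`, `Attachments`); the function name and the sample data are hypothetical, and no AWS call is made.

```python
from typing import Iterable


def find_unattached_volumes(volumes: Iterable[dict]) -> list[dict]:
    """Return volumes that are in the 'available' state with no attachments.

    Each dict is assumed to mirror an entry in the EC2 DescribeVolumes
    response: "VolumeId", "State", "Size" (GiB) and "Attachments".
    """
    return [
        v for v in volumes
        if v.get("State") == "available" and not v.get("Attachments")
    ]


# Hypothetical sample data, for illustration only.
sample_volumes = [
    {"VolumeId": "vol-aaa", "State": "in-use", "Size": 100,
     "Attachments": [{"InstanceId": "i-123"}]},
    {"VolumeId": "vol-bbb", "State": "available", "Size": 500, "Attachments": []},
    {"VolumeId": "vol-ccc", "State": "available", "Size": 50, "Attachments": []},
]

orphans = find_unattached_volumes(sample_volumes)
orphan_ids = [v["VolumeId"] for v in orphans]
orphaned_gib = sum(v["Size"] for v in orphans)
print(orphan_ids, orphaned_gib)
```

An unattached volume is a candidate for review rather than automatic deletion: someone may still need the data, so a snapshot-then-delete step with an owner sign-off is a common safeguard.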

“We’ve seen clients cut 37% off their cloud bills in just three months by following this sequence,” Éric shares. “In one case, 60% of the savings came from the quick wins alone, implemented in the first week.”

Level 2: Low-hanging fruit (weeks)

Once quick wins are implemented and teams see tangible results, they’re ready for optimisations requiring brief maintenance windows or minor code changes. Right-sizing instances addresses over-provisioning, where utilisation analysis often reveals 40–60% idle capacity.

“We’ve seen immediate savings of 20% simply by switching from x86 to ARM-based Graviton2 processors,” Éric reports. Lambda memory optimisation requires testing but can produce dramatic results. “Right-sizing Lambda memory has produced reductions of up to 85%: one client saved $38,000 annually this way,” he says.
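The reason Lambda memory matters so much is that compute charges are billed on memory multiplied by duration (GB-seconds). The sketch below models that relationship only. The price constant is a placeholder assumption, the before and after configurations are hypothetical, and per-request charges are left out.

```python
# Placeholder assumption: check the current AWS Lambda pricing page for the
# real per-GB-second rate in your region and for your CPU architecture.
PRICE_PER_GB_SECOND = 0.0000166667


def lambda_compute_cost(memory_mb: int, avg_duration_ms: float,
                        invocations: int,
                        price_per_gb_second: float = PRICE_PER_GB_SECOND) -> float:
    """Estimate compute cost from GB-seconds (request charges not included)."""
    gb_seconds = (memory_mb / 1024) * (avg_duration_ms / 1000) * invocations
    return gb_seconds * price_per_gb_second


# Hypothetical function: over-provisioned at 2048 MB, then tested at 512 MB,
# where each invocation runs slightly longer.
before = lambda_compute_cost(memory_mb=2048, avg_duration_ms=400, invocations=10_000_000)
after = lambda_compute_cost(memory_mb=512, avg_duration_ms=450, invocations=10_000_000)
reduction = 1 - after / before

print(f"before: ${before:,.2f}  after: ${after:,.2f}  reduction: {reduction:.1%}")
```

The saving only holds when duration does not grow in proportion to the memory cut, which is why this optimisation needs testing: for CPU-bound functions, less memory can mean a longer and more expensive run.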

Level 3: Architectural changes (months)

With momentum established, teams can tackle optimisations requiring substantial engineering effort. Serverless migration eliminates idle capacity costs for appropriate workloads, while database engine changes can have major impact.

“Aurora Serverless often looks attractive but can cost up to seven times more than a dedicated RDS instance,” Éric describes. “By switching back, one client saved 75% – nearly $20,000 per year.”

Level 4: Strategic commitments (ongoing)

Enterprise discount programmes provide tiered discounts for organisations spending more than $500,000 annually. FinOps culture embeds cost awareness into engineering through real-time dashboards, tagging for accountability and budget alerts that make costs visible to everyone.

Why resource-level analysis matters

Most teams rely on AWS Cost Explorer to understand their spending. The tool shows service-level totals (RDS, S3, Lambda, CloudWatch), but these numbers only tell a partial story.
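The difference between the two views can be sketched with a simple aggregation. The line items below are hypothetical and the field names (`service`, `resource`, `usd`) are assumptions rather than the format of any AWS report; the point is only that the same data answers different questions depending on the grouping key.

```python
from collections import defaultdict

# Hypothetical monthly line items in whole dollars, for illustration only.
line_items = [
    {"service": "RDS", "resource": "db-prod/instance", "usd": 700},
    {"service": "RDS", "resource": "db-dev/snapshots", "usd": 300},
    {"service": "S3", "resource": "logs-bucket", "usd": 120},
    {"service": "RDS", "resource": "db-prod/snapshots", "usd": 80},
]


def total_by(items: list[dict], key: str) -> dict[str, int]:
    """Sum cost per distinct value of `key` ("service" or "resource")."""
    totals: dict[str, int] = defaultdict(int)
    for item in items:
        totals[item[key]] += item["usd"]
    return dict(totals)


by_service = total_by(line_items, "service")    # the service-level view
by_resource = total_by(line_items, "resource")  # the resource-level view

print(by_service)
for resource, usd in sorted(by_resource.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{resource:<20} ${usd}")
```

In the service-level view, the development snapshots are invisible inside the RDS total; in the resource-level view they appear as their own line.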

Éric uses one client as an example. “Their RDS costs had climbed to $12,000 a month,” he says. “A service-level report would simply say ‘RDS: $12,000’, which doesn’t help you understand what to do.”

Resource-level analysis revealed the full picture. By breaking it down at the resource level, Unicorne discovered that 30% of the cost came from snapshots alone. “Their policy was taking hourly backups and storing them for 90 days,” Éric shares.

Hourly backups with 90-day retention were excessive for a development database. By moving to daily snapshots in development and more reasonable policies in production, the client saved more than $2,000 a month. Further investigation uncovered a synchronisation job transferring data unnecessarily between regions, saving another $900.
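The arithmetic behind this example is worth spelling out. The sketch below uses only figures quoted above (a $12,000 monthly bill, 30% from snapshots, hourly versus daily backups kept for 90 days); the function name is hypothetical.

```python
def retained_snapshots(snapshots_per_day: int, retention_days: int) -> int:
    """Number of snapshots held at steady state under a retention policy."""
    return snapshots_per_day * retention_days


hourly_policy = retained_snapshots(snapshots_per_day=24, retention_days=90)
daily_policy = retained_snapshots(snapshots_per_day=1, retention_days=90)

# 30% of the $12,000 monthly RDS bill came from snapshots.
monthly_rds_bill = 12_000
snapshot_share_usd = monthly_rds_bill * 30 // 100

print(hourly_policy, daily_policy, snapshot_share_usd)
```

Snapshot storage is generally incremental, so cost does not fall in direct proportion to the snapshot count. The count still shows how far the hourly policy was from what a development database needs.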

“In total, they reduced their bill by $3,800 each month: 32% of the original cost,” Éric says. “That kind of detail is invisible without resource-level visibility.”

How Stable operationalises this framework

Stable emerged from the company’s experience managing cloud infrastructure for clients. “By supporting clients through managed services, we gained first-hand knowledge of how to optimise cloud environments under real constraints,” Éric says. “Without that experience, Stable could never have existed. We don’t sell theory, we deliver proven solutions tested with real companies, on real budgets and with real business stakes.”

The platform provides resource-level AWS cost analysis with prioritised recommendations based on implementation effort and business impact. Stable focuses specifically on AWS, with particular attention to serverless architectures and AI workloads.

Features include transparency over automation – no ‘auto-fix’ buttons that create infrastructure drift – as well as actionable resource-level insights, rather than service-level aggregates, and optimisation recommendations prioritised by quick wins versus long-term architectural changes.

The Gen AI layer of complexity

Traditional infrastructure costs scale linearly: provision more capacity, pay proportionally more. AI workloads behave differently and the disconnect can catch even sophisticated teams off guard.

In a traditional setup, businesses can predict the monthly cost of an EC2 instance. With Gen AI, usage can scale unpredictably into billions of tokens, and expenses accumulate much faster than expected.

Most implementations resend the entire conversation history with every exchange to maintain context. Éric calls this “conversation creep”, where a 10-turn dialogue doesn’t cost 10 times a single exchange but something closer to 55 times (1+2+3+4...+10).
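The 55 figure follows from summing the history resent on each turn. A minimal sketch, assuming every exchange is roughly the same size and the full history is resent each time (the function name is hypothetical):

```python
def cumulative_exchange_units(turns: int) -> int:
    """Total input sent over a conversation, measured in single exchanges.

    Turn 1 sends 1 exchange, turn 2 resends it plus a new one (2), and so on,
    so the total is 1 + 2 + ... + turns.
    """
    return sum(range(1, turns + 1))


ten_turns = cumulative_exchange_units(10)
twenty_turns = cumulative_exchange_units(20)

# The same totals via the closed form n(n + 1) / 2.
assert ten_turns == 10 * 11 // 2
assert twenty_turns == 20 * 21 // 2

print(ten_turns, twenty_turns)
```

Doubling the conversation length from 10 to 20 turns roughly quadruples the input volume, which is why long sessions are where the costs surprise teams.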

“Outbound tokens are several times more expensive than inbound ones, images cost more than text and audio can be even pricier,” Éric notes. “Teams miss the hidden costs of integration – vector databases, caching systems and security guardrails – that all scale directly with usage.”

Teams can also overlook the huge variations in pricing between models, where two models with apparently similar performance can vary in cost by a factor of 10. Anthropic Claude costs roughly 10 times more than Amazon Nova, for example, yet for many use cases the less expensive model performs adequately. Using less expensive models for routine queries while reserving premium models for complex reasoning allows organisations to balance cost and quality effectively.
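A tiered routing policy of this kind can be sketched in a few lines. The model names and relative prices below are hypothetical placeholders (a factor of 10 between tiers, as in the example above), and the decision about which queries need complex reasoning is assumed to come from the caller.

```python
# Hypothetical tiers with relative cost per query; not real prices.
RELATIVE_COST = {"economy-model": 1, "premium-model": 10}


def choose_model(needs_complex_reasoning: bool) -> str:
    """Route routine queries to the cheaper tier, complex ones to the premium tier."""
    return "premium-model" if needs_complex_reasoning else "economy-model"


def workload_cost(complexity_flags: list[bool], always_premium: bool = False) -> int:
    """Total relative cost of a workload, with or without routing."""
    total = 0
    for is_complex in complexity_flags:
        model = "premium-model" if always_premium else choose_model(is_complex)
        total += RELATIVE_COST[model]
    return total


# Hypothetical workload: 1,000 queries, 10% of which need complex reasoning.
queries = [True] * 100 + [False] * 900

unrouted = workload_cost(queries, always_premium=True)
routed = workload_cost(queries)

print(unrouted, routed, f"{1 - routed / unrouted:.0%} lower")
```

Under these assumed numbers the routed workload costs about a fifth of the all-premium one. The real saving depends on how many queries genuinely need the premium tier and on how reliably they can be identified.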

Building FinOps maturity

When cloud costs spiral unexpectedly, the financial impact is only part of the problem. Teams become cautious about experimentation, leadership hesitates on new initiatives and development velocity decreases.

“The real damage isn’t just the money lost, it’s the loss of confidence,” Éric explains. “When a CTO hesitates to invest further in AI or cloud, the company risks falling behind whilst competitors keep innovating. The role of cost optimisation is not to slow down innovation but to ensure it’s sustainable.”

Rebuilding trust

To rebuild trust, the first step is to restore visibility with alerts and budget segmentation. This means real-time dashboards, alerts at 75% and 90% of the monthly budget for each environment, and resource-level tagging enforced through automation.
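The alerting rule itself is simple to express. The sketch below checks spend against the 75% and 90% thresholds for each environment; the environment names, budgets and spend figures are hypothetical, and a real setup would more likely rely on a managed budgeting service than on hand-written checks.

```python
ALERT_THRESHOLDS = (0.75, 0.90)


def crossed_thresholds(spend: float, budget: float,
                       thresholds: tuple[float, ...] = ALERT_THRESHOLDS) -> list[float]:
    """Return every threshold that month-to-date spend has reached."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    ratio = spend / budget
    return [t for t in thresholds if ratio >= t]


# Hypothetical month-to-date figures, segmented by environment.
environments = {
    "development": {"budget": 1000, "spend": 800},
    "staging": {"budget": 2000, "spend": 1000},
    "production": {"budget": 5000, "spend": 4600},
}

alerts = {
    name: crossed_thresholds(env["spend"], env["budget"])
    for name, env in environments.items()
}

for name, hit in alerts.items():
    if hit:
        print(f"{name}: reached {max(hit):.0%} of monthly budget")
```

Segmenting by environment keeps a noisy development account from masking, or being masked by, production spend.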

The second is to establish strong controls, such as quotas and mandatory tagging. Deploy budget quotas per environment, automatic shutdown of non-production environments after hours and approval workflows for large instance types.

The third is to demonstrate results quickly with easy wins. “Once the team sees savings in action, they regain the confidence to keep moving forward,” Éric says.

Governance as Code

Cost governance requires the same ‘as code’ approach that transformed infrastructure management. “Cost governance as code is becoming essential,” Éric observes. “It mirrors the philosophy of DevOps and Infrastructure as Code. Governance must be built into automated processes.”

That means enforcing tagging rules, blocking oversized instances in development and shutting down non-production environments after working hours, all directly within automated deployment tools, so that alerts are raised before resources are even deployed to the infrastructure.
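A pre-deployment policy check along these lines might look like the sketch below. The required tag keys, the allowed development sizes and the resource dictionary format are all hypothetical assumptions; in practice such rules usually live in a pipeline step or a dedicated policy tool.

```python
# Hypothetical policy values, for illustration only.
REQUIRED_TAGS = {"owner", "environment", "cost-center"}
DEV_ALLOWED_SIZES = {"nano", "micro", "small", "medium", "large"}


def policy_violations(resource: dict) -> list[str]:
    """Return human-readable violations for a proposed resource (empty if compliant)."""
    violations = []
    tags = resource.get("tags", {})

    # Rule 1: mandatory tagging.
    missing = sorted(REQUIRED_TAGS - set(tags))
    if missing:
        violations.append("missing tags: " + ", ".join(missing))

    # Rule 2: no oversized instances in development.
    # "m5.4xlarge" -> size "4xlarge" (text after the first dot).
    size = resource.get("instance_type", "").partition(".")[2]
    if tags.get("environment") == "development" and size and size not in DEV_ALLOWED_SIZES:
        violations.append(f"instance size '{size}' is not allowed in development")

    return violations


proposed = {
    "instance_type": "m5.4xlarge",
    "tags": {"owner": "team-a", "environment": "development"},
}
compliant = {
    "instance_type": "t3.small",
    "tags": {"owner": "team-a", "environment": "development", "cost-center": "1234"},
}

proposed_result = policy_violations(proposed)
compliant_result = policy_violations(compliant)
print(proposed_result)
print(compliant_result)
```

Running the check in the deployment pipeline and failing the build on a non-empty result is what moves governance from an after-the-fact report to a gate.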

From crisis to competitive advantage

Executed properly, cloud cost optimisation helps teams transform from crisis to competitive advantage. Organisations that master this discipline achieve predictable budgets that enable confident investment in innovation. Engineering teams gain freedom to experiment without fear of runaway costs. Infrastructure scales efficiently with business growth rather than outpacing it.

The path forward requires three commitments: visibility through resource-level analysis that reveals exactly where money goes, governance through automated policies that prevent waste without blocking progress and culture through embedding cost awareness into daily engineering practice.

“My most important piece of advice is to treat cost optimisation the same way you treat security,” Éric says. “It is not a project with an end date. It is a continuous discipline that must be woven into the daily practices of every team.”

The organisations that succeed understand that cloud computing’s promise – 20% cost savings, 43% faster time-to-market and 29% more focus on innovation – only materialises when paired with disciplined cost governance. They treat cloud cost management not as a constraint but as an enabler of sustained innovation velocity.

Learn more at stableapp.clou
