The pattern tends to look the same across companies of different sizes and sectors. The cloud bill was reasonable for the first year or so. Then it started growing. Not dramatically — just steadily, a few percent each month. Nobody raised it because it didn't seem urgent. Then someone ran a quarterly review and the number on the screen was significantly larger than anyone expected.

Cloud cost drift is ordinary. It's not a sign that something went badly wrong. It's mostly the result of a few predictable things happening over time without anyone specifically tracking them.

The usual sources of drift

The most common one is over-provisioned compute. When you're setting up infrastructure, especially under time pressure, there's a tendency to provision something larger than you need and plan to resize later. The resizing rarely happens. The instance that was "temporary" while performance was monitored ends up running for two years at a tier that's double what the workload requires.

Closely related: idle resources. Development environments provisioned for a project that shipped six months ago. Load balancers attached to services that were retired. Elastic IPs allocated and never associated with anything running. Each one is cheap on its own; together they add up slowly and consistently.

Data transfer and egress costs are the other category that surprises people. Cloud providers charge for data leaving their network, and the pricing is set at a level where it seems negligible until you're moving real volumes. Applications that pull data frequently from external sources, or that serve large files to users, can generate substantial egress bills that nobody thought to model.
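To see how a rate that looks negligible turns into a real line item, here is a rough sketch of the arithmetic. The flat per-GB rate is an illustrative assumption, not a quoted price: real egress pricing is tiered and varies by provider, region, and destination.

```python
# Illustrative only: ASSUMED_RATE_PER_GB is a hypothetical flat rate,
# not a price taken from any provider's rate card.
ASSUMED_RATE_PER_GB = 0.09  # USD per GB leaving the provider's network


def monthly_egress_cost(gb_per_day, rate_per_gb=ASSUMED_RATE_PER_GB, days=30):
    """Estimate the monthly egress cost of a steady daily transfer volume."""
    return gb_per_day * days * rate_per_gb


# The same rate at three very different volumes.
for gb_per_day in (1, 50, 2000):
    cost = monthly_egress_cost(gb_per_day)
    print(f"{gb_per_day:>5} GB/day -> ${cost:,.2f} per month")
```

The rate never changes in this sketch; only the volume does, which is why egress tends to go unmodelled until traffic grows.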

What reserved instances and savings plans actually require

AWS reserved instances and Azure reserved VM instances offer meaningful discounts, typically 30 to 60 percent against on-demand pricing, in exchange for a one- or three-year commitment. The problem is that they need to be matched against actual usage to provide value. If you commit to a particular instance type and then change your workload, you're paying for capacity that doesn't match what you're running.
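The matching problem can be put in numbers. The sketch below is simplified: it assumes the reservation is billed for every hour of the term at a flat discounted rate and ignores upfront payment options. Under that assumption a reservation only beats on-demand when the reserved capacity is in use for more than (1 - discount) of the term. The function names and the hourly price are made up for illustration.

```python
def break_even_utilisation(discount):
    """Fraction of the term the reservation must be used to match on-demand cost."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be in the range [0, 1)")
    return 1 - discount


def reservation_saving(on_demand_hourly, discount, hours_in_term, hours_used):
    """Saving versus on-demand; a negative result means the reservation lost money."""
    # The reservation is paid for every hour of the term, used or not.
    reserved_cost = on_demand_hourly * (1 - discount) * hours_in_term
    # On-demand is only paid for the hours actually run.
    on_demand_cost = on_demand_hourly * hours_used
    return on_demand_cost - reserved_cost


HOURS_PER_YEAR = 8760
print(f"Break-even at a 40% discount: {break_even_utilisation(0.40):.0%} of the term")
print(f"Used all year:  {reservation_saving(0.10, 0.40, HOURS_PER_YEAR, 8760):+.2f}")
print(f"Used half year: {reservation_saving(0.10, 0.40, HOURS_PER_YEAR, 4380):+.2f}")
```

With a hypothetical 40 percent discount, a workload that disappears halfway through the year leaves the reservation costing more than on-demand would have.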

Savings plans are more flexible, since you commit to a spend level rather than a specific resource, but they still require someone to be paying attention to whether the commitment is being used efficiently. Most companies that have been on cloud for more than two years have a coverage and utilisation story that's messier than they realise: some commitments sit partly unused while other eligible usage still runs at on-demand rates.
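Coverage and utilisation answer two different questions, and a small sketch shows why both matter. Utilisation asks how much of the commitment was actually applied to usage; coverage asks how much of the eligible usage was paid for by the commitment. The model here is deliberately simplified and is not how any provider computes its reports: it assumes a fixed hourly commitment and hourly usage expressed in the same discounted currency units, and the figures are invented.

```python
from typing import Sequence, Tuple


def commitment_metrics(hourly_commitment: float,
                       hourly_eligible_usage: Sequence[float]) -> Tuple[float, float]:
    """Return (utilisation, coverage) for a fixed hourly commitment."""
    # In each hour the commitment can absorb usage only up to its own size.
    applied = [min(hourly_commitment, usage) for usage in hourly_eligible_usage]
    total_commitment = hourly_commitment * len(hourly_eligible_usage)
    total_usage = sum(hourly_eligible_usage)
    utilisation = sum(applied) / total_commitment if total_commitment else 0.0
    coverage = sum(applied) / total_usage if total_usage else 0.0
    return utilisation, coverage


# Four sample hours: one quiet, two on target, one spike above the commitment.
utilisation, coverage = commitment_metrics(10.0, [5.0, 10.0, 25.0, 10.0])
print(f"Utilisation: {utilisation:.1%}")  # share of the commitment that was used
print(f"Coverage:    {coverage:.1%}")     # share of usage the commitment paid for
```

In this example the quiet hour drags utilisation down and the spike drags coverage down, so a commitment can be under-used and insufficient at the same time.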

Where to start when the bill doesn't make sense

The first useful thing to do is not to start cutting. It's to understand what you're actually running and what it's doing.

AWS Cost Explorer and the Azure Cost Management tools provide a reasonable starting point for categorising spend by service, region, and tag. The catch is that in most environments a significant portion of resources is either untagged or tagged inconsistently, which makes the analysis incomplete.
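One way to find out how incomplete the analysis is: measure what share of spend carries no usable tag before trusting any per-team breakdown. The sketch below works on a made-up list of cost line items. The field names and the "team" tag key are assumptions for illustration, not the schema of any provider's export.

```python
REQUIRED_TAG = "team"  # hypothetical tag key used for cost allocation


def untagged_share(line_items, required_tag=REQUIRED_TAG):
    """Fraction of total cost on items missing the required tag (exact key match)."""
    total = sum(item["cost"] for item in line_items)
    if total == 0:
        return 0.0
    untagged = sum(
        item["cost"]
        for item in line_items
        if not item.get("tags", {}).get(required_tag)
    )
    return untagged / total


# Invented sample data. The "Team" key differs in case from "team", so it is
# counted as missing, which is how inconsistent tagging hides spend.
sample_items = [
    {"service": "compute", "cost": 600.0, "tags": {"team": "payments"}},
    {"service": "storage", "cost": 250.0, "tags": {}},
    {"service": "database", "cost": 150.0, "tags": {"Team": "payments"}},
]

print(f"Spend without a usable '{REQUIRED_TAG}' tag: {untagged_share(sample_items):.0%}")
```

If that share is large, fixing tags is usually worth doing before drawing conclusions from the breakdown.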

After understanding the broad shape of spend, the productive sequence tends to be:

  • Identify and terminate idle resources (quick wins, low risk)
  • Review compute sizing against actual CPU and memory utilisation over the previous 30 days
  • Map data transfer costs to specific services or integration points
  • Assess reserved instance and savings plan coverage against current usage patterns
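
The sizing review in the second step can be sketched as a simple filter over utilisation samples. Everything here is an assumption made for illustration: the 40 percent threshold, the use of a 95th percentile rather than the absolute peak, and the sample data. In practice the samples would come from whatever monitoring covers the previous 30 days.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class InstanceUsage:
    name: str
    cpu_samples: List[float]  # CPU utilisation, percent
    mem_samples: List[float]  # memory utilisation, percent


def p95(samples):
    """95th percentile using the nearest-rank method."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def downsize_candidates(instances, threshold=40.0):
    """Names of instances whose p95 CPU and p95 memory are both below threshold."""
    return [
        inst.name
        for inst in instances
        if p95(inst.cpu_samples) < threshold and p95(inst.mem_samples) < threshold
    ]


# Invented sample data standing in for 30 days of metrics.
fleet = [
    InstanceUsage("api-1", [12, 18, 22, 15], [30, 35, 33, 31]),
    InstanceUsage("batch-1", [20, 85, 30, 25], [40, 45, 42, 41]),
    InstanceUsage("cache-1", [5, 6, 7, 5], [70, 72, 71, 75]),
]

print(downsize_candidates(fleet))
```

Checking memory as well as CPU matters: the cache-style instance above looks idle on CPU alone but would not survive a smaller tier.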

The last item is where the largest savings tend to be, but it's also the one that requires the most care. Getting commitment coverage wrong in the other direction, by over-committing, creates a different problem: you pay for committed capacity whether or not anything runs on it.

The constraint that makes ongoing cost control difficult

The structural challenge is that cost optimisation isn't a one-time project. Workloads change. Teams add services. Usage patterns shift. Without someone whose job it is to watch this continuously, the bill drifts again within a few months of whatever was cleaned up.

This is the reason cost visibility tends to be built into managed infrastructure engagements rather than treated as a separate workstream. When the same team managing the environment is also responsible for what it costs, the two things stay connected.

If nobody owns the cloud bill month to month, it will grow. Not because of anything dramatic — just because that's what happens when infrastructure accumulates changes without anyone tracking their cost implications.

The practical implication is that a cost audit is useful for resetting to a known baseline, but the baseline degrades over time unless someone maintains it. Worth factoring that into how you think about the effort.
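
A minimal sketch of what maintaining the baseline can mean in practice: compare each month's bill against the audited figure and flag the first month that exceeds it by more than an agreed tolerance. The 10 percent tolerance, the 3 percent monthly growth, and the starting figure are illustrative assumptions.

```python
def first_breach(baseline, monthly_costs, tolerance=0.10):
    """Return the 1-based month in which cost first exceeds baseline by more
    than the tolerance, or None if it never does."""
    limit = baseline * (1 + tolerance)
    for month, cost in enumerate(monthly_costs, start=1):
        if cost > limit:
            return month
    return None


# Simulate a bill growing 3% a month after an audit reset it to 10,000.
baseline = 10_000.0
costs = [baseline * 1.03 ** month for month in range(1, 13)]

month = first_breach(baseline, costs)
print(f"Baseline exceeded by more than 10% in month {month}")
print(f"Growth after 24 months at 3% a month: {1.03 ** 24:.2f}x")
```

Steady growth of a few percent a month roughly doubles the bill over two years, which is why a check like this is more useful run monthly than quarterly.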