You built the AI system. It works. The demo went great, leadership is excited, and the team shipped it to production. Now the invoices start arriving.
This is where most companies get blindsided. They budgeted for the build but not for the run. And running custom AI software costs real money every single month... sometimes more than the build itself over a two-year window.
According to CloudZero's 2025 State of AI Costs report, the average monthly spend on AI rose to $85,521 in 2025, a 36% jump from the previous year. And 45% of organizations now spend over $100,000 per month on AI tools alone. Those numbers include off-the-shelf SaaS, but custom software carries its own distinct cost profile that's harder to predict and easier to let spiral.
This post breaks down the five recurring cost categories you need to budget for, with real numbers and practical strategies for keeping them under control.
1. LLM API Costs: The Biggest Variable
If your custom AI software calls foundation models from OpenAI, Anthropic, or Google, API fees will likely be your largest ongoing line item. And unlike traditional SaaS with fixed monthly pricing, LLM costs scale directly with usage... every request burns tokens, and tokens cost money.
Here's what the current pricing landscape looks like for the major providers:
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Best For |
|---|---|---|---|
| Claude Haiku 4.5 | $1.00 | $5.00 | Classification, routing, simple tasks |
| Claude Sonnet 4.6 | $3.00 | $15.00 | General workhorse, analysis, content |
| Claude Opus 4.6 | $5.00 | $25.00 | Complex reasoning, code generation |
| GPT-5.2 | $1.75 | $14.00 | General purpose, multimodal |
| GPT-5 mini | $0.25 | $2.00 | High-volume, cost-sensitive tasks |
The good news: by most estimates, LLM prices dropped roughly 80% from 2025 to 2026 across the industry. The bad news: usage tends to grow faster than prices fall. A system handling 500 requests per day at 10,000 tokens per request is burning 5 million tokens daily. Even at $3 per million input tokens, that adds up fast when you factor in output tokens, retries, and context window overhead.
For a mid-market deployment, expect $500 to $5,000 per month in LLM API costs for moderate usage. High-volume agent systems or multi-step reasoning workflows can easily push past $10,000 per month.
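The arithmetic above is worth doing for your own workload before the invoices arrive. Here's a minimal back-of-envelope estimator; the prices and token counts are illustrative placeholders, not a quote:

```python
# Back-of-envelope monthly cost estimator for LLM API usage.
# Prices are illustrative (dollars per 1M tokens); plug in your provider's rates.

def monthly_api_cost(
    requests_per_day: int,
    input_tokens_per_request: int,
    output_tokens_per_request: int,
    input_price_per_m: float,
    output_price_per_m: float,
    days: int = 30,
) -> float:
    input_cost = requests_per_day * input_tokens_per_request / 1e6 * input_price_per_m
    output_cost = requests_per_day * output_tokens_per_request / 1e6 * output_price_per_m
    return (input_cost + output_cost) * days

# 500 requests/day, ~10,000 input + 1,000 output tokens each, at $3/$15 per 1M
estimate = monthly_api_cost(500, 10_000, 1_000, 3.00, 15.00)
print(f"${estimate:,.2f}/month")  # -> $675.00/month, before retries and overhead
```

Note that this ignores retries, context window overhead, and cache pricing, so treat the output as a floor, not a forecast.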
How to Control API Costs
The single highest-leverage move is model orchestration. Instead of routing every request to your most expensive model, build a tiered system: lightweight models for classification and simple lookups, mid-tier models for general work, and premium models only for tasks that genuinely need them. We've seen this cut token costs by 40-70%.
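In code, tiered routing can be as simple as a lookup keyed by task complexity. This sketch uses a toy length-based classifier and generic model names purely for illustration; production routers classify by intent, task type, or a cheap first-pass model:

```python
# A minimal sketch of tiered model routing. Model names and the
# classifier heuristic are illustrative assumptions, not a real API.

TIERS = {
    "simple":  "claude-haiku",    # classification, routing, simple lookups
    "general": "claude-sonnet",   # analysis, general content work
    "complex": "claude-opus",     # multi-step reasoning, code generation
}

def classify_task(prompt: str) -> str:
    """Toy heuristic: route by prompt length. Replace with a real classifier."""
    if len(prompt) < 200:
        return "simple"
    if len(prompt) < 2000:
        return "general"
    return "complex"

def route(prompt: str) -> str:
    """Pick the cheapest model tier that can plausibly handle the request."""
    return TIERS[classify_task(prompt)]

print(route("Is this email spam?"))  # -> claude-haiku
```

The design point is that the premium model becomes an escalation path rather than the default, which is where the 40-70% savings comes from.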
Beyond orchestration:
- Prompt caching. Both OpenAI and Anthropic offer 90% discounts on cached prompt reads. If your system sends similar prompts repeatedly, caching is free money.
- Batch APIs. For workloads that don't need real-time responses, batch processing runs at a 50% discount with every major provider.
- Context pruning. Load only the data each request actually needs. Sending your entire knowledge base as context on every call is a token bonfire.
- Pre-built scripts for deterministic tasks. If a request is just a database lookup or API call, don't burn tokens on it. A shell script does it faster and cheaper.
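These discounts compound, and the caching one is easy to quantify. A rough sketch of how cache hit rate changes input-token spend, assuming cached reads bill at 10% of the base input price (the 90% discount cited above) and ignoring any cache-write surcharge your provider may add:

```python
# Estimate monthly input-token cost under prompt caching.
# Assumption: cached reads are billed at 10% of the base input price.
# Cache-write surcharges are ignored for simplicity.

def cached_input_cost(total_input_tokens_m: float,
                      base_price_per_m: float,
                      cache_hit_rate: float) -> float:
    cached = total_input_tokens_m * cache_hit_rate
    fresh = total_input_tokens_m - cached
    return fresh * base_price_per_m + cached * base_price_per_m * 0.10

base = cached_input_cost(150, 3.00, 0.0)   # 150M input tokens/month, no caching
hot  = cached_input_cost(150, 3.00, 0.8)   # 80% of input tokens hit the cache
print(f"no cache: ${base:.0f}, 80% hit rate: ${hot:.0f}")
# -> no cache: $450, 80% hit rate: $126
```

Systems with large shared system prompts or repeated document context tend to sit at the high end of that hit-rate range, which is why caching is usually the first lever to pull.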
2. Cloud Infrastructure: The Quiet Escalator
Your AI system runs on infrastructure... compute, storage, networking, databases. These costs are easier to predict than API fees, but they have a way of creeping upward as usage grows and nobody's watching.
Typical infrastructure components for a custom AI deployment:
- Application hosting (Vercel, AWS, GCP): $200-$2,000/month depending on traffic and compute needs
- Vector database (Pinecone, Weaviate, pgvector): $100-$1,000/month for RAG-based systems
- Object storage (S3, GCS) for documents, embeddings, logs: $50-$500/month
- Queue/orchestration services (SQS, Cloud Tasks): $50-$200/month
- Monitoring and observability (Datadog, CloudWatch): $100-$500/month
For most mid-market custom AI systems that use hosted LLM APIs (not self-hosted models), infrastructure costs run $1,000 to $5,000 per month. That's manageable. Where it gets expensive is self-hosting models on GPU instances... an A100-class GPU instance on AWS can run $15,000-$25,000 per month for continuous operation, according to Google Cloud's AI/ML cost optimization framework.
Our advice: unless you have a specific compliance or latency requirement, use hosted APIs. The cost of self-hosting GPU infrastructure almost never pencils out for mid-market companies.
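You can sanity-check that advice with a break-even calculation. The figures below are assumptions to replace with your own; the point is how much volume it takes before a fixed GPU bill beats pay-per-token:

```python
# Rough break-even check: hosted API spend vs a fixed self-hosted GPU bill.
# All figures are illustrative assumptions.

def breakeven_tokens_per_month(gpu_cost_per_month: float,
                               hosted_price_per_m_tokens: float) -> float:
    """Token volume (in millions) at which a fixed GPU bill matches hosted
    API spend. Ignores the engineering time self-hosting also requires."""
    return gpu_cost_per_month / hosted_price_per_m_tokens

# $20,000/month GPU instance vs a blended hosted rate of $5 per 1M tokens
print(breakeven_tokens_per_month(20_000, 5.00))  # -> 4000.0, i.e. 4B tokens/month
```

Four billion tokens a month is far beyond typical mid-market volume, and the calculation doesn't even count the ops staffing self-hosting demands, which is why hosted APIs almost always win at this scale.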
3. Maintenance Engineering: The Cost That Pays for Itself
This is the line item companies most often skip, and it's the one that hurts the most when they do. AI systems aren't like traditional software... they degrade even when you don't touch them. Models drift, upstream data changes, API providers ship breaking updates, and edge cases accumulate.
McKinsey's 2025 State of AI survey found that 88% of companies now use AI regularly, but only one-third have begun to scale their programs at the enterprise level. A key barrier: the operational investment needed to keep AI systems performing after launch. Companies budget for the build but not for the run.
Ongoing maintenance engineering typically covers:
- Prompt tuning and optimization as usage patterns evolve and edge cases surface
- Model version management when providers deprecate old models or change behavior in new releases
- Data pipeline maintenance as upstream sources change schemas, add fields, or alter formats
- Bug fixes and incident response for AI-specific failure modes (hallucinations, quality regressions, latency spikes)
- Security patching for prompt injection defenses, dependency updates, and compliance requirements
Budget $5,000 to $15,000 per month for a dedicated maintenance allocation, depending on system complexity. That typically translates to 20-40 hours of senior engineering time per month. If that sounds like a lot, consider the alternative: a system that silently degrades until someone rebuilds it from scratch.
For a deeper dive on what this support should include, see our post on what ongoing support to expect from an AI development partner.
4. Model Monitoring and Evaluation: The Early Warning System
You can't manage what you can't measure. And AI systems fail in ways that don't throw errors... they just quietly produce worse outputs until someone notices.
A proper monitoring setup tracks:
- Output quality scores against evaluation datasets that grow over time
- Token usage and cost per task type to catch inefficiencies before they compound
- Latency distributions (P50, P95, P99) to detect slowdowns
- Error and fallback rates to identify when the system is struggling
- User feedback signals to calibrate quality perception against automated metrics
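The latency side of that list reduces to percentile math over raw request timings. A self-contained sketch using a simple nearest-rank percentile (the sample data and the 2-second budget are made up for illustration):

```python
# Compute the latency percentiles worth alerting on (P50/P95/P99)
# from raw request timings. Data and thresholds are illustrative.

def percentile(sorted_values: list[float], p: float) -> float:
    """Nearest-rank percentile over a pre-sorted list."""
    k = max(0, min(len(sorted_values) - 1, round(p / 100 * len(sorted_values)) - 1))
    return sorted_values[k]

latencies_ms = sorted([820, 950, 1100, 1240, 1500, 1900, 2400, 3100, 4800, 9500])
p50, p95, p99 = (percentile(latencies_ms, p) for p in (50, 95, 99))

# Alert when the median drifts past an agreed budget
assert p50 <= 2000, "median latency budget blown"
print(p50, p95, p99)
```

On real traffic you'd feed this from structured request logs and track the percentiles per task type, since a slow premium-model tier can hide inside a healthy-looking global average.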
Deloitte's 2025 Tech Value Survey found that only 51% of organizations have allocated dedicated budget for AI initiatives. The rest are flying blind... spending on AI without the structured investment planning needed to know whether their systems are getting better or worse. Monitoring is how you close that gap.
The cost here is mostly tooling and the engineering time to maintain evaluation pipelines. Budget $500 to $2,000 per month for monitoring infrastructure (tools like Langfuse, Helicone, or custom dashboards), plus the engineering time already accounted for in your maintenance allocation.
This is not optional. It's the difference between managing your AI investment and hoping it works.
5. The Hidden Costs Nobody Warns You About
Beyond the four major categories, there are recurring costs that catch teams off guard:
Embedding Refresh Costs
If your system uses RAG (retrieval-augmented generation), your vector embeddings need refreshing when source content changes. For a knowledge base with thousands of documents updating weekly, re-embedding costs $100-$500 per month. Not catastrophic, but not zero.
Compliance and Audit Overhead
As AI regulations mature (EU AI Act, state-level laws in the US), compliance requirements grow. Documentation, audit trails, bias testing, and data handling reviews all take time and money. Budget at least a few hours per month for compliance maintenance, more if you're in a regulated industry.
Training and Change Management
Your team needs to know how to use the AI system effectively and how to escalate when it fails. As the system evolves, that training needs refreshing. This typically runs $500-$2,000 per quarter in internal time.
Provider Lock-in Switching Costs
If you've built your entire system around one LLM provider's specific API, switching to a cheaper or better alternative means rewriting integration code and re-tuning prompts. This isn't a monthly cost, but it's a risk that should inform your architecture decisions upfront: build abstraction layers, because avoiding vendor lock-in saves real money long-term.
Putting It All Together: A Realistic Monthly Budget
Here's what a typical mid-market custom AI deployment actually costs to run, broken down by category:
| Cost Category | Low End | Mid Range | High End |
|---|---|---|---|
| LLM API fees | $500/mo | $3,000/mo | $10,000+/mo |
| Cloud infrastructure | $1,000/mo | $3,000/mo | $5,000+/mo |
| Maintenance engineering | $5,000/mo | $10,000/mo | $15,000+/mo |
| Monitoring and evaluation | $500/mo | $1,000/mo | $2,000+/mo |
| Hidden costs (compliance, training, embeddings) | $500/mo | $1,500/mo | $3,000+/mo |
| Total | $7,500/mo | $18,500/mo | $35,000+/mo |
That's $90,000 to $420,000+ per year in ongoing operational costs. For context, if your initial build cost $200,000-$400,000, you're looking at annual run costs of roughly 25-100% of the build cost. This is consistent with Gartner's 2026 projection that worldwide AI spending will hit $2.52 trillion, with operational and infrastructure costs growing even faster than initial development investment.
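The table reduces to simple arithmetic, and it's worth wiring that arithmetic into a script you rerun as line items change. This sketch uses the mid-range column as placeholder figures and an assumed $300,000 build cost:

```python
# Roll up monthly line items into an annual run cost and compare it
# to build cost. Figures mirror the mid-range column above and an
# assumed build cost; replace with your own numbers.

monthly = {
    "llm_api": 3_000,
    "infrastructure": 3_000,
    "maintenance": 10_000,
    "monitoring": 1_000,
    "hidden": 1_500,
}

annual_run = sum(monthly.values()) * 12
build_cost = 300_000  # illustrative initial build

print(f"annual run: ${annual_run:,}")                     # -> annual run: $222,000
print(f"run/build ratio: {annual_run / build_cost:.0%}")  # -> run/build ratio: 74%
```

A mid-range deployment against a mid-range build lands comfortably inside the 25-100% ratio cited above; rerun the numbers whenever a line item moves.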
Is that a lot? Depends on the value it delivers. If your AI system saves $500,000 per year in operational efficiency or generates $1 million in additional revenue, a $200,000 annual run cost is an easy yes. If you can't quantify the value... that's a different conversation. (We wrote about estimating ROI for AI workflows if you need help with that math.)
How We Think About Ongoing Costs at Last Rev
We build cost awareness into the architecture from day one. Not as an afterthought. Not as a "phase 3" optimization. From the first design conversation.
Here's what that looks like in practice:
- Model orchestration by default. Every system we build uses tiered model routing. Simple tasks go to cheap models or scripts. Complex tasks get the premium models. This cuts API costs by 40-70% compared to single-model approaches.
- Cost projections before we write code. Before building, we model expected API consumption, infrastructure costs, and maintenance requirements. Our clients know what the monthly bill will look like before they commit to the build. For a full breakdown of realistic build and support costs, see our post on how much custom AI should realistically cost.
- Monitoring from launch. Every AI feature ships with cost tracking, quality monitoring, and usage analytics. No surprises. When token usage spikes, we know why and can respond before the invoice does.
- Abstraction layers for provider flexibility. We never hard-wire to a single LLM provider. When a cheaper or better model becomes available, switching is a configuration change, not a rewrite.
- Transparent support tiers. Our clients choose their maintenance level based on their needs and budget, with clear expectations for what each tier includes.
Key Takeaways
- Budget for the run, not just the build. Ongoing costs typically run 25-100% of your initial build cost annually. If you didn't plan for this, plan for it now.
- LLM API fees are your biggest variable. Model orchestration, prompt caching, and batch processing are your best levers for controlling them.
- Maintenance engineering isn't optional. AI systems degrade by default. Someone needs to be watching, tuning, and fixing them continuously.
- Monitoring pays for itself. The cost of observability tooling is trivial compared to the cost of a system that silently degrades for months.
- Architect for cost control from day one. Abstraction layers, tiered model routing, and context pruning are design decisions, not optimizations you bolt on later.
The companies that succeed with custom AI aren't the ones who spend the most. They're the ones who know exactly what they're spending, why, and what they're getting back. If you're planning an AI build or struggling with runaway costs on an existing system, let's talk about getting your numbers right.
Sources
- CloudZero -- "The State of AI Costs in 2025" (2025)
- Gartner -- "Worldwide AI Spending Will Total $2.5 Trillion in 2026" (2026)
- McKinsey -- "The State of AI: Global Survey 2025" (2025)
- Deloitte -- "AI Is Capturing the Digital Dollar: 2025 Tech Value Survey" (2025)
- Google Cloud -- "AI and ML: Cost Optimization" (2025)
- Finout -- "OpenAI vs Anthropic API Pricing Comparison" (2026)