The Hidden Cost of AI After Launch

The budget line that most AI projects do not have

Ask any enterprise team that has run AI in production for six months what surprised them most. The answer is almost never model performance or integration complexity. It is the cost and effort of keeping the system working after launch.

Token spend that was not modelled. Retrieval quality that degrades as knowledge ages. Prompt behaviour that drifts as the model changes. Incidents that require human triage. Governance reports that need to be produced monthly. A growing backlog of improvements that no one owns.

This whitepaper documents the post-launch cost categories that enterprise AI budgets consistently miss — and the operating model required to manage them.

Cost category 1: Model inference and token spend

Token costs are visible in theory and invisible in practice until the first bill arrives. The problem is not that tokens are expensive — it is that usage patterns in production are almost never what was modelled in development.

Development testing uses short, controlled prompts against a fixed dataset. Production users send longer, more varied inputs. Contexts grow as conversation history is included. RAG retrieval adds thousands of tokens per query. Multi-step agents call the model several times per workflow. The result is often token spend 3–8x higher than pre-launch estimates in the first 90 days.
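
To make the multiplier concrete, the sketch below adds up the tokens for one request in development and in production. The RequestProfile class and every token figure in it are illustrative assumptions, not measurements from the engagements described here, and the model simplifies by resending the full context on every call.

```python
from dataclasses import dataclass


@dataclass
class RequestProfile:
    """Token profile of one user request; all figures are illustrative."""
    prompt_tokens: int      # system prompt plus user input
    history_tokens: int     # conversation history included in the context
    retrieval_tokens: int   # RAG passages added to the context
    output_tokens: int      # tokens generated per model call
    model_calls: int = 1    # calls per workflow (more than 1 for agents)

    def total_tokens(self) -> int:
        # Simplification: every call resends the full context.
        context = self.prompt_tokens + self.history_tokens + self.retrieval_tokens
        return self.model_calls * (context + self.output_tokens)


# Hypothetical development request: short prompt, no history, no retrieval.
dev = RequestProfile(prompt_tokens=600, history_tokens=0,
                     retrieval_tokens=0, output_tokens=300)

# Hypothetical production request: longer input, history and retrieved passages.
prod = RequestProfile(prompt_tokens=700, history_tokens=600,
                      retrieval_tokens=2000, output_tokens=300)

multiplier = prod.total_tokens() / dev.total_tokens()
print(f"dev={dev.total_tokens()} prod={prod.total_tokens()} multiplier={multiplier:.1f}x")
```

With these assumed figures a single-call production request uses 4x the tokens of the development request. Setting model_calls=2 for an agent workflow doubles that to 8x.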

Budget reality: In engagements we have reviewed, the median gap between projected and actual token spend at 90 days post-launch is 4.2x. Teams that did not model production usage patterns consistently exceeded their quarterly AI budget within the first month.

Managing this requires model-level cost reporting (not just cloud-level), per-workflow token budgeting, and active prompt optimisation after launch — compressing context, caching repeated retrievals, and routing lower-complexity tasks to cheaper models.
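
Per-workflow token budgeting and model routing can be sketched in a few lines. The example below is a minimal illustration, not a production design: the workflow names, budget figures, the 80% alert threshold and the "small-model" and "large-model" tier names are all hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical monthly token budgets per workflow.
BUDGETS = {"support_triage": 2_000_000, "contract_review": 5_000_000}


class TokenLedger:
    """Accumulates token usage per workflow and flags budget overruns."""

    def __init__(self, budgets: dict[str, int], alert_ratio: float = 0.8):
        self.budgets = budgets
        self.alert_ratio = alert_ratio  # assumed warning threshold
        self.used: dict[str, int] = defaultdict(int)

    def record(self, workflow: str, input_tokens: int, output_tokens: int) -> None:
        self.used[workflow] += input_tokens + output_tokens

    def status(self, workflow: str) -> str:
        ratio = self.used[workflow] / self.budgets[workflow]
        if ratio >= 1.0:
            return "over_budget"
        if ratio >= self.alert_ratio:
            return "warning"
        return "ok"


def choose_model(task_complexity: str) -> str:
    """Route low-complexity tasks to a cheaper tier (tier names are placeholders)."""
    return "small-model" if task_complexity == "low" else "large-model"


ledger = TokenLedger(BUDGETS)
ledger.record("support_triage", input_tokens=1_500_000, output_tokens=200_000)
print(ledger.status("support_triage"), choose_model("low"))
```

In practice the token counts passed to record would come from the usage figures the model provider returns with each response, which is what makes the reporting model-level rather than cloud-level.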

Cost category 2: RAG and knowledge system decay

Retrieval-augmented generation (RAG) systems ground their answers in passages retrieved from a knowledge base. That knowledge base ages. Policies change. Products are updated. Regulations evolve. Pricing shifts. The AI does not know any of this unless the knowledge base is actively maintained.

In practice, most teams launch a RAG system with high-quality knowledge at go-live and do not budget for ongoing maintenance. Six months later, the system is confidently answering with outdated information — and no one has noticed because the outputs still look plausible.

Knowledge system operations include: source freshness monitoring, failed query analysis (queries where retrieval returned poor results), citation validation, and a regular knowledge backlog process to add, update and retire content. This is not a one-time cost — it is a recurring operational responsibility.
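
Source freshness monitoring is the simplest of these checks to automate. The sketch below flags sources whose last review is older than an allowed interval. The review intervals, the domain names and the record fields (id, domain, last_reviewed) are assumptions made for illustration; a real system would read them from the knowledge base's own metadata.

```python
from datetime import date, timedelta

# Hypothetical maximum age per knowledge domain, in days.
MAX_AGE_DAYS = {"pricing": 30, "policy": 90, "product": 60}


def stale_sources(sources: list[dict], today: date) -> list[str]:
    """Return IDs of sources whose last review is older than their domain allows."""
    stale = []
    for src in sources:
        limit = timedelta(days=MAX_AGE_DAYS[src["domain"]])
        if today - src["last_reviewed"] > limit:
            stale.append(src["id"])
    return stale


sources = [
    {"id": "pricing-sheet", "domain": "pricing", "last_reviewed": date(2024, 1, 10)},
    {"id": "leave-policy", "domain": "policy", "last_reviewed": date(2024, 2, 1)},
]
print(stale_sources(sources, today=date(2024, 3, 1)))
```

Here only pricing-sheet is reported: it was last reviewed 51 days before the check against a 30-day limit, while leave-policy is 29 days old against a 90-day limit. The flagged IDs would feed the knowledge backlog described above.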

Cost category 3: Prompt and model change management

Prompts are not static. In a live production system, prompts need to change regularly — to improve output quality, adapt to new use cases, respond to model updates from the provider, and fix edge cases that emerge from real usage.

Each prompt change is a risk. A prompt that improves performance on one task can degrade it on another. Without version control, rollback capability, and quality testing before release, prompt changes introduce unpredictable behaviour into a system that business users are depending on.

The operational requirement: Every prompt change in production must go through a release process in which the version is logged, the change is tested against a regression benchmark and approved, and the release ships with rollback available. This is not over-engineering; it is the minimum viable change management for a system making business decisions.
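
One way to picture such a release gate is sketched below. It is a minimal illustration under stated assumptions: PromptVersion, pass_rate, release and fake_model are hypothetical names, the model call is replaced by a deterministic stub so the example runs offline, and exact-match scoring stands in for whatever quality metric a team actually uses.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PromptVersion:
    version: str    # logged identifier for this prompt
    template: str


def pass_rate(run: Callable[[str, str], str], prompt: PromptVersion,
              benchmark: list[tuple[str, str]]) -> float:
    """Fraction of benchmark cases where the output equals the expected answer."""
    passed = sum(run(prompt.template, question) == expected
                 for question, expected in benchmark)
    return passed / len(benchmark)


def release(run, current: PromptVersion, candidate: PromptVersion,
            benchmark, approved: bool) -> PromptVersion:
    """Return the version that should be live; the current one stays if the gate fails."""
    if not approved:
        return current
    if pass_rate(run, candidate, benchmark) < pass_rate(run, current, benchmark):
        return current  # regression detected: keep the known-good version
    return candidate


def fake_model(template: str, question: str) -> str:
    """Deterministic stand-in for a model call so the sketch runs offline."""
    if "refund" in question.lower():
        return "billing"
    if "password" in question.lower() and "terse" not in template:
        return "account"
    return "unknown"


benchmark = [("Can I get a refund?", "billing"), ("Reset my password", "account")]
v1 = PromptVersion("v1", "Classify the request.")
v2 = PromptVersion("v2", "Classify the request. Be terse.")
live = release(fake_model, v1, v2, benchmark, approved=True)
print(live.version)
```

In this run the candidate v2 passes only one of the two benchmark cases, so v1 remains live. Keeping the previous PromptVersion object around is what makes rollback a one-line operation.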

Model changes from providers add another dimension. When a provider updates a hosted model (for example GPT-4o, Claude or Gemini), the behaviour of your prompts may change without any action on your side. Monitoring for output drift after provider updates is a post-launch operational responsibility that most teams have no process for.
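
A simple form of drift monitoring is to replay a fixed set of probe prompts after each provider update and compare the new outputs with stored baselines. The sketch below assumes the outputs have already been collected into dictionaries keyed by probe ID. The drift_report function, the 0.8 threshold and the sample outputs are hypothetical, and character-level similarity from Python's difflib is a deliberately crude stand-in for a task-specific quality check.

```python
from difflib import SequenceMatcher


def drift_report(baseline: dict[str, str], current: dict[str, str],
                 threshold: float = 0.8) -> list[str]:
    """Return probe IDs whose current output diverges from the stored baseline."""
    drifted = []
    for probe_id, old_output in baseline.items():
        similarity = SequenceMatcher(None, old_output, current[probe_id]).ratio()
        if similarity < threshold:
            drifted.append(probe_id)
    return drifted


# Outputs captured before the provider update (illustrative).
baseline = {
    "p1": "Approved: invoice matches purchase order.",
    "p2": "Category: billing",
}
# Outputs for the same probes after the update (illustrative).
current = {
    "p1": "Approved: invoice matches purchase order.",
    "p2": "I'm sorry, I cannot help with that request.",
}
print(drift_report(baseline, current))
```

Probe p1 is unchanged and p2 is flagged. Flagged probes are candidates for the same triage and regression testing as any other prompt change.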

Cost category 4: Quality review and incident management

AI systems produce bad outputs. In a well-designed system with human-in-the-loop review, most bad outputs are caught before they cause harm. But even with review, incidents happen: a misclassification that reached a customer, a retrieval failure that produced a confident wrong answer, an edge case that broke the workflow.

Each incident requires triage, root cause analysis, a fix (prompt, retrieval, or model), regression testing, and a post-incident report. The time cost per incident in teams we have worked with ranges from 4 to 16 hours depending on severity and how well the system is instrumented.

Teams that budget for zero incidents in their post-launch plan are not being optimistic — they are being unrealistic. Budget for two to four significant incidents per quarter in the first year, plus ongoing quality review of low-confidence outputs.

Cost category 5: Governance and compliance reporting

For enterprise AI systems operating in regulated contexts such as HR, legal, finance and healthcare, governance is not optional. Leadership needs monthly visibility into how the AI is performing, what decisions it influenced, where humans overrode it, and what the error rate looks like.

Producing this reporting manually from logs is expensive and error-prone. Producing it automatically requires instrumentation that most teams do not build at launch. The result is that governance reporting either does not exist (compliance risk) or consumes significant analyst time every month (operational cost).
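
The instrumentation in question can be as small as one structured log record per AI-influenced decision. The sketch below aggregates such records into the figures a monthly report needs. The field names (human_overrode, was_error) and the monthly_summary function are assumptions for illustration, not a reporting standard.

```python
def monthly_summary(events: list[dict]) -> dict:
    """Aggregate logged AI decisions into headline governance figures."""
    total = len(events)
    overridden = sum(1 for e in events if e["human_overrode"])
    errors = sum(1 for e in events if e["was_error"])
    return {
        "decisions": total,
        "override_rate": overridden / total if total else 0.0,
        "error_rate": errors / total if total else 0.0,
    }


# Illustrative decision log for one month.
events = [
    {"workflow": "cv_screening", "human_overrode": False, "was_error": False},
    {"workflow": "cv_screening", "human_overrode": True, "was_error": True},
    {"workflow": "cv_screening", "human_overrode": False, "was_error": False},
    {"workflow": "cv_screening", "human_overrode": False, "was_error": False},
]
print(monthly_summary(events))
```

If these fields are written at decision time, the monthly report becomes a query rather than an analyst exercise.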

The operating model: what "Managed AI Operations" actually means

The costs above are not avoidable — they are the cost of operating AI responsibly in production. The question is who manages them and how.

Small teams with one or two AI products in production typically cannot justify a full internal AI operations function. The economics do not work: a dedicated AI ops engineer costs £80–120k per year and typically cannot cover the full range of monitoring, prompt ops, RAG quality, cost management, and governance that a production system requires.

The alternative is a managed operations model: an external partner who owns the post-deployment operating layer (monitoring, prompt and model change management, RAG quality, cost optimisation, incident response, and monthly reporting) against a predictable monthly retainer. The annual cost is typically 15–25% of the build cost, which for smaller builds is significantly less than an internal hire, and the engagement is structured to improve the system continuously rather than just maintain it.
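
The comparison depends on the size of the build, and the break-even point follows directly from the two ranges quoted above. The short calculation below uses only those figures; the function name is hypothetical, and the hire is costed at the quoted £80–120k with no overheads added.

```python
def break_even_build_cost(hire_cost: int, retainer_percent: int) -> float:
    """Build cost at which the annual retainer equals the annual cost of a hire."""
    return hire_cost * 100 / retainer_percent


# Cheapest hire against the highest retainer rate: retainer is cheaper below this.
lower = break_even_build_cost(hire_cost=80_000, retainer_percent=25)

# Most expensive hire against the lowest retainer rate: retainer is dearer above this.
upper = break_even_build_cost(hire_cost=120_000, retainer_percent=15)

print(f"retainer cheaper below £{lower:,.0f}; hire cheaper above £{upper:,.0f}")
```

On these figures a retainer costs less than a hire for any build under £320,000 and more for any build over £800,000. Between the two the answer depends on the specific rate and salary, and on the point made above that one engineer typically cannot cover the full operational scope.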

Building the post-launch budget

Before your next AI deployment, add these line items to the post-launch budget:

Token and model cost reserve: 3–5x your development-phase token estimate for the first quarter
Knowledge maintenance: 4–8 hours per month per knowledge domain
Prompt and model change management: 1–2 days per month for a technical owner
Quality review and incident budget: 2–4 incidents per quarter at 4–16 hours each
Governance reporting: Monthly report production, either automated (build cost) or manual (2–4 hours per month)
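
The staff-time line items can be rolled up into a quarterly range. The sketch below does that arithmetic using the figures in the list, with two assumptions that are not in the list itself: a working day is taken as 8 hours, and governance reporting is produced manually.

```python
HOURS_PER_DAY = 8  # assumption: the list gives days, not hours


def quarterly_hours(knowledge_domains: int) -> tuple[int, int]:
    """Low and high estimate of post-launch staff hours for one quarter."""
    # Monthly items: knowledge maintenance, change management, manual governance report.
    monthly_low = 4 * knowledge_domains + 1 * HOURS_PER_DAY + 2
    monthly_high = 8 * knowledge_domains + 2 * HOURS_PER_DAY + 4
    incidents_low = 2 * 4     # 2 incidents at 4 hours each
    incidents_high = 4 * 16   # 4 incidents at 16 hours each
    return 3 * monthly_low + incidents_low, 3 * monthly_high + incidents_high


def token_reserve(dev_estimate: float) -> tuple[float, float]:
    """First-quarter token budget range: 3 to 5 times the development estimate."""
    return 3 * dev_estimate, 5 * dev_estimate


low, high = quarterly_hours(knowledge_domains=2)
print(f"staff time per quarter: {low} to {high} hours")
print("token reserve:", token_reserve(1_000))
```

For a system with two knowledge domains this gives 62 to 172 hours per quarter, before any token spend.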

Key takeaways

Post-launch AI operations costs are consistently underestimated, often by a factor of 3–5.
Token spend in production is often 3–8x higher than development-phase estimates in the first 90 days.
RAG knowledge systems decay without active maintenance — stale knowledge produces confident wrong answers.
Prompt changes in production require a formal change management process.
Budget for 2–4 significant incidents per quarter in the first year.
A managed operations model is often more cost-effective than building an internal AI ops function.