Cost, FinOps & model selection
“What does it cost and what's the ROI?” — have the answer ready.
Execs will ask what AI costs and what it returns. Understanding token economics, right-sizing models, and computing a unit cost turns you from enthusiast into a credible owner of the budget.
Key ideas
1. Token economics: you pay per input and output token. Long context, verbose outputs and chatty agents are the usual cost drivers.
2. Right-size the model: use the smallest model that passes your evals; reserve frontier models for the hard steps. Cascade (cheap first, escalate if needed).
3. Cut cost without losing quality: prompt caching, retrieval instead of stuffing context, shorter outputs, batching, and caching repeated answers.
4. Compute a unit metric: cost per resolved ticket, generated PR or answered question. This is what execs and ROI cases need.
5. Watch hosting trade-offs: choosing between API, self-hosting and EU-region managed services (Bedrock/Azure OpenAI) means balancing cost, control and data residency.
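To make the first two ideas concrete, here is a minimal Python sketch of per-request token cost and a cheap-first cascade. The model names and prices are hypothetical placeholders, not real vendor rates, and `call_model` and `passes_check` are assumed hooks you would supply yourself (your API client and your eval or acceptance check).

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class ModelPrice:
    name: str
    usd_per_1m_input: float   # hypothetical price per 1M input tokens
    usd_per_1m_output: float  # hypothetical price per 1M output tokens


def request_cost(price: ModelPrice, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request: input and output tokens are billed separately."""
    return (input_tokens * price.usd_per_1m_input
            + output_tokens * price.usd_per_1m_output) / 1_000_000


# Placeholder tiers; substitute the models and rates you actually use.
SMALL = ModelPrice("small-model", 0.50, 2.00)
FRONTIER = ModelPrice("frontier-model", 5.00, 20.00)


def cascade(
    task: str,
    call_model: Callable[[str, str], Tuple[str, int, int]],
    passes_check: Callable[[str], bool],
    tiers: Sequence[ModelPrice] = (SMALL, FRONTIER),
) -> Tuple[str, str, float]:
    """Try the cheapest tier first and escalate only if the check fails.

    call_model(model_name, task) must return (answer, input_tokens, output_tokens).
    Returns (answer, model_name_used, total_cost_usd), where the total includes
    the cost of any failed cheaper attempts.
    """
    if not tiers:
        raise ValueError("at least one model tier is required")
    total = 0.0
    for price in tiers:
        answer, in_tok, out_tok = call_model(price.name, task)
        total += request_cost(price, in_tok, out_tok)
        if passes_check(answer):
            break
    # If no tier passed, this is the last tier's answer.
    return answer, price.name, total
```

A cascade only saves money when the cheap tier passes often enough; the total above includes the cost of failed cheap attempts so that trade-off stays visible.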
Where the money goes
- Oversized models for easy tasks; frontier where a small model would pass evals.
- Bloated context (stuffing instead of retrieving) and long, unbounded outputs.
- Agents looping without step/cost caps.
- No caching of stable prompts or repeated queries.
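The "agents looping without step/cost caps" item can be guarded with a small wrapper. The sketch below is one possible shape, assuming a hypothetical `step_fn` that runs one agent step and reports whether it finished and what that step cost; the default limits are illustrative, not recommendations.

```python
from typing import Callable, Tuple


class BudgetExceeded(RuntimeError):
    """Raised when an agent run hits its step cap or cost cap."""


def run_agent(
    step_fn: Callable[[], Tuple[bool, float]],
    max_steps: int = 10,
    max_cost_usd: float = 0.50,
) -> Tuple[int, float]:
    """Run step_fn until it reports done, enforcing step and cost caps.

    step_fn() must return (done, step_cost_usd).
    Returns (steps_taken, total_cost_usd) on success.
    """
    spent = 0.0
    for step in range(1, max_steps + 1):
        done, step_cost = step_fn()
        spent += step_cost
        # Check the cost cap first so an over-budget final step is still flagged.
        if spent > max_cost_usd:
            raise BudgetExceeded(f"spent ${spent:.4f} after {step} steps")
        if done:
            return step, spent
    raise BudgetExceeded(f"no result after {max_steps} steps (${spent:.4f} spent)")
```

Raising an exception rather than silently returning makes a runaway loop show up in logs and alerts instead of in the invoice.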
Build the ROI case
- Define a unit of value (a ticket, a PR, a case) and measure cost per unit.
- Compare to the human/time cost it replaces or accelerates — conservatively.
- Set budgets and alerts at the gateway; track cost alongside quality.
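The arithmetic behind these three steps is simple enough to sketch. The functions below are illustrative only: the names, the `share_actually_saved` discount (a way to keep the human-cost comparison conservative) and the 80% alert threshold are assumptions, not a standard.

```python
def cost_per_unit(total_ai_cost_usd: float, units_resolved: int) -> float:
    """Unit cost: total AI spend divided by units of value delivered."""
    if units_resolved <= 0:
        raise ValueError("units_resolved must be positive")
    return total_ai_cost_usd / units_resolved


def roi(ai_cost_per_unit: float, human_cost_per_unit: float,
        share_actually_saved: float) -> float:
    """Net saving per unit relative to AI cost per unit.

    share_actually_saved (0..1) discounts the human cost so only the time
    genuinely replaced or accelerated is counted.
    """
    saving = human_cost_per_unit * share_actually_saved
    return (saving - ai_cost_per_unit) / ai_cost_per_unit


def budget_status(spent_usd: float, budget_usd: float, alert_at: float = 0.8) -> str:
    """Classify spend against a budget: 'ok', 'alert' or 'over'."""
    used = spent_usd / budget_usd
    if used >= 1.0:
        return "over"
    if used >= alert_at:
        return "alert"
    return "ok"


# Made-up example figures: $1,200 spend, 4,000 resolved tickets,
# $6 human cost per ticket, of which 25% is assumed to be saved.
unit = cost_per_unit(1200.0, 4000)
ticket_roi = roi(unit, human_cost_per_unit=6.0, share_actually_saved=0.25)
```

With these made-up figures the unit cost is $0.30 per ticket and the return is roughly four times the spend; the point is the shape of the calculation, not the numbers.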
Test yourself
What's the disciplined approach to model selection?