09 · 11 min read · 150 XP

Cost, FinOps & model selection

“What does it cost and what's the ROI?” — have the answer ready.

Execs will ask what AI costs and what it returns. Understanding token economics, right-sizing models, and computing a unit cost turns you from enthusiast into a credible owner of the budget.

Key ideas

1. Token economics: you pay per token, for both input and output. Long context, verbose outputs and chatty agents are the usual cost drivers.
2. Right-size the model: use the smallest model that passes your evals and reserve frontier models for the hard steps. A cascade tries the cheap model first and escalates only if its answer falls short.
3. Cut cost without losing quality: prompt caching, retrieval instead of stuffing context, shorter outputs, batching, and caching repeated answers.
4. Compute a unit metric: cost per resolved ticket, generated PR, or answered question. This is what execs and ROI cases need.
5. Watch hosting trade-offs: API vs self-host vs EU-region managed (Bedrock/Azure OpenAI) each balance cost, control and data residency differently.
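The first two ideas can be made concrete with a little arithmetic. The sketch below is illustrative only: the `ModelPrice` class, the model names and every price are hypothetical placeholders, not real vendor rates, and `good_enough` stands in for whatever eval or validator you already trust.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class ModelPrice:
    """Hypothetical price sheet, in USD per million tokens."""
    name: str
    input_per_mtok: float
    output_per_mtok: float


def request_cost(price: ModelPrice, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request: input and output tokens are billed separately."""
    billed = input_tokens * price.input_per_mtok + output_tokens * price.output_per_mtok
    return billed / 1_000_000


# Placeholder prices for illustration; substitute your provider's current rates.
SMALL = ModelPrice("small-model", input_per_mtok=0.50, output_per_mtok=2.00)
FRONTIER = ModelPrice("frontier-model", input_per_mtok=5.00, output_per_mtok=20.00)

# A model call returns (answer, input_tokens, output_tokens).
ModelCall = Callable[[str], Tuple[str, int, int]]


def cascade(
    prompt: str,
    tiers: Sequence[Tuple[ModelPrice, ModelCall]],
    good_enough: Callable[[str], bool],
) -> Tuple[str, str, float]:
    """Try tiers in order (cheapest first) and stop at the first acceptable answer.

    Returns (answer, name of the last model used, total cost of all attempts).
    """
    total = 0.0
    answer, used = "", ""
    for price, call in tiers:
        answer, in_tok, out_tok = call(prompt)
        total += request_cost(price, in_tok, out_tok)
        used = price.name
        if good_enough(answer):
            break
    return answer, used, total
```

Note that an escalated request pays for both attempts, so a cascade only saves money when the small model passes often enough. Measuring that pass rate on your own evals is what justifies the design.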

Where the money goes

Oversized models on easy tasks: a frontier model used where a small one would pass evals.
Bloated context (stuffing instead of retrieving) and long, unbounded outputs.
Agents looping without step/cost caps.
No caching of stable prompts or repeated queries.
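Two of these leaks, uncapped agent loops and uncached repeated queries, are simple to guard against in code. The following is a minimal sketch under stated assumptions: `RunBudget`, `run_agent` and `AnswerCache` are hypothetical names, `step` stands in for one agent iteration that reports its own cost, and the cache normalisation (lowercasing and collapsing whitespace) is deliberately naive.

```python
import hashlib
from typing import Callable, Dict, Tuple


class RunBudget:
    """Tracks steps and spend for one agent run against hard caps."""

    def __init__(self, max_steps: int, max_cost_usd: float) -> None:
        self.max_steps = max_steps
        self.max_cost_usd = max_cost_usd
        self.steps = 0
        self.cost_usd = 0.0

    def allows_another_step(self) -> bool:
        return self.steps < self.max_steps and self.cost_usd < self.max_cost_usd

    def record(self, cost_usd: float) -> None:
        self.steps += 1
        self.cost_usd += cost_usd


def run_agent(step: Callable[[], Tuple[bool, float]], budget: RunBudget) -> bool:
    """Run step() until it reports done or the budget is exhausted.

    step() returns (done, cost_usd). Returns True if the agent finished,
    False if a cap stopped it first.
    """
    while budget.allows_another_step():
        done, cost = step()
        budget.record(cost)
        if done:
            return True
    return False


class AnswerCache:
    """In-memory cache for repeated queries, keyed by a normalised prompt hash."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    @staticmethod
    def _key(prompt: str) -> str:
        normalised = " ".join(prompt.lower().split())
        return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

    def get_or_call(self, prompt: str, call: Callable[[str], str]) -> str:
        key = self._key(prompt)
        if key not in self._store:
            self._store[key] = call(prompt)  # only cache misses cost money
        return self._store[key]
```

Because the cap is checked before each step, a run can overshoot `max_cost_usd` by at most one step's cost. A real deployment would also want cache expiry and a decision about which answers are safe to reuse.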

Build the ROI case

Define a unit of value (a ticket, a PR, a case) and measure cost per unit.
Compare to the human/time cost it replaces or accelerates — conservatively.
Set budgets and alerts at the gateway; track cost alongside quality.
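The three steps above reduce to a few lines of arithmetic. This is a sketch with assumed inputs: the function names are hypothetical, the `realization` factor is one possible way to keep the comparison conservative (it counts only a fraction of the time saved as real value), and the 80% alert threshold is an arbitrary default.

```python
def cost_per_unit(total_spend_usd: float, units_completed: int) -> float:
    """Unit metric: total AI spend divided by units of value delivered."""
    if units_completed <= 0:
        raise ValueError("need at least one completed unit")
    return total_spend_usd / units_completed


def net_saving(
    units: int,
    ai_cost_per_unit: float,
    minutes_saved_per_unit: float,
    loaded_hourly_rate: float,
    realization: float = 0.5,
) -> float:
    """Conservative net saving: discounted value of time saved minus AI spend."""
    gross_value = units * minutes_saved_per_unit / 60 * loaded_hourly_rate
    return gross_value * realization - units * ai_cost_per_unit


def budget_status(spend_usd: float, budget_usd: float, alert_at: float = 0.8) -> str:
    """Classify spend against a budget: 'ok', 'alert' or 'over'."""
    if spend_usd >= budget_usd:
        return "over"
    if spend_usd >= alert_at * budget_usd:
        return "alert"
    return "ok"


if __name__ == "__main__":
    # Made-up figures for illustration: 400 tickets, 120 USD of model spend,
    # 6 minutes saved per ticket, 50 USD/hour loaded cost.
    unit = cost_per_unit(120.0, 400)
    print(f"cost per ticket: {unit:.2f} USD")
    print(f"net saving: {net_saving(400, unit, 6, 50.0):.2f} USD")
    print(budget_status(85.0, 100.0))
```

Reporting the unit cost next to a quality measure (for example, the share of tickets resolved without rework) keeps a cheaper configuration from looking like a win when it simply produces worse output.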

Watch

Reducing LLM cost: token & model strategies (find it on YouTube)
Choosing the right model (small vs frontier) (find it on YouTube)

Test yourself

What's the disciplined approach to model selection?
