The AI Tech Lead Path
Chapter 09 · 11 min read

Cost, FinOps & model selection

“What does it cost and what's the ROI?” — have the answer ready.

Execs will ask what AI costs and what it returns. Understanding token economics, right-sizing models, and computing a unit cost turns you from enthusiast into a credible owner of the budget.

Key ideas

  1. Token economics: you pay per input token and per output token. Long context, verbose outputs and chatty agents are the usual cost drivers.

  2. Right-size the model: use the smallest model that passes your evals and reserve frontier models for the hard steps. Cascade: try the cheap model first and escalate only when its answer fails a check.

  3. Cut cost without losing quality: prompt caching, retrieval instead of stuffing context, shorter outputs, batching, and caching repeated answers.

  4. Compute a unit metric: cost per resolved ticket, generated PR or answered question. This is the number execs and ROI cases need.

  5. Watch hosting trade-offs: a vendor API, self-hosting, and a region-pinned managed service (for example Bedrock or Azure OpenAI in an EU region) each trade off cost, control and data residency differently.
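
To make ideas 1 and 2 concrete, here is a minimal sketch of per-call token cost and a two-tier cascade. Everything named here is an assumption for illustration: the `ModelPrice` card, the prices (placeholders, not real vendor rates), and the `run_small`, `run_frontier` and `passes_check` callables you would supply yourself.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class ModelPrice:
    """Hypothetical price card, in USD per one million tokens."""
    name: str
    input_per_mtok: float
    output_per_mtok: float

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        # Input and output tokens are billed at separate rates.
        return (input_tokens * self.input_per_mtok
                + output_tokens * self.output_per_mtok) / 1_000_000


# Placeholder prices for illustration only.
SMALL = ModelPrice("small-model", 0.50, 2.00)
FRONTIER = ModelPrice("frontier-model", 10.00, 40.00)


def cascade(task: str,
            run_small: Callable[[str], str],
            run_frontier: Callable[[str], str],
            passes_check: Callable[[str], bool]) -> Tuple[str, str]:
    """Try the cheap model first; escalate only if its answer fails the check."""
    answer = run_small(task)
    if passes_check(answer):
        return answer, SMALL.name
    return run_frontier(task), FRONTIER.name


def expected_cascade_cost(input_tokens: int, output_tokens: int,
                          escalation_rate: float) -> float:
    """Average cost per call: every call pays the small model,
    and a fraction (escalation_rate) also pays the frontier model."""
    return (SMALL.cost(input_tokens, output_tokens)
            + escalation_rate * FRONTIER.cost(input_tokens, output_tokens))
```

With these placeholder prices, a call of 2,000 input and 500 output tokens that escalates 20% of the time averages about $0.01, against $0.04 if every call went to the frontier model. The saving depends entirely on how often the small model passes your check, so measure the escalation rate on your own evals.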

Where the money goes

  • Oversized models for easy tasks; frontier where a small model would pass evals.
  • Bloated context (stuffing instead of retrieving) and long, unbounded outputs.
  • Agents looping without step/cost caps.
  • No caching of stable prompts or repeated queries.
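
One way to stop the looping-agent leak is a hard cap on both steps and spend. The sketch below assumes a hypothetical `step` callable that runs one agent step and returns `(done, cost_usd)`; the names and the exception type are illustrative, not any framework's API.

```python
from typing import Callable, Tuple


class BudgetExceeded(RuntimeError):
    """Raised when an agent run hits its step or cost cap."""


def run_agent(step: Callable[[int], Tuple[bool, float]],
              max_steps: int,
              max_cost_usd: float) -> Tuple[int, float]:
    """Run step(i) until it reports done, or stop at whichever cap is hit first.

    Returns (steps_taken, total_cost_usd) on success.
    """
    spent = 0.0
    for i in range(max_steps):
        # Check the cost cap before paying for another step.
        if spent >= max_cost_usd:
            raise BudgetExceeded(
                f"cost cap reached: ${spent:.2f} after {i} steps")
        done, cost = step(i)
        spent += cost
        if done:
            return i + 1, spent
    raise BudgetExceeded(f"no result after {max_steps} steps (${spent:.2f})")
```

Because the cost check runs before each step, a run can overshoot the cap by at most one step's cost. Catch `BudgetExceeded` at the call site and log it: a capped run is a data point for your cost dashboard, not just an error.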

Build the ROI case

  • Define a unit of value (a ticket, a PR, a case) and measure cost per unit.
  • Compare to the human/time cost it replaces or accelerates — conservatively.
  • Set budgets and alerts at the gateway; track cost alongside quality.
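
The ROI arithmetic above can be sketched in a few lines. The function names, the `realization` discount and all figures in the usage note are assumptions for illustration; plug in your own measured spend, volume and time savings.

```python
def cost_per_unit(total_spend_usd: float, resolved_units: int) -> float:
    """AI spend divided by units of value actually delivered
    (resolved tickets, merged PRs, closed cases)."""
    if resolved_units <= 0:
        raise ValueError("need at least one resolved unit")
    return total_spend_usd / resolved_units


def net_saving(units: int,
               ai_cost_per_unit: float,
               minutes_saved_per_unit: float,
               hourly_rate_usd: float,
               realization: float = 0.5) -> float:
    """Conservative net saving over the period.

    realization discounts the time saved, since not every saved
    minute turns into real value; 0.5 is an arbitrary cautious default.
    """
    value = units * (minutes_saved_per_unit / 60) * hourly_rate_usd * realization
    return value - units * ai_cost_per_unit
```

For example, with a hypothetical $1,200 monthly spend over 4,000 resolved tickets, the unit cost is $0.30. If each ticket saves 6 minutes of a $60/hour agent's time and you count only half of that, the net saving is $10,800 for the month. Presenting the discounted figure is what keeps the comparison conservative.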
