Meter beats calculator
Calculators predict a best case you will rarely hit. A meter reads the truth off live telemetry. Why I built the meter instead of another spreadsheet.
I kept getting asked the same question in different costumes: what does it cost to serve this model? Every answer started with a spreadsheet, and every spreadsheet started with a lie — an assumed utilization number, typed in by a human who wanted the result to look good.
So I stopped building calculators and built a meter.
A calculator predicts. A meter observes.
A calculator takes your assumptions and multiplies them. Garbage in, confident garbage out. It will happily tell you a model costs USD 0.40 per million tokens because you told it the GPU runs at 90% utilization, a figure you have no way to defend.
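To make the multiplication concrete, here is a minimal sketch of what such a calculator does. The function name and every input (hourly price, peak throughput, utilization) are hypothetical values picked for illustration, not figures from a real deployment.

```python
def calculator_cost_per_million(gpu_usd_per_hour, peak_tokens_per_sec, assumed_utilization):
    """Spreadsheet-style estimate: every argument is an assumption typed in by a human."""
    tokens_per_hour = peak_tokens_per_sec * assumed_utilization * 3600
    return gpu_usd_per_hour / tokens_per_hour * 1_000_000


# Hypothetical inputs: a USD 2.00/hour GPU with a 1500 tokens/s peak.
optimistic = calculator_cost_per_million(2.00, 1500, 0.90)  # about 0.41 USD per million tokens
realistic = calculator_cost_per_million(2.00, 1500, 0.30)   # about 1.23 USD per million tokens

print(f"assumed 90% utilization: USD {optimistic:.2f} per million tokens")
print(f"assumed 30% utilization: USD {realistic:.2f} per million tokens")
```

The only thing that changed between the two results is the utilization someone typed in, and the answer moved by a factor of three.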
A meter does the opposite. It watches the running server and reports what is actually happening: request rate, time-to-first-token, time-per-output-token, batch occupancy, KV cache pressure. Divide what the hardware costs per hour by the tokens it actually served, and the effective cost falls out as a measurement, not a guess.
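The arithmetic behind a measured figure can be sketched in a few lines. This is an illustration under stated assumptions, not the tool's actual code: the function name is hypothetical, the hourly price is something you supply, and the token counts are whatever a cumulative counter reported at the start and end of an observation window.

```python
def effective_cost_per_million(tokens_start, tokens_end, elapsed_s, gpu_usd_per_hour):
    """Cost per million tokens over one observation window.

    tokens_start / tokens_end are readings of a cumulative token counter;
    gpu_usd_per_hour is the price you pay (an input, not telemetry).
    """
    tokens = tokens_end - tokens_start
    if tokens <= 0 or elapsed_s <= 0:
        # Nothing served in the window (or the counter reset): cost per token is unbounded.
        return float("inf")
    spend = gpu_usd_per_hour * elapsed_s / 3600
    return spend / tokens * 1_000_000


# Hypothetical window: 30,000 tokens served in 60 seconds on a USD 2.00/hour GPU.
cost = effective_cost_per_million(5_000, 35_000, 60, 2.00)
print(f"USD {cost:.2f} per million tokens")  # about 1.11
```

There is no utilization argument here. The token count already reflects however busy the server really was.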
Meter beats calculator. Not because the math is fancier — because it refuses to assume the one number that matters most.
What the meter reads
vllm-cost-meter is a read-only observer. It never touches the inference path; it ingests Prometheus metrics a vLLM server already emits and turns them into a live cost-per-million-token readout:
- throughput and request rate
- TTFT, TPOT, and end-to-end latency
- prompt and generation lengths
- batch state and KV cache utilization
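As a rough illustration of what reading those signals involves, the sketch below parses a Prometheus text-format scrape and pulls out two token counters and a mean TTFT. It is not the tool's implementation. The metric names in the sample are assumptions based on names vLLM has used; they vary between vLLM versions, so check your own server's /metrics output. The sample values are invented.

```python
import re
from collections import defaultdict

# Matches "name{labels} value" or "name value" in the Prometheus text format.
SAMPLE_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{.*\})?\s+(\S+)')


def parse_metrics(text):
    """Return {metric_name: value}, summing samples that differ only by labels."""
    totals = defaultdict(float)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE comments
            continue
        match = SAMPLE_LINE.match(line)
        if match:
            totals[match.group(1)] += float(match.group(2))
    return totals


# Invented scrape; metric names are assumed, not guaranteed for your vLLM version.
SCRAPE = """
# TYPE vllm:generation_tokens_total counter
vllm:generation_tokens_total{model_name="demo"} 7000.0
vllm:prompt_tokens_total{model_name="demo"} 28000.0
vllm:time_to_first_token_seconds_sum{model_name="demo"} 12.5
vllm:time_to_first_token_seconds_count{model_name="demo"} 50.0
"""

metrics = parse_metrics(SCRAPE)
total_tokens = metrics["vllm:prompt_tokens_total"] + metrics["vllm:generation_tokens_total"]

# A histogram's _sum divided by its _count gives the mean since the server started;
# difference two scrapes to get the mean over a window instead.
mean_ttft_s = (metrics["vllm:time_to_first_token_seconds_sum"]
               / metrics["vllm:time_to_first_token_seconds_count"])

print(f"tokens so far: {total_tokens:.0f}, mean TTFT: {mean_ttft_s:.2f} s")
```

Everything here is a read of text the server already publishes, which is what keeps an observer like this off the inference path.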
Near idle, I have watched the effective cost climb to 36.3× the saturated figure on the same hardware. No calculator will ever show you that number, because no one types "we run at 3% utilization at 4am" into a spreadsheet.
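One way to read a multiplier like that, assuming the hardware bills at a flat hourly rate regardless of load (an assumption of this sketch): cost per token is then inversely proportional to throughput, so the multiplier is the ratio of saturated to observed throughput. The function names below are hypothetical.

```python
def cost_multiplier(saturated_tokens_per_sec, observed_tokens_per_sec):
    """How many times more each token costs now than at saturation (flat hourly price)."""
    if observed_tokens_per_sec <= 0:
        return float("inf")
    return saturated_tokens_per_sec / observed_tokens_per_sec


def implied_throughput_fraction(multiplier):
    """Invert the multiplier: what share of saturated throughput is actually being served."""
    return 1.0 / multiplier


fraction = implied_throughput_fraction(36.3)  # about 0.0275
print(f"a 36.3x multiplier implies about {fraction:.1%} of saturated throughput")
```

Under that assumption, a 36.3× cost corresponds to serving a little under 3% of what the same hardware delivers when saturated.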
The bias I trust
I like tools that tell operators the truth, even when the truth is unflattering. A meter is harder to fool than a calculator because it is reporting reality instead of negotiating with it. That is the whole design philosophy: build the meter, not another calculator.