GPUCOSTWISE
Dataset 2026-06 · 12 GPUs · 26 live offers

Know your $/1M tokens before you rent the GPU.

Blog benchmarks go stale the month they're published. GPU CostWise keeps a living price/perf dataset — H100, B200, MI355X and friends — and computes what your model actually costs to serve.

Cost calculator

Free, no signup. Pick a model, set your volume, get a ranked stack.

Workload spec
Quantization
Latency need

Set your workload and hit COMPUTE COST to rank 26 GPU configs by $/1M tokens.

How the numbers are computed

01 · Fit
GPUs needed = weights (total params × quant bytes) + KV cache for your batch, against 92% of usable VRAM. Tensor-parallel up to 8 GPUs; PCIe cards pay a bigger interconnect tax than NVLink/IF.
02 · Throughput
Decode speed ≈ memory bandwidth × 55% MBU ÷ active-param bytes. Batched throughput is capped by the compute roofline (dense TFLOPS × 45% MFU). MoE models use active params, so they rank realistically.
03 · Cost
$/1M tokens = (hourly price × GPUs) ÷ (tok/s × 3600) × 10⁶. Every price entry is stamped with its verification month — no silent staleness.

FAQ

How accurate are the throughput numbers?

They come from a memory-bandwidth roofline model (55% MBU, 45% MFU compute ceiling) — the same first-principles method serving teams use for capacity planning. Absolute numbers are estimates; the relative ranking between GPUs is what you should trust, and that is robust.

Where do prices come from?

Public on-demand pricing from Lambda, RunPod, Vast.ai, AWS, GCP, Vultr and TensorWave, hand-verified and stamped per entry. Current dataset: 2026-06, 26 offers across 12 GPUs.

Does it handle MoE models?

Yes. VRAM is sized from total parameters, throughput from active parameters — so DeepSeek V3 or Qwen3-235B rank realistically instead of looking impossibly expensive.

Is the calculator free?

The single-scenario calculator is free forever, no signup. Pro (₹999/$15 per month) adds saved scenarios, a monthly price-change feed for your stack, and shareable cost reports.