ZK prover farm: where the money leaks

A prover farm is the most expensive infrastructure in a ZK stack, and also the most under-counted. A team buys or rents a stack of GPUs, runs SP1 or RISC Zero on them, sees the bill, and cannot understand why cost-per-proof is several times higher than the back-of-envelope number. We run prover infrastructure for clients, and the money tends to leak where you least expect it. Here is where.

Cost per proof, not cost per GPU

The first mistake is the same as in LLM inference: costing the hardware rather than the useful work. A GPU costs a fixed sum per month, and the value is the proof it produced. Cost-per-proof is the farm's monthly cost divided by the number of proofs it actually emitted, and almost everything interesting lives in the denominator.
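As a minimal sketch of that arithmetic, the Python below computes cost-per-proof for the same farm at two levels of busy time. The function names, the 720-hour month, and every number are assumptions for illustration, not measurements.

```python
def monthly_proofs(gpus: int, proofs_per_gpu_hour: float,
                   busy_fraction: float, hours: float = 720.0) -> int:
    """Proofs emitted in a month, given the share of time GPUs spend proving."""
    return int(gpus * proofs_per_gpu_hour * busy_fraction * hours)


def cost_per_proof(monthly_cost: float, proofs: int) -> float:
    """Monthly farm cost divided by the proofs it actually emitted."""
    if proofs <= 0:
        raise ValueError("no proofs emitted: cost-per-proof is undefined")
    return monthly_cost / proofs


# Hypothetical farm: 8 GPUs at an assumed 1500 per card per month.
MONTHLY_COST = 8 * 1500.0

# Back-of-envelope number: cards are assumed to be proving all the time.
naive = cost_per_proof(MONTHLY_COST, monthly_proofs(8, 4.0, 1.0))

# Same bill, but the farm sits idle half the time.
real = cost_per_proof(MONTHLY_COST, monthly_proofs(8, 4.0, 0.5))

print(f"naive: {naive:.3f}  real: {real:.3f}  ratio: {real / naive:.1f}x")
```

The bill is identical in both cases; only the denominator moved, and the cost per proof doubled with it.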

Four things kill the denominator: an empty queue, an overflowing queue, a memory shortfall, and idle time between bursts. The per-GPU-hour price is secondary here: you can buy the cheapest cards and still get an expensive proof if the farm sits idle half the time.

The queue: empty and full both hurt

Proofs do not arrive as a smooth stream but in bursts tied to the rollup's rhythm or to proving deadlines. So the farm constantly oscillates between two bad states.

An empty queue means paid-for GPUs sitting idle between bursts. That is a direct hit to cost-per-proof: the same card bill is spread across fewer proofs.

A full queue is worse: proofs pile up, proving latency rises, and if the rollup has a finalization deadline you run up against it. Then the choice is between "add cards for the peak" and "miss the deadline," and both are expensive.
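Both states show up in even a toy model. The sketch below steps a fixed-capacity farm through a bursty arrival pattern and records the backlog and the wasted capacity. The tick length, the arrival pattern, and the capacity figure are all hypothetical.

```python
def simulate_queue(arrivals, capacity_per_tick):
    """Step a fixed-capacity farm through bursty arrivals.

    Returns (backlog after each tick, total idle proof-slots).
    """
    depth = 0
    depths = []
    idle_slots = 0
    for arrived in arrivals:
        depth += arrived
        done = min(depth, capacity_per_tick)
        idle_slots += capacity_per_tick - done  # paid capacity with nothing to prove
        depth -= done
        depths.append(depth)
    return depths, idle_slots


# Hypothetical burst: nothing for two ticks, 12 proofs at once, then nothing.
depths, idle_slots = simulate_queue([0, 0, 12, 0, 0, 0], capacity_per_tick=4)
print(depths, idle_slots)
```

In this run the farm could have proved 24 proofs and proved 12: half the capacity was idle, and the burst still waited three ticks to clear.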

Managing that oscillation is half the work on a prover farm. Sizing the farm for the average flow gives a cheap proof and failures at peaks; sizing for the peak gives reliability and a pile of idle cards in the troughs. The real economics live in a hybrid: a base on metal for the average flow, topped up in cloud for the peak.
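To make the hybrid trade-off concrete, here is a rough sketch that prices a metal base plus cloud top-up against an hourly demand profile and picks the cheapest base size. The rates and the demand profile are invented, and the model assumes every hour's demand must be served in that hour (no queuing slack), which a real deadline budget would relax.

```python
def hybrid_cost(demand, base_gpus, metal_rate, cloud_rate):
    """Cost of serving `demand` (GPUs needed per hour) with a fixed metal
    base paid every hour, plus cloud GPUs paid only for the overflow."""
    metal = base_gpus * metal_rate * len(demand)
    cloud = sum(max(0, d - base_gpus) for d in demand) * cloud_rate
    return metal + cloud


def best_base(demand, metal_rate, cloud_rate):
    """Base size (0..peak) with the lowest total cost."""
    return min(range(max(demand) + 1),
               key=lambda b: hybrid_cost(demand, b, metal_rate, cloud_rate))


# Hypothetical day: 20 quiet hours needing 2 GPUs, 4 peak hours needing 10.
demand = [2] * 20 + [10] * 4
METAL_RATE, CLOUD_RATE = 1.0, 3.0  # assumed cost per GPU-hour

base = best_base(demand, METAL_RATE, CLOUD_RATE)
print("best base:", base)
print("hybrid:", hybrid_cost(demand, base, METAL_RATE, CLOUD_RATE))
print("sized for peak:", hybrid_cost(demand, 10, METAL_RATE, CLOUD_RATE))
```

With these numbers the cheapest layout is a base of 2 cards with cloud for the four peak hours, even though cloud is priced at three times metal per hour.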

Memory: the quiet ceiling

ZK proving is memory-hungry, and memory hits the ceiling before compute does. On large circuits the witness and intermediate data do not fit in VRAM, and the proof either fails or moves to a CPU fallback that is tens of times slower. From a cost-per-proof view, a slow fallback means hardware occupied many times longer per proof: expensive, and invisible until you look at the distribution of proving times.

So we cost the farm layout not just by card count but by the memory profile of the specific circuit: how much VRAM the worst proof eats, how many parallel proofs fit on a card, where the fallback begins. That drives real throughput far more than the flops in a spec sheet.
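A rough sketch of that memory costing follows. The per-proof VRAM peaks, the 2 GB reserve, and the 30x fallback slowdown are hypothetical placeholders; in practice they come from profiling the actual circuit on the actual card.

```python
def parallel_slots(card_vram_gb, proof_peak_gb, reserve_gb=2.0):
    """How many proofs of a given peak size fit on one card at once."""
    usable = card_vram_gb - reserve_gb
    if proof_peak_gb > usable:
        return 0  # does not fit at all: this proof would fall back
    return int(usable // proof_peak_gb)


def fallback_share(peaks_gb, card_vram_gb, reserve_gb=2.0):
    """Fraction of proofs whose peak VRAM exceeds what the card can offer."""
    usable = card_vram_gb - reserve_gb
    return sum(1 for p in peaks_gb if p > usable) / len(peaks_gb)


def mean_time(base_time, share, slowdown):
    """Average proving time when `share` of proofs run `slowdown` times slower."""
    return (1 - share) * base_time + share * base_time * slowdown


# Hypothetical measured VRAM peaks (GB) for one circuit on a 24 GB card.
peaks = [6, 7, 9, 10, 26]
share = fallback_share(peaks, card_vram_gb=24)
print("slots for a 10 GB proof:", parallel_slots(24, 10))
print("fallback share:", share)
print("mean time vs no fallback:", mean_time(1.0, share, slowdown=30))
```

One proof in five falling back is enough to make the average proof several times slower in this example, which is why the tail of the memory distribution matters more than the mean.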

Utilization and card availability

Beyond the queue and memory, two things drive cost-per-proof up. The first is utilization: the share of GPU-time that went into actual proofs rather than data loading, synchronization, and pipeline idle. The second is the availability of the right cards: prover loads gravitate to recent GPUs, those are scarce, and the lead time on metal can exceed the window the farm is even needed for.

This gives the same hybrid conclusion as inference: a flat baseline load is cheaper on your own metal at high utilization, while peaks and short campaigns are topped up in cloud, overpaying per hour for availability and elasticity.
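The break-even point can be sketched in a few lines. This assumes cloud GPUs are billed only for the hours used and that those hours are fully useful, which is a simplification; the rates and timings are made up.

```python
def utilization(proving_seconds, wall_seconds):
    """Share of GPU-time spent on actual proving."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return proving_seconds / wall_seconds


def metal_cost_per_useful_hour(metal_rate, util):
    """Owned cards are paid for every hour, so idle time inflates the rate."""
    if not 0 < util <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return metal_rate / util


def break_even_utilization(metal_rate, cloud_rate):
    """Utilization above which owned metal beats on-demand cloud."""
    return metal_rate / cloud_rate


# Hypothetical rates per GPU-hour.
METAL_RATE, CLOUD_RATE = 1.0, 2.5

util = utilization(proving_seconds=2520, wall_seconds=3600)
print("utilization:", util)
print("metal per useful hour:", metal_cost_per_useful_hour(METAL_RATE, util))
print("break-even utilization:", break_even_utilization(METAL_RATE, CLOUD_RATE))
```

Under these assumptions metal wins for any load that keeps the cards busy more than 40% of the time, and cloud wins below that, which is the baseline-versus-peak split described above.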

What it looks like for a client

On our test bench we run the client's circuits against a real load profile: we measure cost-per-proof, the memory profile, and queue behavior under a burst, and cost the farm layout from those numbers. From there it becomes operations: how many cards go in the base, how we top up for the peak, and which alerts watch queue depth, proof latency, and the memory fallback.

If you need a prover farm with cost-per-proof worked out rather than "whatever it came to," that is what we run in zk and cover through scale. To estimate cost-per-proof for your circuits and flow, get in touch.