> solutions / zk

Managed operations for ZK prover farms and rollups

Proof generation is GPU-bound, deadline-bound, parallel. Dropped jobs = missed blocks. We build the farm, the queue, the retries, and per-circuit benchmarks on SP1, RISC Zero, Boundless, Brevis.

We arrive with a working stack: SP1 and RISC Zero as the RISC-V zkVM baseline, Boundless and Brevis as decentralized prover marketplaces, Jolt and Halo2 for custom circuits. For each farm we wire up a GPU scheduler with deadline-aware prioritization, kv-cache for circuit segments, retry queues with exponential backoff, and missed-proof-window alerts.

The whitespace we sit on: the "operator for operators" on prover marketplaces. Succinct, Boundless and Brevis launched as marketplaces in late 2025; no one occupies the "we run your GPU farm on these marketplaces with a signed SLA" position. We take it.

> stack we operate

ZK subset. The platform layer is identical across ICPs.

ZK: SP1 RISC Zero Boundless Brevis Jolt Halo2

Platform: Kubernetes Terraform Ansible Prometheus Grafana Loki OpenTelemetry PagerDuty

> what we deploy

Concrete deliverables for ZK teams. Each ships end-to-end with repo, IaC and runbooks.

[ Prover farm with GPU scheduler ] →

H100 / A100 / 4090 pool, deadline-aware job prioritization, NUMA-aware placement, 75%+ per-card utilization in typical setups.

[ Proof-job retry queue ] →

Idempotent submit, exponential backoff, dead-letter queue with per-circuit analysis, alert on pre-deadline ETA miss.

[ Per-circuit benchmark catalogue ] →

Measurements on SP1 / RISC Zero / Jolt per circuit: time-to-prove, peak memory, optimal GPU type. Artifact cache for precompiles.

[ Operator on Boundless / Brevis / Succinct ] →

Marketplace registration, bidding strategy per circuit type, reputation tracking, payout reconciliation, marketplace switching by marginal cost.

[ ZK rollup ops: prover + sequencer + DA ] →

Proving stack for your own rollup: coordination across sequencer, batcher and provers, L1 finality monitoring, stall playbook.

> what we operate 24/7

After handoff the pager lives with us. Coverage tuned for ZK farms:

Prover liveness against network deadlines: alert if proof ETA exceeds 70% of the proof window.
GPU health watchdog: ECC errors, thermal throttling, xid signals trigger preempt and job migration.
Queue depth / lag metrics per circuit: alert on >20% drift off baseline.
Auto-retry on failed proof generation with deterministic seed for reproducibility.
Versioned runbooks: prover crash, OOM on witness generation, stuck marketplace bid, L1 fork-choice inside the proof window.
Monthly perf review: new precompiles, refreshed benchmarks, load rebalancing across marketplaces.

> migration scenarios

What we move without losing payouts or skipping proofs.

single marketplace to mix

Move from a single prover marketplace to a portfolio of 2-3: differentiate by circuit type, hedge against single-venue downtime.

cloud GPU to bare-metal

Farm moved off AWS / GCP onto dedicated GPU at Latitude.sh / DataPacket: typical 50% cost-per-proof reduction, no deadline misses.

SP1 to RISC Zero (or back)

Parallel shadow proving for correctness, gradual cutover by circuit type, latency baseline pinned along the way.

scale-up for incentive season

Prover capacity 5x in 7-10 days: H100 burst sourcing, auto-onboarding into marketplaces, post-season wind-down playbook.

FP32 to FP16 / mixed precision

Quantization-aware proof generation where the circuit allows: 30% reduction in time-to-prove, re-verify check per batch.

onto in-house proving stack

From an external prover service onto your own farm: breakeven calc, gradual cutover, audit of key material and precompiles.

> cases

Anonymized. NDAs cover names; the numbers are real.

ZK rollup · 6 mo · validator ops + prover farm · slashing: 0 · missed proof windows: 0

SP1 marketplace operator · 4 mo · 32 H100 · top-3 by win-rate · cost-per-proof -45%

RISC Zero proving service · 8 mo · 16 H100 + 24 4090 · ETA accuracy ±8%

Brevis incentive season · 8 weeks · 5x burst capacity · 99.4% on-time delivery

> SLA tiers

Three coverage levels. For production proving with a signed deadline SLA we recommend Silver or higher.

Tier	Response p95 (Sev-1)	Coverage	Incident report	Engineer hours / mo
Bronze	30 min	Business hours, 5×8	Within 48h	40
Silver	15 min	24/7 on-call rotation	Within 24h	80
Gold	5 min	24/7 with dedicated engineer	Within 12h	160+

> FAQ

Do you work with one zkVM or across all of them?

Across all of them. SP1 and RISC Zero for the RISC-V zkVM approach, Jolt and Halo2 for custom circuits, Boundless / Brevis / Succinct as the marketplace layer. If you have a collab with a specific provider, we plug into their toolchain.

What happens if a deadline is missed?

Sev-1. Escalation in 5-15 min (by tier), root cause in 12-24h, postmortem with concrete action items. Architecturally we try to catch misses early: an alert on ETA >70% of the proof window triggers a pre-emptive migration to a free GPU.

How many GPUs does a starter prover farm need?

Depends on the circuit and target throughput. Baseline for one production circuit: 4-8 H100 or 16-24 RTX 4090. For a marketplace bid: start with 8-16 H100 and scale by win-rate. Send the circuit type and target proofs/hour; we'll send a sized estimate in 24h.

Can you participate in a decentralized prover marketplace on our behalf?

Yes. It's one of our whitespace bets. We register as your operator on Boundless / Brevis / Succinct, run the bidding strategy, watch reputation, reconcile payouts. Staking material and payout wallets stay with you.

Do you support an in-house proving stack for a rollup?

Yes. That includes sequencer + batcher + prover as a coordinated pipeline, L1 finality observability, and a stall playbook. We can build the stack from scratch or take an existing setup into operations.