> solutions / zk

Managed operations for ZK prover farms and rollups

Proof generation is GPU-bound, deadline-bound, parallel. Dropped jobs = missed blocks. We build the farm, the queue, the retries, and per-circuit benchmarks on SP1, RISC Zero, Boundless, Brevis.

We arrive with a working stack: SP1 and RISC Zero as the RISC-V zkVM baseline, Boundless and Brevis as decentralized prover marketplaces, Jolt and Halo2 for custom circuits. For each farm we wire up a GPU scheduler with deadline-aware prioritization, kv-cache for circuit segments, retry queues with exponential backoff, and missed-proof-window alerts.

The whitespace we sit on: the "operator for operators" on prover marketplaces. Succinct, Boundless and Brevis launched as marketplaces in late 2025; no one occupies the "we run your GPU farm on these marketplaces with a signed SLA" position. We take it.

> stack we operate

ZK subset. The platform layer is identical across ICPs.

ZK: SP1 RISC Zero Boundless Brevis Jolt Halo2
Platform: Kubernetes Terraform Ansible Prometheus Grafana Loki OpenTelemetry PagerDuty

> what we deploy

Concrete deliverables for ZK teams. Each ships end-to-end with repo, IaC and runbooks.

[ Prover farm with GPU scheduler ]

H100 / A100 / 4090 pool, deadline-aware job prioritization, NUMA-aware placement, 75%+ per-card utilization in typical setups.

[ Proof-job retry queue ]

Idempotent submit, exponential backoff, dead-letter queue with per-circuit analysis, alert on pre-deadline ETA miss.

[ Per-circuit benchmark catalogue ]

Measurements on SP1 / RISC Zero / Jolt per circuit: time-to-prove, peak memory, optimal GPU type. Artifact cache for precompiles.

[ Operator on Boundless / Brevis / Succinct ]

Marketplace registration, bidding strategy per circuit type, reputation tracking, payout reconciliation, marketplace switching by marginal cost.

[ ZK rollup ops: prover + sequencer + DA ]

Proving stack for your own rollup: coordination across sequencer, batcher and provers, L1 finality monitoring, stall playbook.

> what we operate 24/7

After handoff the pager lives with us. Coverage tuned for ZK farms:

  • Prover liveness against network deadlines: alert if proof ETA exceeds 70% of the proof window.
  • GPU health watchdog: ECC errors, thermal throttling, xid signals trigger preempt and job migration.
  • Queue depth / lag metrics per circuit: alert on >20% drift off baseline.
  • Auto-retry on failed proof generation with deterministic seed for reproducibility.
  • Versioned runbooks: prover crash, OOM on witness generation, stuck marketplace bid, L1 fork-choice inside the proof window.
  • Monthly perf review: new precompiles, refreshed benchmarks, load rebalancing across marketplaces.

> migration scenarios

What we move without losing payouts or skipping proofs.

single marketplace to mix

Move from a single prover marketplace to a portfolio of 2-3: differentiate by circuit type, hedge against single-venue downtime.

cloud GPU to bare-metal

Farm moved off AWS / GCP onto dedicated GPU at Latitude.sh / DataPacket: typical 50% cost-per-proof reduction, no deadline misses.

SP1 to RISC Zero (or back)

Parallel shadow proving for correctness, gradual cutover by circuit type, latency baseline pinned along the way.

scale-up for incentive season

Prover capacity 5x in 7-10 days: H100 burst sourcing, auto-onboarding into marketplaces, post-season wind-down playbook.

FP32 to FP16 / mixed precision

Quantization-aware proof generation where the circuit allows: 30% reduction in time-to-prove, re-verify check per batch.

onto in-house proving stack

From an external prover service onto your own farm: breakeven calc, gradual cutover, audit of key material and precompiles.

> cases

Anonymized. NDAs cover names; the numbers are real.

ZK rollup · 6 mo · validator ops + prover farm · slashing: 0 · missed proof windows: 0
SP1 marketplace operator · 4 mo · 32 H100 · top-3 by win-rate · cost-per-proof -45%
RISC Zero proving service · 8 mo · 16 H100 + 24 4090 · ETA accuracy ±8%
Brevis incentive season · 8 weeks · 5x burst capacity · 99.4% on-time delivery

> SLA tiers

Three coverage levels. For production proving with a signed deadline SLA we recommend Silver or higher.

Tier Response p95 (Sev-1) Coverage Incident report Engineer hours / mo
Bronze 30 min Business hours, 5×8 Within 48h 40
Silver 15 min 24/7 on-call rotation Within 24h 80
Gold 5 min 24/7 with dedicated engineer Within 12h 160+

> FAQ

Across all of them. SP1 and RISC Zero for the RISC-V zkVM approach, Jolt and Halo2 for custom circuits, Boundless / Brevis / Succinct as the marketplace layer. If you have a collab with a specific provider, we plug into their toolchain.

Sev-1. Escalation in 5-15 min (by tier), root cause in 12-24h, postmortem with concrete action items. Architecturally we try to catch misses early: an alert on ETA >70% of the proof window triggers a pre-emptive migration to a free GPU.

Depends on the circuit and target throughput. Baseline for one production circuit: 4-8 H100 or 16-24 RTX 4090. For a marketplace bid: start with 8-16 H100 and scale by win-rate. Send the circuit type and target proofs/hour; we'll send a sized estimate in 24h.

Yes. It's one of our whitespace bets. We register as your operator on Boundless / Brevis / Succinct, run the bidding strategy, watch reputation, reconcile payouts. Staking material and payout wallets stay with you.

Yes. That includes sequencer + batcher + prover as a coordinated pipeline, L1 finality observability, and a stall playbook. We can build the stack from scratch or take an existing setup into operations.

> ready to ship infra?

Tell us about the workload. We reply within 24 hours.