> solutions / web3

Managed DevOps for Web3, validators and rollups

Testnet next quarter. Hiring an SRE who actually knows Cosmos SDK is a 4-month process, plus equity. We get your validators live in 3 regions in 5 days, with slashing alerts and a signed uptime SLA.

We arrive with a working stack: Cosmos SDK, Geth, Reth, OP Stack, Arbitrum Orbit, Polygon CDK, EigenDA, Celestia. For each client protocol we wire up signing observability, distributed locks against double-sign, HSM/KMS workflows, and runbooks for Sev-1 events (slashing triggers, peer drops, fork-choice mismatches).

The gap we fill: post-deploy operations for L2s. Conduit, Caldera and AltLayer own sequencer ramp-up; operations after launch (monitoring, reboots, migrations, hard-fork cutovers) are still an open niche. We take it.

> stack we operate

Web3 subset. The platform layer is identical across all customer segments.

Web3: Cosmos SDK, Geth, Reth, OP Stack, Arbitrum Orbit, Polygon CDK, EigenDA, Celestia
Platform: Kubernetes, Terraform, Ansible, Prometheus, Grafana, Loki, OpenTelemetry, PagerDuty

> what we deploy

Concrete deliverables for Web3 teams. Each one ships end-to-end with repo, IaC and runbooks.

[ Validator set across 3 regions ]

Cosmos SDK / Geth / Reth, key isolation on HSM/KMS, distributed lock against double-sign, slashing alerts, missed-block dashboards, failover playbook.

[ RPC front + load-balancer ]

Geth / Reth read-replicas with per-method rate-limits, hot-path caching, p95 latency SLO, geo-routing for global traffic.
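
Per-method rate-limits can be sketched as one token bucket per (client, method) pair. This is a minimal illustration, not our production limiter: the class names, the limits, and the default are hypothetical, and time is passed in explicitly so the logic is easy to test.

```python
class TokenBucket:
    """Classic token bucket: refills at `rate_per_s`, holds at most `burst` tokens."""

    def __init__(self, rate_per_s: float, burst: float, now: float = 0.0):
        self.rate = rate_per_s
        self.burst = burst
        self.tokens = burst  # start full so the first burst is allowed
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the last call, capped at `burst`.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerMethodLimiter:
    """One bucket per (client, JSON-RPC method); unknown methods use `default`."""

    def __init__(self, limits: dict[str, tuple[float, float]],
                 default: tuple[float, float]):
        self.limits = limits      # method -> (rate_per_s, burst), hypothetical values
        self.default = default
        self.buckets: dict[tuple[str, str], TokenBucket] = {}

    def allow(self, client: str, method: str, now: float) -> bool:
        key = (client, method)
        if key not in self.buckets:
            rate, burst = self.limits.get(method, self.default)
            self.buckets[key] = TokenBucket(rate, burst, now)
        return self.buckets[key].allow(now)


# Expensive methods such as eth_getLogs get a tighter budget than cheap reads.
limiter = PerMethodLimiter({"eth_getLogs": (1.0, 2.0)}, default=(50.0, 100.0))
```

In a live service `now` would come from a monotonic clock such as `time.monotonic()`; keying the bucket by client keeps one heavy consumer from starving the rest.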

[ Sequencer for L2 (OP Stack / Orbit) ]

Sequencer + batcher + proposer as separate processes, L1 finality monitoring, switchover playbook, hot-standby in another region.

[ Incentivized testnet: 100 nodes in 72h ]

Burst delivery for incentive programs: bare-metal sourcing, auto-onboarding, equal-load region spread, leaderboard-position dashboard.

[ DA layer: EigenDA / Celestia ]

Light nodes with signed uptime, retrieval latency tracking, missed-header playbook, sync with the consensus layer.

> what we operate 24/7

After handoff the pager lives with us. Coverage tuned for validators and rollups:

  • Signing observability: any missed block triggers Sev-2, two in a row triggers Sev-1.
  • Auto-failover to a hot-standby in another region on peer drop >30s or disk pressure.
  • On-call escalation: p95 first response 15 min for Sev-1.
  • Slashing-trigger watchdog: if the distributed lock stops responding, the signing key flips to read-only in <500ms.
  • Versioned runbooks per protocol: hard-fork cutover, chain halt, fork-choice mismatch, mempool flood.
  • Monthly ops review: what broke, what we fixed, what we are changing in the SLO.
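
The missed-block and failover rules in the list above reduce to two small predicates. The sketch below is illustrative only: the names are hypothetical, and the 90% disk threshold is an assumption, since "disk pressure" is tuned per deployment.

```python
from dataclasses import dataclass
from typing import Optional

PEER_DROP_LIMIT_S = 30  # from the rule above: peer drop longer than 30s


def missed_block_severity(signed: list[bool]) -> Optional[str]:
    """Severity for the most recent block, or None if it was signed.

    `signed` holds one flag per recent block, oldest first; True means the
    validator signed that block.
    """
    if not signed or signed[-1]:
        return None
    if len(signed) >= 2 and not signed[-2]:
        return "Sev-1"  # two misses in a row
    return "Sev-2"      # a single miss


@dataclass
class NodeHealth:
    seconds_without_peers: float
    disk_used_fraction: float  # 0.0 .. 1.0


def should_fail_over(health: NodeHealth, disk_limit: float = 0.90) -> bool:
    """True when the hot-standby should take over (disk_limit is an assumed value)."""
    return (health.seconds_without_peers > PEER_DROP_LIMIT_S
            or health.disk_used_fraction >= disk_limit)
```

For example, `missed_block_severity([True, False, False])` yields `"Sev-1"`, while a node that has been peerless for exactly 30 seconds does not yet trigger failover.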

> migration scenarios

What we move without downtime and without exposing key material.

testnet to mainnet

Validator-set cutover to mainnet with a key ceremony, state sync, checkpointing, and a rollback plan.

cloud to bare-metal

Validators moved off AWS/GCP onto Latitude.sh or OpenMetal: typical 40% reduction in per-node costs with no latency penalty.

hard-fork cutover

Coordinated client upgrade for a known fork height: pre-flight checks, canary node, rolling restart by region.
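
The canary-then-region ordering can be sketched as a list of waves that halts on the first unhealthy node. This is a simplified illustration: `upgrade` and `healthy` are hypothetical callbacks standing in for the real upgrade step and health check, and pre-flight checks are assumed to have passed already.

```python
from typing import Callable


def rollout_order(nodes_by_region: dict[str, list[str]], canary: str) -> list[list[str]]:
    """Waves: the canary alone, then the remaining nodes one region at a time."""
    if not any(canary in nodes for nodes in nodes_by_region.values()):
        raise ValueError(f"canary {canary!r} is not in any region")
    waves = [[canary]]
    for region in sorted(nodes_by_region):
        rest = [n for n in nodes_by_region[region] if n != canary]
        if rest:
            waves.append(rest)
    return waves


def run_cutover(waves: list[list[str]],
                upgrade: Callable[[str], None],
                healthy: Callable[[str], bool]) -> list[str]:
    """Upgrade wave by wave; stop before the next wave if any node is unhealthy."""
    done: list[str] = []
    for wave in waves:
        for node in wave:
            upgrade(node)
        failed = [n for n in wave if not healthy(n)]
        if failed:
            raise RuntimeError(f"halting rollout, unhealthy after upgrade: {failed}")
        done.extend(wave)
    return done
```

Halting between waves keeps a bad release confined to the canary or a single region, which leaves the other regions signing on the old version until the fault is understood.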

cross-region sequencer

L2 sequencer moved to another jurisdiction or provider without dropping blocks: hot-standby promote + DNS cutover.

RPC split into geo-clusters

RPC split per-region as traffic grows: anycast / geo-DNS, cache warm-up, per-region rate-limits.

client swap (Geth to Reth)

Parallel sync, per-block checksum verification, smooth switch with no missed slots.
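
One way to picture the per-block verification: collect block hashes from both clients during parallel sync and refuse to switch until a long enough overlap matches. The sketch assumes the "checksum" is the block hash each client reports; the function names and the 64-block minimum overlap are hypothetical.

```python
from typing import Optional


def first_divergence(old: dict[int, str], new: dict[int, str]) -> Optional[int]:
    """Lowest height reported by both clients where the hashes differ, else None."""
    for height in sorted(old.keys() & new.keys()):
        # Hex strings are compared case-insensitively.
        if old[height].lower() != new[height].lower():
            return height
    return None


def safe_to_switch(old: dict[int, str], new: dict[int, str],
                   min_overlap: int = 64) -> bool:
    """Switch only if enough heights overlap and none of them diverge."""
    overlap = old.keys() & new.keys()
    return len(overlap) >= min_overlap and first_divergence(old, new) is None
```

Here `old` would hold hashes from the Geth node and `new` from the Reth node; a reported divergence height is where the investigation starts before any traffic moves.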

> cases

Anonymized. NDAs cover names; the numbers are real.

ZK rollup · 6 mo · validator ops + RPC · slashing: 0 · uptime: 99.97% over 90d
Cosmos L1 · 12 mo · 7 validators across 4 regions · missed blocks: <0.02% · governance votes: 100%
OP Stack L2 · 4 mo · sequencer + batcher + RPC · 0 missed batches since launch
Incentivized testnet · 8 weeks · 50-node burst · top-5 operator by uptime
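
For scale, an uptime percentage converts to a downtime budget with simple arithmetic. A small sketch (the function name is illustrative, and it assumes the window is measured in wall-clock minutes):

```python
def downtime_budget_minutes(uptime_pct: float, days: int) -> float:
    """Minutes of downtime permitted by `uptime_pct` over a window of `days`."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - uptime_pct / 100)


# 99.97% over 90 days leaves roughly 38.9 minutes of total downtime.
budget = downtime_budget_minutes(99.97, 90)
```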

> SLA tiers

Three coverage levels. For validators and sequencers we recommend Silver or higher: slashing risk does not tolerate 5×8.

Tier   | Response p95 (Sev-1) | Coverage                     | Incident report | Engineer hours / mo
Bronze | 30 min               | Business hours, 5×8          | Within 48h      | 40
Silver | 15 min               | 24/7 on-call rotation        | Within 24h      | 80
Gold   | 5 min                | 24/7 with dedicated engineer | Within 12h      | 160+

> FAQ

You hold the keys. The HSM/KMS workflow keeps them under your control at all times. We sign through a signer daemon guarded by a distributed lock; we never custody key material. Optional MPC setup (CGGMP21 / FROST) where the protocol supports it.

Architecturally we rule out double-sign through a distributed lock: the signing key flips to read-only if consensus with the other instance can't be reached. Financial responsibility depends on tier: Gold includes a slashing-insurance discussion, Bronze/Silver use a shared-risk model. Across 3 years of ops in the current team: 0 slashing incidents.
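
The fail-closed behavior can be sketched as a guard that authorizes a signature only while a lock lease is fresh and the request is above the last signed position. This is a simplified illustration, not our signer daemon: the class, the 0.5s lease TTL, and the (height, round) high-water mark are assumptions, and a real CometBFT signer also tracks the consensus step.

```python
from dataclasses import dataclass


@dataclass
class SigningGuard:
    lease_ttl_s: float = 0.5                 # assumed lease lifetime
    lease_renewed_at: float = float("-inf")  # no lease held at start
    last_signed: tuple[int, int] = (0, -1)   # (height, round) high-water mark

    def renew(self, now: float) -> None:
        """Record a successful heartbeat from the lock service."""
        self.lease_renewed_at = now

    def authorize(self, height: int, round_: int, now: float) -> bool:
        """Return True and advance the high-water mark only if signing is safe."""
        if now - self.lease_renewed_at >= self.lease_ttl_s:
            return False  # fail closed: lock unreachable or lease expired
        if (height, round_) <= self.last_signed:
            return False  # never sign at or below what was already signed
        self.last_signed = (height, round_)
        return True
```

If the lock service stops answering, `renew` is never called, the lease ages out, and every later `authorize` call returns False until a fresh heartbeat arrives.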

Lead time: 72h from signed contract to first live node. A regionally distributed rollout completes within 5-7 days. Send the protocol spec plus target regions; we reply with a concrete window within 24h.

Cosmos SDK is one of our primary stacks. That covers custom modules, IBC relayers, governance voting, and upgrade-handler migrations across major versions. CometBFT, CosmWasm, IBC v2: all in scope.

Yes, we take over existing infrastructure. Onboarding takes 1 week: we inventory the existing infra, import its IaC (or regenerate it in Terraform), move keys via a ceremony, and take the pager. If anything is critically broken before handoff, we fix it first, then sign the SLA.

> ready to ship infra?

Tell us about the workload. We reply within 24 hours.