> services / scale

Infrastructure engineering and migrations for Web3, AI and ZK

Greenfield builds, cloud → bare-metal migrations, regional splits, hard-fork cutovers, burst capacity. We move running systems. Reversibly. Without losing state or signatures.

> what's included

Scale covers any project that reshapes infrastructure: growth, moves, new regions, hard forks. The agreed scope is written up as a migration plan.

Scope	Owned by us	Owned by you
Migration plan: target architecture, dry-runs, rollback	✓ Owned	-
Sourcing new regions and bare-metal partners	✓ Owned	-
Multi-region Kubernetes topology, mesh, DNS cutover	✓ Owned	-
State transfer (chain data, model weights, prover state)	✓ Owned	-
Key ceremony for validator key migration	✓ Coordinated (we direct)	✓ Co-owned (keys stay with you)
Burst scenarios: 100 nodes in 72h	✓ Owned	-
Business decision to migrate / trade-offs	- (we analyse, recommend)	✓ Owned
Billing for old and new infra during transition window	-	✓ Owned

> migration scenarios

Every migration is planned as a reversible process: first a dry-run in an isolated copy of the target environment, then a canary rollout to a subset of nodes or traffic, then cutover with a rollback that restores the old state within an hour. No "Friday-evening prod jumps".
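As a rough illustration of the dry-run, canary, cutover sequence above, here is a minimal Python sketch of a staged run that unwinds completed stages in reverse order when a check fails. The `Stage` and `MigrationRun` names and the lambda health checks are hypothetical placeholders, not part of any real tooling; a real runbook would call actual deployment and verification steps.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Stage:
    name: str
    apply: Callable[[], bool]      # returns True when the stage's checks pass
    rollback: Callable[[], None]   # undoes whatever apply() changed


@dataclass
class MigrationRun:
    stages: List[Stage]
    log: List[str] = field(default_factory=list)

    def execute(self) -> bool:
        started: List[Stage] = []
        for stage in self.stages:
            self.log.append(f"apply:{stage.name}")
            started.append(stage)
            if not stage.apply():
                # Unwind in reverse order, including the stage that failed.
                for s in reversed(started):
                    self.log.append(f"rollback:{s.name}")
                    s.rollback()
                return False
        return True


def demo() -> List[str]:
    noop = lambda: None
    run = MigrationRun([
        Stage("dry-run", lambda: True, noop),
        Stage("canary", lambda: False, noop),   # simulated failed health check
        Stage("cutover", lambda: True, noop),
    ])
    run.execute()
    return run.log
```

In this sketch a failed canary never reaches cutover: `demo()` records the two applied stages followed by their rollbacks, which is the property the plan is meant to guarantee.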

Cloud → bare-metal

Cost optimisation

Moving GPU workloads or node fleets from public cloud to bare-metal providers. Typical savings: 40-60% on cost-per-token / cost-per-block.

Multi-region split

Latency & resilience

Splitting single-region infra across 3+ regions. Survives regional outages, drops p95 latency for global traffic.

Hard-fork cutover

Chain upgrades

Coordinated validator upgrades for hard-fork blocks. Pre-flight dry-runs, version pinning, rollback playbook in case the majority doesn't fork.

Burst engagement

100 nodes in 72h

Incentivized testnets, fresh network launches, GPU farms for incentive seasons. Sourcing, deployment, monitoring: all in parallel. Aim: top-tier rewards.

Each scenario gets its own runbook, rollback plan and checkpoints. All migrations include a key ceremony for signing systems: HSM/KMS workflow, multi-party approval, audit trail in Git.
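To make "multi-party approval" concrete, the following is a minimal sketch of an M-of-N sign-off check in Python. The approver names, the threshold and the `ApprovalPolicy` type are illustrative assumptions; in practice the approval policy would be enforced by the HSM/KMS or its surrounding workflow, not by application code like this.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class ApprovalPolicy:
    approvers: FrozenSet[str]   # identities allowed to approve (hypothetical)
    threshold: int              # minimum number of distinct approvals


def is_approved(policy: ApprovalPolicy, signoffs: Iterable[str]) -> bool:
    # Count each authorised approver once; ignore unknown identities.
    distinct = {s for s in signoffs if s in policy.approvers}
    return len(distinct) >= policy.threshold


policy = ApprovalPolicy(frozenset({"alice", "bob", "carol"}), threshold=2)
```

Duplicate sign-offs from one person and sign-offs from anyone outside the approver set do not count towards the threshold.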

> stack we ship with

Web3: Cosmos SDK Geth Reth OP Stack Arbitrum Orbit Polygon CDK EigenDA Celestia

AI / LLM: vLLM Triton TensorRT-LLM NVIDIA H100 / A100 Ray Kubeflow

ZK: SP1 RISC Zero Boundless Brevis Jolt Halo2

DePIN: Filecoin Akash Render io.net Gensyn

Platform: Kubernetes Terraform Ansible Prometheus Grafana Loki OpenTelemetry PagerDuty

> engagement models

Scale projects are usually time-boxed engagements with fixed milestones.

[ FIXED PLAN ] →

Migration plan in 48-72h. Target architecture, milestones, rollback strategy, budget, risks.

For when you need to scope the migration before committing to the full program.

[ RETAINER ] →

Migration program. Full team engagement for 4-12 weeks, milestone-based payment.

The standard model for cloud → bare-metal or multi-region splits.

[ T&M ] →

Burst engagement. For one-off events: hard-fork, 100-node burst, emergency rebalance.

When urgency is known but scope can't be pinned down in advance.

> security baseline for migrations

Migration is the moment of highest risk for key material. These are the controls we always enforce on a scale project.

Control	What we ensure
Key ceremony	Multi-party-approved procedure for all validator keys. Audit trail. Keys never leave your HSM/KMS.
HSM / KMS workflow	YubiHSM, AWS KMS, GCP KMS, Vault supported. Signing by process, not by handing over material.
Dry-runs	Every migration runs first in an isolated copy of the target environment.
Rollback playbook	Pre-built rollback plan with <60 min RTO at every cutover step.
Distributed locks	Single-instance signing per chain, guaranteed throughout the transition window.
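The single-instance signing guarantee in the last row can be pictured as a lease with a fencing token: only the current lease holder may sign, and a stale holder is rejected once the lease has moved. The sketch below is an in-memory stand-in written under that assumption; `SigningLock` and its methods are hypothetical names, and a real deployment would need a consensus-backed lock service and a signer that checks the lease before every signature.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Lease:
    holder: str
    token: int          # fencing token, increases with every new lease
    expires_at: float


class SigningLock:
    """In-memory stand-in for a distributed lock service (assumption)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._lease: Optional[Lease] = None
        self._next_token = 1

    def acquire(self, holder: str) -> Optional[int]:
        now = self._clock()
        lease = self._lease
        if lease is not None and lease.expires_at > now:
            if lease.holder != holder:
                return None                      # someone else may still sign
            lease.expires_at = now + self._ttl   # same holder: renew the lease
            return lease.token
        token = self._next_token
        self._next_token += 1
        self._lease = Lease(holder, token, now + self._ttl)
        return token

    def may_sign(self, holder: str, token: int) -> bool:
        lease = self._lease
        return (lease is not None
                and lease.holder == holder
                and lease.token == token
                and lease.expires_at > self._clock())
```

During a cutover, the old validator would stop renewing its lease, and the new one would start signing only after `acquire` hands it a fresh token.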

> what we'd build for you

Real migrations and scale projects across the four ICPs (Web3, AI, ZK, DePIN).

Testnet → mainnet migration with key ceremony and zero-loss state transfer. Coordinated cutover, signing monitoring before and after, rollback playbook in case of chain split.

Spot-fleet to dedicated H100 cluster, cost-per-token cut by 60%. Model-weight migration, cache warm-up, blue-green cutover for production traffic.

Prover capacity ramped 5x for a proving-incentive season. GPU sourcing in <2 weeks, scheduler expansion, proof-throughput monitoring across the network.

Node fleet rebalanced across 12 regions in 72h to hit reward thresholds. Payout-data analysis, regional sourcing, phased rollout, results reconciliation after one week.

> SLA tiers

After the scale phase, infra moves into the operate phase. Three coverage levels are available.

Tier	Response p95 (Sev-1)	Coverage	Incident report	Engineer hours / mo
Bronze	30 min	Business hours, 5×8	Within 48h	40
Silver	15 min	24/7 on-call rotation	Within 24h	80
Gold	5 min	24/7 with dedicated engineer	Within 12h	160+

> FAQ

Can a validator really be migrated without downtime?

Almost. With a proper key ceremony and distributed locks, the cutover window is usually one missed epoch, with no double-signing and no slashed stake. Specifics depend on the chain.

How long does a typical migration take?

Cloud → bare-metal for a single application: 4-6 weeks. Multi-region split with DNS cutover: 3-4 weeks. 100-node burst for a testnet: 72h. Hard-fork cutover: a planned window, usually 2-4 weeks of prep.

What if the migration goes wrong?

Every step has a rollback playbook with <60 min RTO. The old stack stays alive until the new one is fully validated. That doubles billing during the transition window, but that's the explicit price of reversibility.
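As a back-of-the-envelope way to weigh that overlap cost, the sketch below estimates the extra spend from running both stacks and how many months of savings it takes to recover it. The function names and every figure in the usage note are hypothetical inputs, not quotes or measured results.

```python
WEEKS_PER_MONTH = 52 / 12


def overlap_cost(new_monthly: float, overlap_weeks: float) -> float:
    """Extra spend versus not migrating: the new stack's bill while both run."""
    return new_monthly * overlap_weeks / WEEKS_PER_MONTH


def payback_months(old_monthly: float, new_monthly: float,
                   overlap_weeks: float) -> float:
    """Months of post-cutover savings needed to recover the overlap cost."""
    saving = old_monthly - new_monthly
    if saving <= 0:
        raise ValueError("new stack must be cheaper for a payback period to exist")
    return overlap_cost(new_monthly, overlap_weeks) / saving
```

For example, with an assumed old bill of 100,000 per month, a new bill of 50,000 and a four-week overlap, `payback_months(100_000, 50_000, 4)` comes out at a little under one month.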

Who runs the key ceremony?

We direct it: we write the procedure, run dry-runs, coordinate participants. Keys stay with you (HSM/KMS); signing happens through your processes. Audit trail in Git.

Do you take scale-only engagements (no deploy/operate)?

Yes. Often the brief is "we already have infra, move it for us". If you want the operate phase after migration, we roll into a retainer. If not, we hand off with runbooks and your team takes it from there.