by XIMTRX team

Multi-region in 72 hours: our burst-deploy runbook

A protocol announces an incentivized testnet and wants nodes in several regions within 72 hours. You cannot ship metal that fast. We cover the runbook we use to stand up a multi-region burst and tear it down without a trace.

#deploy #multi-region #burst #scale #automation

The scenario repeats every quarter: a protocol announces an incentivized testnet or an urgent campaign and wants nodes in several regions within 72 hours. You cannot order metal on that timeline; the lead time is longer than the testnet itself. We stand up these bursts for clients from a rehearsed runbook, and the whole point is that the 72 hours go into executing something prepared in advance, not into "figuring out how." Below is that runbook.

Why 72 hours is not about hardware

In 72 hours you cannot ship and rack metal across several countries, and you should not try. A burst lives in the cloud by definition, because only the cloud brings up capacity in a new region in minutes. The question is not "where do we get servers" but "how fast do we turn an empty account into a working distributed fleet." That is decided by what is prepared before the start, not by what we can write in three days.

What is prepared in advance

The 72 hours are won months earlier. By start time we already have:

  • Infrastructure as code. Terraform modules for regions, networks, instances. Bringing up a region is applying a module with different parameters, not assembling it by hand under stress.
  • Ready node images. The node config, monitoring agents, and key placeholders are baked into the image ahead of time. A new machine arrives almost ready rather than being configured after boot.
  • Region-selection templates. Which providers live in which regions, where we already have accounts with raised limits, where transit is fast. This is a table, not research on the fly.
  • Monitoring out of the box. New nodes register themselves in monitoring and start sending metrics, with no manual add per node.

The preparation is the product. A team writing Terraform from scratch at start time loses the window no matter how fast it moves.
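As a rough illustration of "a module with different parameters," the sketch below turns a region-selection table into one variables file per region. It is a minimal sketch under stated assumptions, not our actual tooling: the provider names, region names, instance types, and field names are all hypothetical. The only external fact it relies on is that Terraform can read variable files in JSON form (`*.tfvars.json`).

```python
import json
from dataclasses import dataclass, asdict
from typing import Dict, List


@dataclass(frozen=True)
class RegionParams:
    """Parameters for one region; every field name here is hypothetical."""
    provider: str
    region: str
    instance_type: str
    node_count: int


# Hypothetical rows from a region-selection table.
REGION_TABLE: List[RegionParams] = [
    RegionParams("provider-a", "eu-west", "standard-8", 3),
    RegionParams("provider-a", "us-east", "standard-8", 3),
    RegionParams("provider-b", "ap-south", "compute-8", 2),
]


def render_var_files(table: List[RegionParams]) -> Dict[str, str]:
    """Return {file name: JSON body}, one variables file per region."""
    files = {}
    for row in table:
        name = f"{row.provider}-{row.region}.tfvars.json"
        files[name] = json.dumps(asdict(row), indent=2, sort_keys=True)
    return files


if __name__ == "__main__":
    for name, body in render_var_files(REGION_TABLE).items():
        print(name)
        print(body)
```

With files like these, adding a region means adding a row to the table and applying the same module with another variables file, rather than writing new infrastructure code inside the window.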

The hidden trap: quotas, not capacity

The most common burst failure is not a shortage of servers but account limits. A new or lightly used cloud account arrives with low quotas on instance types and regions, and a request to raise them is not instant: approval can take hours or a full day. In an urgent testnet, a day spent waiting on a quota approval eats a third of the 72-hour window, and you find out only when it is too late.

So in our runbook, checking and pre-ordering quotas is the first step, before any deploy: limits for the needed regions and card/instance types are raised in advance, with a clear head. This is the trap that nearly sinks the first day for teams doing a burst for the first time.
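The quota check can be sketched as a plain comparison of what the burst needs against what the account currently allows. This is an illustrative sketch only: the regions, instance types, and numbers are hypothetical, and in practice the `current` values would come from each provider's own quota API or console, which is not shown here.

```python
from typing import Dict, List, Tuple

QuotaKey = Tuple[str, str]  # (region, instance type), both hypothetical names


def quota_shortfalls(
    required: Dict[QuotaKey, int],
    current: Dict[QuotaKey, int],
) -> List[Tuple[QuotaKey, int]]:
    """List every (region, instance type) whose limit is below the need.

    A key missing from `current` is treated as a limit of zero.
    """
    gaps = []
    for key, need in sorted(required.items()):
        have = current.get(key, 0)
        if have < need:
            gaps.append((key, need - have))
    return gaps


def window_share(wait_hours: float, window_hours: float = 72.0) -> float:
    """Fraction of the burst window lost while waiting on an approval."""
    return wait_hours / window_hours


# Hypothetical plan and hypothetical current limits.
REQUIRED = {
    ("eu-west", "standard-8"): 24,
    ("us-east", "standard-8"): 24,
    ("ap-south", "compute-8"): 16,
}
CURRENT = {
    ("eu-west", "standard-8"): 32,
    ("us-east", "standard-8"): 8,
}

if __name__ == "__main__":
    for (region, itype), missing in quota_shortfalls(REQUIRED, CURRENT):
        print(f"raise {itype} in {region} by {missing}")
    print(f"a 24h approval costs {window_share(24):.0%} of the window")
```

Running a check like this before any deploy turns the quota problem into a list of requests to file early, instead of an error discovered halfway through the rollout.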

Rollout without single-region risk

With the preparation in place and quotas raised, the rollout itself is execution. Nodes come up in several regions and with several providers at once rather than in a single region. An incentivized testnet often scores geographic diversity, so piling the whole burst into one region is both a scoring risk and a single blast radius if that region fails. We cut over to live work only after nodes pass the first network check, not when "the instance booted."
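The two rollout rules above, a cutover gate and a spread across regions, can be expressed in a few lines. The sketch is hypothetical throughout: the `Node` fields, the node names, and the idea of measuring concentration as the largest region's share are illustrative assumptions, and what the "first network check" actually tests depends on the protocol.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Node:
    """One burst node; all field names are hypothetical."""
    name: str
    provider: str
    region: str
    booted: bool
    network_check_passed: bool


def ready_for_cutover(node: Node) -> bool:
    """A booted instance is not enough; the network check must pass too."""
    return node.booted and node.network_check_passed


def largest_region_share(nodes: List[Node]) -> float:
    """Share of the fleet in the most crowded region (0.0 for no nodes)."""
    if not nodes:
        return 0.0
    counts = Counter(node.region for node in nodes)
    return max(counts.values()) / len(nodes)


# Hypothetical fleet state partway through a rollout.
EXAMPLE_NODES = [
    Node("n1", "provider-a", "eu-west", True, True),
    Node("n2", "provider-a", "us-east", True, False),  # booted, not yet synced
    Node("n3", "provider-b", "ap-south", True, True),
    Node("n4", "provider-b", "eu-west", False, False),
]

if __name__ == "__main__":
    ready = [n.name for n in EXAMPLE_NODES if ready_for_cutover(n)]
    print("ready for cutover:", ready)
    print("largest region share:", largest_region_share(EXAMPLE_NODES))
```

Here `n2` has booted but is held back from cutover, and half of the example fleet sits in one region, which is the kind of concentration the rollout is meant to avoid.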

Teardown: a burst must be able to disappear

Half the economics of a burst is the ability to switch it off. When the campaign ends, the fleet is torn down by code the same way it came up: instances are removed and deregistered from monitoring, and account resources are closed so that no bill keeps dripping for a forgotten machine in a distant region. A burst that disappears cleanly costs only for the time it ran; a burst that leaves stragglers leaks money for months.
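One way to verify a teardown is to list whatever still carries the burst's tag after the destroy step, and to put a number on what a forgotten machine would cost. The sketch below assumes resources are tagged per burst, which is an assumption for illustration; the resource IDs, the tag value, and the hourly rate are all hypothetical, and the inventory would in practice come from each provider's listing API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Resource:
    """One billable resource; all field names are hypothetical."""
    resource_id: str
    region: str
    burst_tag: str


def leftovers(inventory: List[Resource], burst_tag: str) -> List[Resource]:
    """Resources still tagged with the burst after teardown has run."""
    return [r for r in inventory if r.burst_tag == burst_tag]


def leak_cost(hourly_rate: float, days: int) -> float:
    """Cost of one forgotten resource left running for `days` days."""
    return hourly_rate * 24 * days


# Hypothetical post-teardown inventory across regions.
INVENTORY = [
    Resource("vm-101", "eu-west", "long-lived"),
    Resource("vm-207", "ap-south", "testnet-burst"),  # missed by teardown
    Resource("disk-33", "ap-south", "testnet-burst"),  # orphaned volume
]

if __name__ == "__main__":
    for res in leftovers(INVENTORY, "testnet-burst"):
        print("still alive:", res.resource_id, res.region)
    # Hypothetical rate of 0.50 per hour, forgotten for 90 days.
    print("leak:", leak_cost(0.50, 90))
```

A teardown counts as done when this list is empty for every account and region the burst touched, not when the destroy command exits.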

What it looks like for a client

In our test environment we run this runbook as a dry run: we bring up a multi-region fleet, check quotas, cut over, and tear down, timing how long each step actually takes. On a client contract this becomes a concrete plan for their protocol: which regions, which providers, where to pre-order limits, and what criterion triggers cutover.

If you need to stand up multi-region capacity for an urgent window and tear it down cleanly, that is what we cover through deploy and scale. To prepare a runbook for your next testnet in advance, get in touch.
