> process

How we engage with new clients

From the first call to a signed SLA: one path, no grey zones. Discovery, 48h deployment plan, engagement model selection, follow-the-sun on-call, change management, off-boarding.

> Discovery (1 call, 30 min)

One call. No sales funnels. Goal: scope enough to write the 48h deployment plan. What we ask:

Workload. Validator, RPC, LLM inference, prover, DePIN fleet, something else.
Stack. What's locked in, what's open. Cosmos SDK / Geth / vLLM / SP1 / Filecoin, etc.
Regions. Where you need presence, which jurisdictions are excluded.
Deadlines. Launch date, hard fork, incentivized testnet, GPU round, IPO.
SLA targets. What counts as downtime, acceptable response times, who pays for incidents.
Billing and keys. Your cloud account or ours, who holds validator keys.

You always talk to an engineer. Never sales. If the first message already covers the scope, we skip the call and go straight to the plan.

> 48-hour deployment plan

48 hours after Discovery we come back with a one-page plan. Fixed-price. You know cost and timeline before signing a monthly retainer.

What's inside:

Architecture. Topology diagram: regions, node types, networking, observability stack.
Hardware sourcing. Who the supplier is, supply window, costs for compute / GPU / storage.
Milestones. Week 1 / 2 / 4 with concrete deliverables, dates and check-ins.
Budget. Itemized: hardware, engineer hours, third-party (PagerDuty, monitoring).
Risks and mitigation. What can go wrong, how we roll back.

The plan stands on its own even if you don't continue: you walk away with architecture and a budget estimate on paper. Price depends on scope, typically $X to $XX.

> engagement models

Three contract shapes. Start small, scale up as trust builds.

[ FIXED PLAN ] →

48h deployment plan. Architecture, sourcing, milestones, budget: one page. Fixed-price.

Ideal way to scope and price an engagement before signing a retainer.

[ RETAINER ] →

Monthly engagement. Full deploy + ongoing operate. On-call, patching, releases: one contract.

Best when infra is in production and you need a stable team behind it.

[ T&M ] →

Time & materials. For burst scenarios: incentivized testnets, hard-forks, migrations, emergency hire-outs.

When the volume is unknown but the urgency isn't.

> SLA tiers

Three coverage levels once we hand off into the operate phase. Pick by criticality.

Tier	Response p95 (Sev-1)	Coverage	Incident report	Engineer hours / mo
Bronze	30 min	Business hours, 5×8	Within 48h	40
Silver	15 min	24/7 on-call rotation	Within 24h	80
Gold	5 min	24/7 with dedicated engineer	Within 12h	160+
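
As a rough illustration, the tier table can be expressed as data and queried for the lowest tier that meets a given Sev-1 response target and coverage need. This is only a sketch: the `SlaTier` type and the `lowest_tier_for` helper are hypothetical names, and the ordering Bronze < Silver < Gold is an assumption based on the engineer hours in the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SlaTier:
    name: str
    sev1_response_p95_min: int      # Sev-1 response target (p95), minutes
    coverage: str
    incident_report_hours: int      # incident report due within this many hours
    engineer_hours_per_month: int   # for Gold this is a floor ("160+")


# Values copied from the tier table; the order (lowest first) is an assumption.
TIERS = [
    SlaTier("Bronze", 30, "Business hours, 5×8", 48, 40),
    SlaTier("Silver", 15, "24/7 on-call rotation", 24, 80),
    SlaTier("Gold", 5, "24/7 with dedicated engineer", 12, 160),
]


def lowest_tier_for(max_sev1_response_min: int, needs_24x7: bool) -> Optional[SlaTier]:
    """Return the first tier that meets the response target and coverage need."""
    for tier in TIERS:
        if tier.sev1_response_p95_min > max_sev1_response_min:
            continue  # tier responds too slowly for this requirement
        if needs_24x7 and not tier.coverage.startswith("24/7"):
            continue  # business-hours coverage is not enough
        return tier
    return None  # no tier in the table satisfies the requirement
```

For example, a 30-minute response target with round-the-clock coverage resolves to Silver, because Bronze only covers business hours.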

> on-call rotation

We run follow-the-sun. Three time zones, handoffs every 8 hours: a 3 AM incident lands with an engineer at 11 AM, not with someone shaken out of bed.

UTC+3 · 00:00 to 08:00 UTC. Europe / Russia / Middle East.
UTC+0 · 08:00 to 16:00 UTC. Western Europe / UK / Africa.
UTC-5 · 16:00 to 00:00 UTC. Americas (North and South).

Handoffs are written into a shared runbook: what happened on shift, what's open, what to watch. Incidents straddling two shifts get owned by the engineer whose morning it is.
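
The shift windows above can be sketched as a lookup from an incident timestamp to the shift that owns it. This is a minimal illustration, assuming the three 8-hour windows exactly as listed; the `SHIFTS` table and `on_call_shift` function are hypothetical names, not part of any real tooling.

```python
from datetime import datetime, timedelta, timezone

# (start hour UTC, end hour UTC, shift label) taken from the schedule above.
SHIFTS = [
    (0, 8, "UTC+3"),
    (8, 16, "UTC+0"),
    (16, 24, "UTC-5"),
]


def on_call_shift(at: datetime) -> str:
    """Return the label of the shift covering a timezone-aware timestamp."""
    if at.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    hour = at.astimezone(timezone.utc).hour
    for start, end, label in SHIFTS:
        if start <= hour < end:
            return label
    raise AssertionError("shift windows must cover all 24 hours")


# Example: 02:00 at a UTC-5 offset is 07:00 UTC, which is still the first window.
example = datetime(2024, 1, 1, 2, 0, tzinfo=timezone(timedelta(hours=-5)))
print(on_call_shift(example))  # prints "UTC+3"
```

Converting to UTC before the lookup keeps the routing independent of where the alert or the client is located.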

> change management

We work in your systems on a "branch first, deploy second" basis. Every change goes through a PR. Zero live `kubectl apply`.

Access. Read-only IAM on day one. Write roles require MFA and are scoped to the contract.
Branching. Feature branch per change. No direct commits to main / production.
PR flow. CI runs terraform plan / lint / unit tests. Human review mandatory, two approvals for prod.
Runbook per change. What we ship, how we roll back, who owns it, what to watch for the first 24h.
Audit trail. Every action in git + cloud audit log. Fully visible to your security team.
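
The rules above can be read as a merge gate. The sketch below is one possible way to encode them, not a description of any specific CI system: the `PullRequest` type, the check names in `REQUIRED_CHECKS`, and the assumption that non-prod changes need one human approval are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical CI check names standing in for terraform plan / lint / unit tests.
REQUIRED_CHECKS = {"terraform-plan", "lint", "unit-tests"}
PROTECTED_BRANCHES = {"main", "production"}


@dataclass
class PullRequest:
    source_branch: str
    target_env: str                       # "prod" or any other environment name
    passed_checks: set = field(default_factory=set)
    human_approvals: int = 0
    has_runbook: bool = False             # what ships, rollback, owner, 24h watch list


def merge_blockers(pr: PullRequest) -> list:
    """Return the reasons a PR cannot merge yet; an empty list means it can."""
    blockers = []
    if pr.source_branch in PROTECTED_BRANCHES:
        blockers.append("change must come from a feature branch")
    missing = REQUIRED_CHECKS - pr.passed_checks
    if missing:
        blockers.append("CI checks not passed: " + ", ".join(sorted(missing)))
    # Two approvals for prod is from the text; one for other targets is assumed.
    required = 2 if pr.target_env == "prod" else 1
    if pr.human_approvals < required:
        blockers.append(f"needs {required} human approval(s), has {pr.human_approvals}")
    if not pr.has_runbook:
        blockers.append("runbook for this change is missing")
    return blockers
```

Returning the full list of blockers, rather than a single boolean, makes it easy to surface every unmet condition in the PR at once.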

> incident severity matrix

Four severity levels. Each has its own definition, response target, and comms format. Not "critical/high/medium/low" from a ticketing tool: concrete thresholds.

Severity	Definition	Response target (Silver)	Comms
Sev-1	Full outage or slashing / data-loss risk. Money is burning now.	15 min p95	Call + Slack war room. Updates every 30 min until recovery.
Sev-2	Degradation below SLO (p99 latency, partial region outage).	1 hour	Slack incident channel. Updates every 2 hours.
Sev-3	Minor bug, non-blocking alert. Business-hours response is fine.	1 business day	Ticket + daily standup.
Sev-4	Cosmetic / change request without urgency.	1 week	Backlog / sprint planning.

We write a postmortem within 5 business days of every Sev-1. Blameless, with action items, visible to the client.
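
To make the clock-based targets concrete, here is a small sketch that computes the response target, the update schedule, and the postmortem due date for an incident. It covers only Sev-1 and Sev-2, since Sev-3 and Sev-4 are measured in business days and weeks. All names are hypothetical, the update cadence is assumed to start when the incident opens, and business days are assumed to be Monday to Friday with no holiday calendar.

```python
from datetime import date, datetime, timedelta, timezone

# Silver-tier targets and update cadence from the matrix above.
TARGETS = {
    "sev1": {"response": timedelta(minutes=15), "update_every": timedelta(minutes=30)},
    "sev2": {"response": timedelta(hours=1), "update_every": timedelta(hours=2)},
}


def response_target_at(severity: str, opened_at: datetime) -> datetime:
    """Time by which a first response is targeted."""
    return opened_at + TARGETS[severity]["response"]


def update_times(severity: str, opened_at: datetime, recovered_at: datetime) -> list:
    """Status updates due between opening and recovery (assumed cadence start: open)."""
    step = TARGETS[severity]["update_every"]
    times = []
    due = opened_at + step
    while due < recovered_at:
        times.append(due)
        due += step
    return times


def add_business_days(start: date, days: int) -> date:
    """Advance by N business days, skipping Saturdays and Sundays only."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday-Friday
            remaining -= 1
    return current


def postmortem_due(sev1_opened_at: datetime) -> date:
    """Postmortem deadline: 5 business days after a Sev-1."""
    return add_business_days(sev1_opened_at.date(), 5)
```

For a Sev-1 opened at 12:00 UTC and resolved 95 minutes later, this yields a response target of 12:15 and updates due at 12:30, 13:00 and 13:30.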

> off-boarding

When the contract ends, you take the infra in-house or hand it to another operator. We don't hold your infrastructure hostage. No vendor lock-in. Off-boarding runs on a checklist over 2-4 weeks depending on scope.

Access transfer. We drop our IAM roles, revoke keys, rotate secrets. Full audit log.
Hand-off doc. Current state, open tickets, watch items, vendor contacts.
Runbook walkthrough. One call where we go through every runbook with the receiving team.
Shadow period. 1-2 weeks of read-only availability to answer "how does this work" questions.
Final audit. Signed document: what was transferred, what (if anything) stays with us.

Everything we built sits in your git repos and your cloud accounts from day one. Off-boarding is mostly access cleanup and knowledge transfer.