> process

How we engage with new clients

From the first call to a signed SLA: one path, no grey zones. Discovery, 48h deployment plan, engagement model selection, follow-the-sun on-call, change management, off-boarding.

> Discovery (1 call, 30 min)

One call. No sales funnels. Goal: scope enough to write the 48h deployment plan. What we ask:

  • Workload. Validator, RPC, LLM inference, prover, DePIN fleet, something else.
  • Stack. What's locked in, what's open. Cosmos SDK / Geth / vLLM / SP1 / Filecoin, etc.
  • Regions. Where you need presence, which jurisdictions are excluded.
  • Deadlines. Launch date, hard-fork, incentivized testnet, GPU round IPO.
  • SLA targets. What counts as downtime, acceptable response times, who pays for incidents.
  • Billing and keys. Your cloud account or ours, who holds validator keys.

You always talk to an engineer. Never sales. If the first message already covers the scope, we skip the call and go straight to the plan.

> 48-hour deployment plan

Within 48 hours of Discovery we come back with a one-page plan, delivered at a fixed price. You know cost and timeline before signing a monthly retainer.

What's inside:

  • Architecture. Topology diagram: regions, node types, networking, observability stack.
  • Hardware sourcing. Who the supplier is, supply window, costs for compute / GPU / storage.
  • Milestones. Week 1 / 2 / 4 with concrete deliverables, dates and check-ins.
  • Budget. Itemized: hardware, engineer hours, third-party (PagerDuty, monitoring).
  • Risks and mitigation. What can go wrong, how we roll back.

The plan stands on its own even if you don't continue: you walk away with architecture and a budget estimate on paper. Price depends on scope, typically $X to $XX.

> engagement models

Three contract shapes. Start small, scale up as trust builds.

> SLA tiers

Three coverage levels once we hand off into the operate phase. Pick by criticality.

| Tier | Response p95 (Sev-1) | Coverage | Incident report | Engineer hours / mo |
|---|---|---|---|---|
| Bronze | 30 min | Business hours, 5×8 | Within 48h | 40 |
| Silver | 15 min | 24/7 on-call rotation | Within 24h | 80 |
| Gold | 5 min | 24/7 with dedicated engineer | Within 12h | 160+ |
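
To make the "Response p95" column concrete, here is a minimal sketch of how a month of Sev-1 response times could be checked against a tier target. The targets come from the table above. The nearest-rank percentile method, the function names, and the sample data are assumptions for illustration, not our production tooling.

```python
# Sev-1 response targets in minutes (p95), copied from the tier table.
SEV1_P95_TARGET_MIN = {"Bronze": 30, "Silver": 15, "Gold": 5}


def p95(values):
    """Nearest-rank 95th percentile (an assumed method; others exist)."""
    if not values:
        raise ValueError("need at least one response time")
    ordered = sorted(values)
    # 1-based rank = ceil(0.95 * n), computed with integers to avoid float error.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_target(tier, response_minutes):
    """True if the observed p95 response time is within the tier's Sev-1 target."""
    return p95(response_minutes) <= SEV1_P95_TARGET_MIN[tier]


# Hypothetical month: 20 Sev-1 pages, one slow outlier at 40 minutes.
sample = [3, 4, 5, 6, 7, 8, 9, 9, 10, 10, 11, 11, 12, 12, 12, 13, 13, 14, 14, 40]
print(p95(sample), meets_target("Silver", sample), meets_target("Gold", sample))
```

With a p95 target, a single slow response among twenty does not break the tier on its own. The slowest 5% of responses are excluded from the measurement.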

> on-call rotation

We run follow-the-sun. Three time zones, handoffs every 8 hours: an incident at 3 AM your time lands with an engineer for whom it is 11 AM, not with someone shaken out of bed.

  • UTC+3 · 00:00 to 08:00 UTC. Europe / Russia / Middle East.
  • UTC+0 · 08:00 to 16:00 UTC. Western Europe / UK / Africa.
  • UTC-5 · 16:00 to 00:00 UTC. Americas (North and South).

Handoffs are written into a shared runbook: what happened on shift, what's open, what to watch. Incidents straddling two shifts get owned by the engineer whose morning it is.
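
The rotation above can be sketched as a lookup from a timestamp to the shift on duty. The shift windows are the ones listed. Treating the start hour as inclusive and the end hour as exclusive is an assumption, as are the names `SHIFTS` and `shift_for`.

```python
from datetime import datetime, timezone

# (start hour UTC, end hour UTC, shift label), taken from the rotation list.
SHIFTS = [
    (0, 8, "UTC+3"),
    (8, 16, "UTC+0"),
    (16, 24, "UTC-5"),
]


def shift_for(moment):
    """Return the label of the shift covering a timezone-aware datetime."""
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    hour = moment.astimezone(timezone.utc).hour
    for start, end, label in SHIFTS:
        # Assumed boundary rule: start inclusive, end exclusive.
        if start <= hour < end:
            return label
    raise AssertionError("shifts do not cover all 24 hours")


print(shift_for(datetime(2024, 1, 15, 2, 30, tzinfo=timezone.utc)))
```

Converting to UTC first means the caller can pass an alert timestamp in any zone and still get the right shift.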

> change management

We come into your systems with "branching first, deploy second". Every change goes through a PR. Zero live `kubectl apply`.

  • Access. Read-only IAM on day one. Write roles require MFA and are scoped to the contract.
  • Branching. Feature branch per change. No direct commits to main / production.
  • PR flow. CI runs terraform plan / lint / unit tests. Human review mandatory, two approvals for prod.
  • Runbook per change. What we ship, how we roll back, who owns it, what to watch for the first 24h.
  • Audit trail. Every action in git + cloud audit log. Fully visible to your security team.
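
The PR flow rules can be expressed as a small merge gate. This is a sketch only: the `PullRequest` type, its field names, and the `"prod"` label are hypothetical, and in practice the same rule would usually live in the git host's branch protection settings rather than in custom code.

```python
from dataclasses import dataclass, field

# CI steps named in the PR flow bullet.
REQUIRED_CHECKS = {"terraform plan", "lint", "unit tests"}


@dataclass
class PullRequest:
    target: str  # hypothetical environment label, e.g. "prod" or "staging"
    passed_checks: set = field(default_factory=set)
    approvers: set = field(default_factory=set)  # distinct human reviewers


def can_merge(pr):
    """All CI checks green, at least one human approval, two for prod."""
    needed = 2 if pr.target == "prod" else 1
    checks_ok = REQUIRED_CHECKS <= pr.passed_checks
    return checks_ok and len(pr.approvers) >= needed


pr = PullRequest("prod", set(REQUIRED_CHECKS), {"alice"})
print(can_merge(pr))  # one approval is not enough for prod
pr.approvers.add("bob")
print(can_merge(pr))
```

Storing approvers as a set means the same reviewer approving twice still counts as one approval.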

> incident severity matrix

Four severity levels. Each has its own definition, response target, and comms format. Not "critical/high/medium/low" from a ticketing tool: concrete thresholds.

| Severity | Definition | Response target (Silver) | Comms |
|---|---|---|---|
| Sev-1 | Full outage or slashing / data-loss risk. Money is burning now. | 15 min p95 | Call + Slack war room. Updates every 30 min until recovery. |
| Sev-2 | Degradation below SLO (p99 latency, partial region outage). | 1 hour | Slack incident channel. Updates every 2 hours. |
| Sev-3 | Minor bug, non-blocking alert. Business-hours response is fine. | 1 business day | Ticket + daily standup. |
| Sev-4 | Cosmetic / change request without urgency. | 1 week | Backlog / sprint planning. |

We write a postmortem within 5 business days of every Sev-1. Blameless, with action items, visible to the client.
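
As a worked example of the 5-business-day rule, the sketch below computes the latest postmortem date for a Sev-1. It assumes that counting starts the day after the incident and that only Saturdays and Sundays are skipped. Public holidays are ignored, and the function names are illustrative.

```python
from datetime import date, timedelta


def add_business_days(start, days):
    """Advance `start` by `days` weekdays, skipping Saturday and Sunday."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            remaining -= 1
    return current


def postmortem_due(incident_day):
    """Latest day for a Sev-1 postmortem under the 5-business-day rule."""
    return add_business_days(incident_day, 5)


# A Sev-1 on Wednesday 2024-01-10 is due by the following Wednesday.
print(postmortem_due(date(2024, 1, 10)))  # 2024-01-17
```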

> off-boarding

When the contract ends, you take the infra in-house or hand it to another operator. Nothing is withheld and there is no vendor lock-in. Off-boarding runs on a checklist over 2 to 4 weeks depending on scope.

  • Access transfer. We drop our IAM roles, revoke keys, rotate secrets. Full audit log.
  • Hand-off doc. Current state, open tickets, watch items, vendor contacts.
  • Runbook walkthrough. One call where we go through every runbook with the receiving team.
  • Shadow period. 1-2 weeks of read-only availability to answer "how does this work" questions.
  • Final audit. Signed document: what was transferred, what (if anything) stays with us.

Everything we built sits in your git repos and your cloud accounts from day one. Off-boarding is mostly access cleanup and knowledge transfer.

> ready to start with Discovery?

Tell us about the workload. We reply within 24 hours.