Turn your recovery runbook into a system that executes.
Your business continuity and disaster recovery plans hold the playbook for your worst day, yet they sit frozen in documents while your real dependencies drift every hour. TwinGraph maps your recovery plan into a live graph where every dependency and failover rule is running code, so the moment a region degrades, your plan traces the blast radius, fails over, and tells your responders what it did.
A disaster recovery plan is only as good as its assumptions, and those assumptions go stale the moment you write them down. Your application topology keeps changing while your runbooks, CMDB exports, and dependency diagrams describe a system that no longer exists. During an outage your telemetry sits in one tool, your asset relationships sit in another, and your recovery steps sit in a PDF, so responders burn precious minutes correlating them by hand while the cost of downtime compounds. Worse, no static document can trace how a region failure cascades into the services, queues, and downstream operations that depend on it, so the blast radius gets discovered live, in production, at the worst possible time.
TwinGraph replaces the static runbook with a live operational graph where your infrastructure, its dependencies, and your failover logic are first-class, running objects. Health telemetry streams in over MQTT or Pub/Sub and mutates the graph in real time, so a node reflects what is true right now, not what last night's snapshot said. When a dependency degrades, the graph walks its own relationships to trace the blast radius instantly, and the failover rules attached to those nodes fire on their own, invoking a serverless function to reroute DNS, drain a region, or promote a replica. The graph runs on the TwinGraph Server, which your responders, incident agents, and the TwinGraph Browser reach over a secure gRPC channel, so the live state stays queryable from anywhere even when your primary region is the part that is down. Live Graph RAG grounds your incident response agents in that same live state, so an AI responder reasons over what changed seconds ago instead of a cached topology.
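To make the blast-radius idea concrete, here is a minimal sketch in plain Python. It is not the TwinGraph API: the `LiveGraph` class, its method names, and the node names such as `us-east-1` are all hypothetical, chosen only to illustrate how a health update on one node and a breadth-first walk over "depends on" relationships can list everything downstream of a degraded region.

```python
from collections import defaultdict, deque


class LiveGraph:
    """Toy in-memory dependency graph (illustrative only, not the TwinGraph API)."""

    def __init__(self):
        self.status = {}                    # node -> latest known health status
        self.dependents = defaultdict(set)  # node -> nodes that depend on it

    def add_dependency(self, consumer, provider):
        """Record that `consumer` depends on `provider`."""
        self.status.setdefault(consumer, "healthy")
        self.status.setdefault(provider, "healthy")
        self.dependents[provider].add(consumer)

    def update_health(self, node, status):
        """Apply one telemetry update to a node."""
        self.status[node] = status

    def blast_radius(self, failed):
        """Map every transitive dependent of `failed` to its hop distance."""
        hops = {}
        queue = deque([(failed, 0)])
        while queue:
            node, depth = queue.popleft()
            for dependent in sorted(self.dependents[node]):
                if dependent != failed and dependent not in hops:
                    hops[dependent] = depth + 1
                    queue.append((dependent, depth + 1))
        return hops


# Hypothetical topology: names are made up for the example.
graph = LiveGraph()
graph.add_dependency("orders-api", "us-east-1")
graph.add_dependency("orders-queue", "orders-api")
graph.add_dependency("billing-job", "orders-queue")
graph.add_dependency("search-api", "eu-west-1")

graph.update_health("us-east-1", "degraded")
impacted = graph.blast_radius("us-east-1")
print(impacted)
```

In this sketch `search-api` never appears in the result because it depends on a different region. Hop distance is only one possible stand-in for impact; a real ranking would likely weigh business criticality as well.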
From question to action, in seconds.
“Our primary region just went amber. What does it take down, and what is already failing over?”
- 01 · Trace
TwinGraph walks the live dependency graph out from the degraded region and surfaces every service, queue, and downstream job that depends on it, ranked by impact.
- 02 · Decide
The failover rules attached to those nodes evaluate their conditions against health that arrived seconds ago, not last night's snapshot, and pick the safe recovery path.
- 03 · Act
Without waiting for a human, the graph invokes a serverless function to reroute DNS, drain the region, and promote a replica, recording each action as it fires.
- 04 · Audit
Every read, decision, and write is grounded in the live graph and logged, so responders and auditors can replay exactly what happened and why.
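The Decide, Act, and Audit steps above can be sketched as a rule that checks how fresh a health reading is, calls an action, and records what it decided. Everything here is an assumption made for illustration: `FailoverRule`, `HealthReading`, the 30-second freshness window, and `promote_replica` (a stub standing in for a serverless function call) are hypothetical names, not part of TwinGraph.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class HealthReading:
    status: str         # e.g. "healthy", "amber", "critical"
    observed_at: float  # Unix timestamp of the reading


@dataclass
class FailoverRule:
    name: str
    trigger_statuses: set
    max_age_seconds: float
    action: Callable[[str], str]  # takes a node name, returns a description

    def evaluate(self, node, reading, now, audit_log):
        """Decide, act if warranted, and append one audit record. Returns True if fired."""
        age = now - reading.observed_at
        if age > self.max_age_seconds:
            decision = "skipped: stale health"
        elif reading.status in self.trigger_statuses:
            decision = "fired: " + self.action(node)
        else:
            decision = "no action"
        audit_log.append({
            "rule": self.name,
            "node": node,
            "status": reading.status,
            "age_seconds": round(age, 1),
            "decision": decision,
        })
        return decision.startswith("fired")


def promote_replica(node):
    # Stand-in for invoking a serverless function; it only reports intent.
    return f"promote replica for {node}"


audit_log = []
rule = FailoverRule("region-failover", {"amber", "critical"}, 30.0, promote_replica)

now = 1_000.0  # fixed clock so the example is repeatable
fired = rule.evaluate("primary-region", HealthReading("amber", now - 5), now, audit_log)
stale = rule.evaluate("primary-region", HealthReading("critical", now - 120), now, audit_log)

for record in audit_log:
    print(record)
```

Skipping stale readings is a design choice in this sketch, not a statement about how TwinGraph behaves. In practice, telemetry that stops arriving can itself be a failure signal, so a production rule might escalate instead of skipping.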
A few steps. Real infrastructure.
Map your dependencies
Bring runbooks, CMDB, and topology diagrams into the graph as live nodes and the real relationships between them.
Bind to live health
Stream region, service, and infrastructure telemetry over MQTT or Pub/Sub so every node tracks its own status in real time.
Attach failover logic
Each recovery rule runs as a live node that evaluates conditions and triggers the next action, instead of a step someone has to read under pressure.
Recover automatically
When a node goes critical, the graph traces the blast radius, fires the failover, and notifies your responders and incident agents with what it did.
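As a rough illustration of the "Bind to live health" step, the sketch below shows a handler that a message-bus client could call for each telemetry message. The topic layout (`health/<kind>/<name>`), the JSON payload with `status` and `ts` fields, and the `apply_telemetry` function are assumptions made for this example; the source does not describe TwinGraph's actual ingestion format.

```python
import json


def apply_telemetry(nodes, topic, payload):
    """Update one node from a telemetry message.

    Returns the node name if the update was applied, or None if it was ignored.
    """
    parts = topic.split("/")
    if len(parts) != 3 or parts[0] != "health":
        return None  # not a health topic in the assumed layout
    _, kind, name = parts

    try:
        message = json.loads(payload)
    except ValueError:
        return None  # malformed JSON or undecodable bytes

    if not isinstance(message, dict):
        return None
    status, ts = message.get("status"), message.get("ts")
    if not isinstance(status, str) or not isinstance(ts, (int, float)):
        return None

    current = nodes.get(name)
    if current is not None and ts <= current["ts"]:
        return None  # out-of-order or duplicate reading; keep the newer state

    nodes[name] = {"kind": kind, "status": status, "ts": ts}
    return name


# Hypothetical messages, including one that arrives late.
nodes = {}
topic = "health/region/us-east-1"
apply_telemetry(nodes, topic, b'{"status": "healthy", "ts": 100}')
apply_telemetry(nodes, topic, b'{"status": "critical", "ts": 160}')
late = apply_telemetry(nodes, topic, b'{"status": "healthy", "ts": 120}')

print(nodes["us-east-1"], late)
```

Dropping out-of-order readings keeps a delayed "healthy" message from masking a newer "critical" one, which matters when a failover rule reads the node's status.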
- 01 · Ground truth, not stale assumptions
Dependencies and health live in the same graph and update as your systems change, so your recovery plan reflects production as it is right now, not as it was at the last audit.
- 02 · Blast radius in milliseconds
Because every dependency is a first-class relationship, the graph walks its own topology to show exactly what a failure takes down, before a responder has to guess.
- 03 · Failover that executes itself
Recovery steps run as code attached to the nodes they protect, so the instant a signal crosses its threshold, the graph reroutes, drains, or promotes on its own and keeps your responders in the loop.