Disaster recovery topology with 3 French datacenters

"What happens when your datacenter burns down?" is a question that until OVH's Strasbourg fire in March 2021 was largely theoretical. Now it's a procurement checklist item. With three French datacenters — Paris, Marseille, Lyon — FranceVPS lets you build genuine geographic redundancy without leaving French jurisdiction. This post covers the topologies that work and the ones that don't.

What "disaster" actually means

"Disaster" in DR planning isn't just fire (though fire is the canonical example). It includes:

Site-wide power outage — generators fail, datacenter goes dark
Network isolation — fiber cuts, BGP issues, regional internet outages
Hardware-correlated failures — bad batch of drives, firmware bug rolling out across a fleet
Human error — accidentally rm -rf the production database, or a config push that takes down a region
Security incidents — ransomware, attacker with admin credentials
Physical destruction — fire, flood, severe weather, civil unrest

Different topologies handle different scenarios. A topology that survives fire might not survive ransomware. A topology that survives ransomware might cost 4× a topology that just survives fire.

The four topology tiers

Increasing protection, increasing cost:

Tier 1: Single datacenter with backups

Production runs in one datacenter (say, Paris FR-PAR-1). Backups go to FR-PAR-2 (different building, same region) and to a third-party provider (Hetzner Storage Box, Backblaze, etc.). Recovering from a disaster means restoring those backups to a fresh VPS.

Survives: hardware failure, single-VPS issues, accidental deletion (if backup retention is sufficient), ransomware (if backups are immutable)

Doesn't survive: a Paris-wide incident without downtime. Your service is down for hours while you restore.

Cost: 1× production + cheap backup storage. Suitable for non-critical workloads.

Tier 2: Active-passive across two cities

Production in Paris, warm standby in Marseille. The database is replicated continuously; application servers are ready but not handling traffic. Failover requires a DNS change (or Anycast IP swap) and database promotion. RTO (Recovery Time Objective): 5-15 minutes.

Survives: Paris-wide incident (you cut over to Marseille and keep running)

Doesn't survive: two simultaneous incidents (very unlikely), correlated failures across both sites (e.g., a software bug deployed to both)

Cost: roughly 1.6× single-site (the standby is sized smaller during normal operation).
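An RTO figure like this is easier to defend when it is broken into the steps that make it up. The sketch below is a rough budget, not a measurement: the step names and every duration in it are illustrative assumptions that you would replace with numbers from your own failover drills.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailoverBudget:
    """Worst-case duration, in seconds, of each failover step (all assumed)."""
    detection_s: float     # monitoring declares the primary site down
    decision_s: float      # human or automated go/no-go
    db_promotion_s: float  # standby database is promoted to primary
    dns_ttl_s: float       # clients may cache the old record for one full TTL

    def worst_case_rto_s(self) -> float:
        # The steps run one after another, so the worst case is their sum.
        return (self.detection_s + self.decision_s
                + self.db_promotion_s + self.dns_ttl_s)


# Hypothetical numbers for a Paris -> Marseille cutover.
budget = FailoverBudget(detection_s=90, decision_s=300,
                        db_promotion_s=60, dns_ttl_s=120)
print(f"worst-case RTO: {budget.worst_case_rto_s() / 60:.1f} min")
```

With these assumed values the decision step dominates, which is typical when a human has to approve the cutover.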

Tier 3: Active-active across two cities

Both sites serving traffic continuously, load-balanced via DNS or Anycast. Requires the application to be stateless or for state to be replicated cross-site (which is the hard part). RTO is effectively zero — if Paris dies, Marseille just keeps serving.

Survives: single-site incidents with no perceived downtime

Doesn't survive: data conflicts during failover (split-brain scenarios in databases), correlated failures

Cost: 2× production + significant engineering investment in cross-site replication and conflict resolution.

Tier 4: Active-active across three cities

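The full DR mode: Paris, Marseille, and Lyon all running, with quorum-based decisions. Survives any single-site loss without downtime, because the two remaining sites still form a majority. A two-site loss leaves the lone survivor without quorum, so resuming writes then takes a deliberate manual override rather than automatic failover. Used by financial services, telcos, public utilities.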

Survives: almost everything except correlated software bugs (which is why discipline around staged rollouts matters)

Cost: 3× production + extensive engineering work. Justifiable for critical infrastructure, overkill for most SaaS.
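"Quorum-based" here means that a group of sites may keep accepting writes only while it can reach a strict majority of all sites. The sketch below shows that rule in isolation; the site names and the function name are ours for illustration, and real systems (etcd, Patroni, and similar) implement it through a consensus protocol rather than a single function.

```python
def has_quorum(reachable_sites: set[str], all_sites: set[str]) -> bool:
    """True if the reachable sites form a strict majority of all sites."""
    return len(reachable_sites & all_sites) > len(all_sites) // 2


THREE_SITES = {"paris", "marseille", "lyon"}
TWO_SITES = {"paris", "marseille"}

# Three sites: losing one leaves a majority, losing two does not.
print(has_quorum({"paris", "lyon"}, THREE_SITES))
print(has_quorum({"lyon"}, THREE_SITES))

# Two sites: neither side of a network split has a majority, which is
# why two-site active-active setups are exposed to split-brain.
print(has_quorum({"paris"}, TWO_SITES))
```

The third site does not have to be a full copy of production to count toward the majority. Whether a lightweight witness is acceptable depends on the consensus system you use.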

What works on FranceVPS specifically

Our three datacenters are well-suited to DR topologies:

Paris (FR-PAR-1) ↔ Marseille (FR-MRS-1): 750 km apart, separate power grids, independent network paths via different submarine and terrestrial routes. ~8ms latency between sites — fine for synchronous replication of small writes, asynchronous for everything else.
Paris (FR-PAR-1) ↔ Lyon (FR-LYO-1): 465 km apart, separate power grids. ~4ms latency — supports synchronous replication for most database write loads.
Three-site quorum: Paris + Lyon + Marseille gives true triangle redundancy with reasonable inter-site latency.

Database replication strategies

The database is usually the hardest piece of DR. Three approaches:

Asynchronous streaming replication (PostgreSQL: pg_basebackup to seed the standby, then standby.signal plus primary_conninfo on PostgreSQL 12 and later, or recovery.conf on 11 and earlier): simple, well-tested, but you can lose seconds-to-minutes of writes during failover. Suitable for content sites where occasional data loss is acceptable.

Synchronous replication: commit blocks until both sites confirm. Zero data loss on failover, but writes are limited by inter-site latency. Paris-Lyon at 4ms means your minimum write latency is 4ms — fine for most apps, painful for high-frequency write loads.

Multi-master on top of logical replication (BDR, now EDB Postgres Distributed, and similar tools): multiple sites can accept writes. Conflict resolution becomes your problem. Use only when you understand the tradeoffs deeply.
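The trade-off between the asynchronous and synchronous options comes down to two pieces of arithmetic, sketched below. Both functions are deliberately simplified and the function names are ours. The first assumes a steady write rate. The second assumes the quoted inter-site latency is a round-trip figure and models a single connection that commits one transaction at a time.

```python
def writes_at_risk(replication_lag_s: float, writes_per_second: float) -> float:
    """Asynchronous case: commits acknowledged on the primary but not yet
    received by the standby are lost if the primary site disappears."""
    if replication_lag_s < 0 or writes_per_second < 0:
        raise ValueError("inputs must be non-negative")
    return replication_lag_s * writes_per_second


def max_serial_commits_per_second(round_trip_ms: float) -> float:
    """Synchronous case: a connection committing back-to-back waits one
    inter-site round trip per commit, which caps its commit rate."""
    if round_trip_ms <= 0:
        raise ValueError("round_trip_ms must be positive")
    return 1000.0 / round_trip_ms


# Illustrative inputs: 5 s of replication lag at 200 writes/s.
print(writes_at_risk(replication_lag_s=5, writes_per_second=200))

# Round-trip figures taken from the site list: ~4 ms and ~8 ms.
for rtt_ms in (4, 8):
    print(rtt_ms, max_serial_commits_per_second(rtt_ms))
```

Under these assumptions a single connection tops out at about 250 commits per second at 4 ms and 125 at 8 ms. Many connections committing concurrently raise total throughput, but each individual commit still pays the round trip.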

Application layer considerations

Database aside, your application needs:

Stateless application servers (or at most session state in a shared cache like Redis)
External configuration (Consul, Vault) so all sites read the same config
Health checks that distinguish "this server is broken" from "this datacenter is broken"
DNS with low TTLs (60-120s) so failover happens quickly
Object storage replication for user uploads, generated assets, etc.
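The health-check item is the one most often left vague, so here is a minimal sketch of the distinction. It assumes you already collect one pass/fail probe result per server in a site, ideally from outside that site. The 80% threshold and all names are illustrative assumptions, not a recommendation.

```python
from enum import Enum
from typing import Mapping


class Verdict(Enum):
    HEALTHY = "healthy"
    SERVER_BROKEN = "server_broken"  # restart or replace individual servers
    SITE_BROKEN = "site_broken"      # candidate for a site-level failover


def classify_site(probes: Mapping[str, bool],
                  site_failure_ratio: float = 0.8) -> Verdict:
    """Classify one site from per-server probe results (True = probe passed)."""
    if not probes:
        raise ValueError("need at least one probe result")
    failed = sum(1 for ok in probes.values() if not ok)
    if failed == 0:
        return Verdict.HEALTHY
    # Most servers failing at once points at the site, not at the servers.
    if failed / len(probes) >= site_failure_ratio:
        return Verdict.SITE_BROKEN
    return Verdict.SERVER_BROKEN


print(classify_site({"app1": True, "app2": False, "app3": True}))
print(classify_site({"app1": False, "app2": False, "app3": False}))
```

A single vantage point cannot tell a dead site from a broken path to that site, so in practice you would require agreement from probes in at least two other locations before acting on a SITE_BROKEN verdict.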

Testing DR (this is where most plans fail)

An untested DR plan is theater. Tests should be:

Quarterly, minimum
Real, not just on paper — actually fail over to the standby site, run on it for an hour, fail back
Documented, with the runbook updated based on what you learn

The first time you fail over in production should not be during an actual incident. The first DNS TTL miscalculation, the first config drift, the first forgotten dependency on the primary site — those should surface in testing, not at 2 AM during a real outage.

The economics

For most SaaS companies, Tier 2 (active-passive across two cities) is the right answer. It costs roughly 1.6× single-site, gives you a meaningful RTO, and is achievable with off-the-shelf tooling. Tier 3 active-active is justified when downtime costs more than the additional infrastructure (which is rarer than people think).

Tier 1 (backups only) is appropriate for non-customer-facing workloads, internal tools, and anything where 4-8 hours of downtime to restore from backup is acceptable.

The mistake we see most often: companies pay for Tier 2 or Tier 3 capacity but never test failover, so they have the cost without the protection. Cheaper to run Tier 1 properly than Tier 3 incorrectly.
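One way to check whether a higher tier pays for itself is a break-even calculation: how much must an hour of downtime cost before the extra infrastructure is worth it? The sketch below uses the cost multipliers from this post. The monthly bill and the hours of downtime avoided per year are made-up inputs, and engineering time is left out entirely.

```python
def extra_annual_cost(single_site_monthly: float, multiplier: float) -> float:
    """Yearly infrastructure cost on top of a single-site deployment."""
    return single_site_monthly * (multiplier - 1.0) * 12


def breakeven_cost_per_downtime_hour(single_site_monthly: float,
                                     multiplier: float,
                                     downtime_hours_avoided_per_year: float) -> float:
    """Hourly downtime cost at which the higher tier breaks even."""
    if downtime_hours_avoided_per_year <= 0:
        raise ValueError("downtime_hours_avoided_per_year must be positive")
    return (extra_annual_cost(single_site_monthly, multiplier)
            / downtime_hours_avoided_per_year)


# Hypothetical: 2000 EUR/month single-site bill, Tier 2 at 1.6x,
# and 6 hours of downtime avoided per year.
tier2 = breakeven_cost_per_downtime_hour(2000, 1.6, 6)
print(f"Tier 2 breaks even at about {tier2:.0f} EUR per hour of downtime")
```

If your real cost of an hour of downtime is well below the break-even figure, running Tier 1 properly is the better use of money.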
