Resilience & Redundancy
The database uses AWS Relational Database Service (“RDS”) Multi-Availability Zone (“Multi-AZ”) to provide hot standby with automatic failover.
The application and event servers are arranged into clusters that provide resilience and redundancy in case of server failure and traffic changes.
All data stores are snapshotted nightly, and the encrypted snapshots are retained for 30 days. Each data store also supports point-in-time recovery (PTR), which allows us to rebuild the data to within 1 minute (and possibly to within seconds) of the point of failure.
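As a rough illustration of the 30-day retention rule, the sketch below assumes one snapshot per night and shows which snapshots would have aged out, and which nightly snapshot would be the starting point for a restore. The function names are hypothetical and this is not ProdPad's actual tooling; in practice the retention period is typically configured on the managed service itself.

```python
from datetime import date, timedelta

RETENTION_DAYS = 30  # retention period stated in the text above


def expired_snapshots(snapshot_dates, today, retention_days=RETENTION_DAYS):
    """Return, oldest first, the snapshot dates older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    return sorted(d for d in snapshot_dates if d < cutoff)


def latest_snapshot_before(snapshot_dates, failure_day):
    """Return the most recent snapshot taken on or before failure_day, or None."""
    candidates = [d for d in snapshot_dates if d <= failure_day]
    return max(candidates) if candidates else None
```

Point-in-time recovery would then replay changes made after that snapshot up to the chosen recovery point, which is what narrows the potential data loss from a day to a minute or less.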
We use Chef to automate the management of our servers. Should EU Ireland be rendered unavailable for an extended period, this allows us to re-establish ProdPad in a new AWS Region within 4 hours of recovery starting.
Full details are available in our Service Level Agreement, which can be found here.