Those who work for an MSP may have heard that GitLab recently suffered a major loss of database data. The recovery was long and strenuous because the company’s backup and recovery methods had never been tested and failed when they were needed. Although the company should have been better prepared, its representatives deserve credit for being honest about their failure to test database backups. The incident is a good reason to step back and review your own backup and recovery strategy. This vitally important component of application services should be tested regularly.
A Look at Backup Defaults
Engine Yard databases receive a physical and a logical backup once per day. Exactly 10 copies of each backup type are kept. The backups remain available for up to 90 days after the environment is shut down. If the environment is deleted, the backups are removed within a day. Backups are scheduled from the cloud dashboard.
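The retention rules above can be expressed as a short Python sketch. This is an illustration only, not Engine Yard’s implementation: the function names and the "shutdown" and "deleted" event labels are hypothetical, and only the numbers (10 copies, 90 days, 1 day) come from the text.

```python
from datetime import datetime, timedelta

# Values taken from the text; everything else here is an assumption.
KEEP_COPIES = 10
SHUTDOWN_RETENTION = timedelta(days=90)
DELETE_RETENTION = timedelta(days=1)


def backups_to_keep(timestamps, keep=KEEP_COPIES):
    """Return the `keep` most recent backup timestamps, newest first."""
    return sorted(timestamps, reverse=True)[:keep]


def available_until(event, when):
    """Return the latest time backups stay available after `event`.

    `event` is a hypothetical label: "shutdown" or "deleted".
    """
    if event == "shutdown":
        return when + SHUTDOWN_RETENTION
    if event == "deleted":
        return when + DELETE_RETENTION
    raise ValueError(f"unknown event: {event!r}")


if __name__ == "__main__":
    # Fourteen daily backups; only the newest ten survive pruning.
    daily = [datetime(2017, 1, 1) + timedelta(days=i) for i in range(14)]
    for ts in backups_to_keep(daily):
        print(ts.date())
```

A function like this would be applied once per backup type, since physical and logical backups are counted separately.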
A snapshot is used when a new replica database is being built or when each instance within an environment is terminated and rebuilt. Snapshots are valuable because they are extremely fast. Logical backups are essential for data recovery and transfer. Although they are somewhat slow to generate and restore, they offer ample flexibility in how data is restored: logical backups can be downloaded, copied, altered and transferred across environments with ease.
Beyond a Replica
A replica is certainly helpful for a timely recovery. However, a replica’s data can be damaged by the same event that harmed the master. If the replica replicates the damaging statements, the only option will likely be to restore from a backup.
A replica provides access to only one point in time of a database. That point moves forward continuously, staying within seconds or even milliseconds of the master’s data, which limits its value for data recovery. A replica can’t show the database as it was a week, a day, or an hour ago. The problem is that data loss often isn’t noticed right away. By the time it is discovered, recovery requires the state of the database at an earlier point in time.
The best backup tools are built with unit and integration testing, and they should go through hands-on testing before further updates are published. Furthermore, each replica should have a snapshot backup along with a logical backup. Backup restore tests should be performed on a weekly or monthly basis, and again whenever the data size or the structure of the database changes meaningfully.
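The schedule described above (a regular interval, plus an extra test after a meaningful change) could be checked with something like the following Python sketch. The function name, the seven-day default and the 25% size threshold are assumptions chosen for illustration; the text only says "weekly or monthly" and "meaningful".

```python
from datetime import datetime, timedelta


def restore_test_due(last_test, now, size_at_last_test, size_now,
                     schema_changed=False,
                     max_age=timedelta(days=7),   # assumed: weekly cadence
                     size_threshold=0.25):        # assumed: 25% counts as meaningful
    """Return True when a backup restore test should be run."""
    # A structural change always calls for a fresh restore test.
    if schema_changed:
        return True
    # The regular weekly (or monthly) interval has elapsed.
    if now - last_test >= max_age:
        return True
    # The data size has changed by more than the threshold.
    if size_at_last_test > 0:
        change = abs(size_now - size_at_last_test) / size_at_last_test
        return change >= size_threshold
    # No baseline size was recorded; test as soon as there is data.
    return size_now > 0


if __name__ == "__main__":
    last = datetime(2017, 2, 1)
    print(restore_test_due(last, datetime(2017, 2, 3), 100, 105))
    print(restore_test_due(last, datetime(2017, 2, 3), 100, 150))
```

For a monthly cadence, pass `max_age=timedelta(days=30)` instead of relying on the default.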
When conducting tests, pay particular attention to data quality: the restored data must be correct and complete. Pay attention to backup history as well. Know how many days of backups you currently have and whether that suits your organization’s requirements. The recovery time objective (RTO) is also important. Determine whether backup retrieval and restoration complete in an acceptable amount of time.
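Two of the checks above, backup history and recovery time, are easy to measure during a restore test. The following Python sketch shows one way to do it. Both function names are hypothetical, and `restore` stands for whatever callable retrieves and restores a backup in your environment; nothing here is tied to a particular backup tool.

```python
import time
from datetime import datetime


def days_of_history(backup_times, now):
    """Return how many whole days back the oldest available backup reaches."""
    return (now - min(backup_times)).days


def timed_restore(restore, rto_seconds, clock=time.monotonic):
    """Run `restore` and report whether it finished within the objective.

    `restore` is a hypothetical zero-argument callable that retrieves and
    restores a backup. Returns (met_objective, elapsed_seconds).
    """
    started = clock()
    restore()
    elapsed = clock() - started
    return elapsed <= rto_seconds, elapsed


if __name__ == "__main__":
    # Stand-in restore that just sleeps briefly.
    met, elapsed = timed_restore(lambda: time.sleep(0.1), rto_seconds=5)
    print(met, round(elapsed, 1))
```

The `clock` parameter exists only so the timing can be replaced with a fake clock in tests; in normal use the default is enough.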
The recovery point objective (RPO) matters a great deal. Take a moment to consider the maximum amount of data your organization could lose if a failure occurs just before the next backup. With one backup per day, that could be close to a full day of data. If that does not meet the requirements of your MSP, changes are necessary.
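The worst case described above is simply the longest gap between two consecutive backups. A minimal Python sketch of that arithmetic follows; the function names are hypothetical, and the calculation ignores how long a backup takes to complete, which would add to the exposure.

```python
from datetime import datetime, timedelta


def worst_case_loss(backup_times):
    """Return the longest gap between consecutive backups."""
    if len(backup_times) < 2:
        raise ValueError("need at least two backups to measure a gap")
    ordered = sorted(backup_times)
    return max(later - earlier for earlier, later in zip(ordered, ordered[1:]))


def meets_rpo(backup_times, rpo):
    """Return True when no gap between backups exceeds the RPO."""
    return worst_case_loss(backup_times) <= rpo


if __name__ == "__main__":
    # The third backup ran six hours late, so the worst gap is 30 hours.
    times = [
        datetime(2017, 2, 1, 3),
        datetime(2017, 2, 2, 3),
        datetime(2017, 2, 3, 9),
    ]
    print(worst_case_loss(times))
    print(meets_rpo(times, timedelta(hours=24)))
```

Running the check against real backup timestamps shows late or skipped backups that a nominal "once per day" schedule would hide.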
When it comes to validating snapshots, the best approach is to clone the production environment. Cloning uses recent snapshots of the primary hosts, and the new environment is built with those snapshots as its data source. This environment can be used as a test environment. You can also run checks to ensure the application contains the expected information and responds in the desired manner.
When validating a database, one of the best routes is to generate a replica. For logical backups, the ideal test is to restore a production backup locally or in a testing environment. Restoring such backups may prove to be a key part of the development lifecycle.
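Once a clone or a restored logical backup is available, a basic data check is to compare row counts against the source. The Python sketch below does this over two DB-API connections. It is an assumption-laden illustration: the function names are hypothetical, SQLite stands in for the real database in the demo, the table list must come from a trusted source because names are interpolated into the query, and double-quoted identifiers work in SQLite and PostgreSQL but not in MySQL’s default mode.

```python
import sqlite3


def row_counts(conn, tables):
    """Return {table: row count} for each table on a DB-API connection."""
    counts = {}
    for table in tables:
        cur = conn.cursor()
        # Table names cannot be bound as parameters; use a trusted list only.
        cur.execute(f'SELECT COUNT(*) FROM "{table}"')
        counts[table] = cur.fetchone()[0]
    return counts


def diff_row_counts(source, restored, tables):
    """Return {table: (source_count, restored_count)} where counts differ."""
    src = row_counts(source, tables)
    dst = row_counts(restored, tables)
    return {t: (src[t], dst[t]) for t in tables if src[t] != dst[t]}


if __name__ == "__main__":
    # Demo with two in-memory SQLite databases standing in for
    # production and the restored copy.
    prod = sqlite3.connect(":memory:")
    copy = sqlite3.connect(":memory:")
    for conn, rows in ((prod, 3), (copy, 2)):
        conn.execute("CREATE TABLE users (id INTEGER)")
        conn.executemany("INSERT INTO users VALUES (?)",
                         [(i,) for i in range(rows)])
    print(diff_row_counts(prod, copy, ["users"]))
```

Matching row counts do not prove the data is correct, so a check like this is a first filter before application-level tests, not a replacement for them. Counts taken from a live production database can also drift from a backup taken earlier, so small differences need interpretation.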
Learn from the mistakes of others and test your database backups. The last thing your MSP needs is a devastating data loss. Make sure your backup and recovery methods work, and you won’t have to deal with the fallout of such a nightmare scenario.