Within a Clearwater deployment, Ellis and Vellum store persistent data (Bono, Sprout, Homer and Dime do not). To prevent data loss in disaster scenarios, Ellis and Vellum have data backup and restore mechanisms. Specifically, they support
- manual backup
- periodic automated local backup
- manual restore.
This document describes
- how to list the backups that have been taken
- how to take a manual backup
- the periodic automated local backup behavior
- how to restore from a backup.
Note that Vellum has 4 databases:
- homestead_provisioning for Homestead's provisioning data
- homestead_cache for Homestead's cached data
- homer for Homer's data
- memento for Memento's data (if using the Memento AS)
Depending on your deployment scenario, you may not need to back up all of the data on Ellis and Vellum:
- If your Clearwater deployment is integrated with an external HSS, the HSS is the master of Ellis' and some of Vellum's data, so you only need to backup/restore data in the homer and memento databases on Vellum.
- If you are not using a Memento AS, you do not need to backup/restore the memento database on Vellum.
Listing Backups¶
The process for listing backups differs between Ellis and Vellum.
To list the backups that have been taken on Ellis, run
sudo /usr/share/clearwater/ellis/backup/list_backups.sh
This produces output of the following form, listing each of the available backups.
Backups for ellis:
1372294741 /usr/share/clearwater/ellis/backup/backups/1372294741
1372294681 /usr/share/clearwater/ellis/backup/backups/1372294681
1372294621 /usr/share/clearwater/ellis/backup/backups/1372294621
1372294561 /usr/share/clearwater/ellis/backup/backups/1372294561
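Since the snapshot directories are named by Unix timestamp, the newest backup can be picked out of this listing mechanically. A minimal sketch (the `latest_backup` helper is ours, not part of Clearwater; pipe the listing command's output into it):

```shell
# latest_backup: read list_backups.sh-style output on stdin and print the
# newest snapshot name. Snapshot directories are named by Unix timestamp,
# so the numerically largest one is the most recent.
latest_backup() {
  awk '/^[0-9]+ /{print $1}' | sort -n | tail -n 1
}
```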
To list the backups that have been taken on Vellum, run
sudo /usr/share/clearwater/bin/list_backups.sh homestead_provisioning
sudo /usr/share/clearwater/bin/list_backups.sh homestead_cache
sudo /usr/share/clearwater/bin/list_backups.sh homer
sudo /usr/share/clearwater/bin/list_backups.sh memento
This produces output of the following form, listing each of the available backups.
No backup directory specified, defaulting to /usr/share/clearwater/homestead/backup/backups
provisioning1372812963174
provisioning1372813022822
provisioning1372813082506
provisioning1372813143119
You can also specify a directory to search in for backups, e.g. for the homestead_provisioning database:
sudo /usr/share/clearwater/bin/list_backups.sh homestead_provisioning <backup dir>
Taking a Manual Backup¶
The process for taking a manual backup differs between Ellis and Vellum. Note that in both cases,
- the backup is stored locally and should be copied to a secure backup server to ensure resilience
- this process only backs up a single local node, so the same process must be run on all nodes in a cluster to ensure a complete set of backups
- these processes cause a small amount of extra load on the disk, so it is recommended not to perform this during periods of high load
- only 4 backups are stored locally - when a fifth backup is taken, the oldest is deleted.
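The four-backup retention described above amounts to a simple prune of the oldest snapshot directories. A minimal sketch of that behavior, assuming timestamp-named directories as shown in the listings above (`prune_backups` is a hypothetical helper, not Clearwater's own implementation):

```shell
# Keep only the newest $keep snapshot directories under $backup_root and
# delete the rest. Snapshot directories are named by Unix timestamp, so a
# numeric sort orders them oldest-first; GNU head's negative count drops
# everything except the last $keep entries.
prune_backups() {
  local backup_root=$1 keep=${2:-4} dir
  for dir in $(ls "$backup_root" | sort -n | head -n -"$keep"); do
    rm -rf "${backup_root:?}/$dir"
  done
}
```

For example, `prune_backups /usr/share/clearwater/ellis/backup/backups 4` would leave only the four newest Ellis snapshots in place.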
To take a manual backup on Ellis, run
sudo /usr/share/clearwater/ellis/backup/do_backup.sh
This produces output of the following form, reporting the successfully-created backup.
Creating backup in /usr/share/clearwater/ellis/backup/backups/1372336317/db_backup.sql
Make a note of the snapshot directory (1372336317 in the example above) - this will be referred to as <snapshot> below.
This file is only accessible by the root user. To copy it to the current user’s home directory, run
snapshot=<snapshot>
sudo bash -c 'cp /usr/share/clearwater/ellis/backup/backups/'$snapshot'/db_backup.sql ~'$USER' && chown '$USER.$USER' db_backup.sql'
This file can, and should, be copied off the Ellis node to a secure backup server.
To take a manual backup on Vellum, run
sudo cw-run_in_signaling_namespace /usr/share/clearwater/bin/do_backup.sh homestead_provisioning
sudo cw-run_in_signaling_namespace /usr/share/clearwater/bin/do_backup.sh homestead_cache
sudo cw-run_in_signaling_namespace /usr/share/clearwater/bin/do_backup.sh homer
sudo cw-run_in_signaling_namespace /usr/share/clearwater/bin/do_backup.sh memento
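The four commands above can be wrapped in a loop so that every Vellum keyspace is backed up in one pass. A sketch, not part of Clearwater itself (the `backup_all_keyspaces` wrapper and the `BACKUP_CMD` override are ours):

```shell
# Run the backup command for each Vellum keyspace in turn. BACKUP_CMD
# defaults to the real Clearwater invocation but can be overridden, e.g.
# for testing. If one keyspace fails, carry on and report failure at the end.
BACKUP_CMD=${BACKUP_CMD:-"sudo cw-run_in_signaling_namespace /usr/share/clearwater/bin/do_backup.sh"}

backup_all_keyspaces() {
  local ks rc=0
  for ks in homestead_provisioning homestead_cache homer memento; do
    echo "=== Backing up keyspace: $ks ==="
    $BACKUP_CMD "$ks" || rc=$?   # remember the last failure, keep going
  done
  return $rc
}
```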
These each produce output of the following form, reporting the successfully-created backup.
...
Deleting old backup: /usr/share/clearwater/homestead/backup/backups/1372812963174
Creating backup for keyspace homestead_provisioning...
Requested snapshot for: homestead_provisioning
Snapshot directory: 1372850637124
Backups can be found at: /usr/share/clearwater/homestead/backup/backups/provisioning/
Note that each of the Vellum databases will produce a different snapshot in a different directory.
The backups are only stored locally - the resulting backup for each command is stored in the listed directory. Make a note of the snapshot directory for each database - these will be referred to as <hs-prov-snapshot>, <hs-cache-snapshot>, <homer-snapshot> and <memento-snapshot> below.
These should be copied off the node to a secure backup server. For
example, from a remote location execute
scp -r ubuntu@<homestead node>:/usr/share/clearwater/homestead/backup/backups/provisioning/<snapshot> ..
Periodic Automated Local Backups¶
If you installed Ellis and Vellum through chef, they are automatically configured to take daily backups at midnight local time every night.
If your deployment was not installed through chef and you want to turn this on, edit your crontab by running
sudo crontab -e and add the following lines if not already present:
- On Ellis:
0 0 * * * /usr/share/clearwater/ellis/backup/do_backup.sh
- On Vellum:
0 0 * * * /usr/bin/cw-run_in_signaling_namespace /usr/share/clearwater/bin/do_backup.sh homestead_provisioning
5 0 * * * /usr/bin/cw-run_in_signaling_namespace /usr/share/clearwater/bin/do_backup.sh homestead_cache
10 0 * * * /usr/bin/cw-run_in_signaling_namespace /usr/share/clearwater/bin/do_backup.sh homer
15 0 * * * /usr/bin/cw-run_in_signaling_namespace /usr/share/clearwater/bin/do_backup.sh memento
These backups are stored locally, in the same locations as they would be generated for a manual backup.
Restoring from a Backup¶
There are three stages to restoring from a backup.
- Copying the backup files to the correct location.
- Running the restore backup script.
- Synchronizing Ellis’ and Vellum’s views of the system state.
This process will impact service and overwrite data in your database.
Copying Backup Files¶
The first step in restoring from a backup is getting the backup files/directories into the correct locations on the Ellis or Vellum node.
If you are restoring from a backup that was taken on the node on which you are restoring (and haven't moved it), you can just move on to the next step.
If not, create a directory on your system that you want to put your backups into (we'll use ~/backup in this example). Then copy the backups there. For example, from a remote location that contains your backup, run
scp -r <snapshot> ubuntu@<vellum node>:backup/<snapshot>.
On Ellis, run the following commands.
snapshot=<snapshot>
sudo chown root.root db_backup.sql
sudo mkdir -p /usr/share/clearwater/ellis/backup/backups/$snapshot
sudo mv ~/backup/$snapshot/db_backup.sql /usr/share/clearwater/ellis/backup/backups/$snapshot
On Vellum there is no need to move the files any further, as the backup script takes an optional backup directory parameter.
If you are restoring a Vellum backup onto a completely clean deployment, you must ensure that the new deployment has at least as many Vellum nodes as the one from which the backup was taken. Each backup should be restored onto only one node, and each node should have only one backup restored onto it. If your new deployment does not have enough Vellum nodes, you should add more nodes and then, once restoring backups is complete, scale down your deployment to the desired size.
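The one-backup-per-node rule above can be sanity-checked before you start restoring, by pairing snapshots with nodes one-to-one. A sketch with placeholder names (the snapshot names, node names and `plan_restore` helper are illustrative only):

```shell
# One backup per node, one node per backup: fail fast if there are more
# snapshots to restore than Vellum nodes to restore them onto, otherwise
# print a one-to-one assignment.
snapshots=(1372850637124 1372850637125 1372850637126)  # placeholder snapshot names
nodes=(vellum-1 vellum-2 vellum-3)                     # placeholder Vellum nodes

plan_restore() {
  if (( ${#nodes[@]} < ${#snapshots[@]} )); then
    echo "ERROR: ${#snapshots[@]} backups but only ${#nodes[@]} Vellum nodes" >&2
    return 1
  fi
  local i
  for i in "${!snapshots[@]}"; do
    echo "restore ${snapshots[$i]} onto ${nodes[$i]}"
  done
}
```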
Running the Restore Backup Script¶
To restore a backup on Ellis, run the following command:
sudo /usr/share/clearwater/ellis/backup/restore_backup.sh <snapshot>
Ellis will produce output of the following form.
Will attempt to backup from backup 1372336317
Found backup directory 1372336317
Restoring backup for ellis...
--------------
/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */
--------------
...
--------------
/*!40111 SET SQL_NOTES=@OLD_SQL_NOTES */
--------------
To restore a backup on Vellum, you must perform the following steps. Note that this stops the Cassandra processes on every Vellum node, so Cassandra will be unavailable for the duration.
On every Vellum node in turn, across all GR sites, stop Cassandra using the following command:
sudo monit stop -g cassandra
On every Vellum node on which you want to restore a backup, run the restore backup script for each keyspace using the following commands:
sudo /usr/share/clearwater/bin/restore_backup.sh homestead_provisioning <hs-prov-snapshot> <backup directory>
sudo /usr/share/clearwater/bin/restore_backup.sh homestead_cache <hs-cache-snapshot> <backup directory>
sudo /usr/share/clearwater/bin/restore_backup.sh homer <homer-snapshot> <backup directory>
sudo /usr/share/clearwater/bin/restore_backup.sh memento <memento-snapshot> <backup directory>
Note that, because the 4 Vellum databases are saved to different backups, the name of the snapshot used to restore each of the databases will be different.
Vellum will produce output of the following form.
Will attempt to backup from backup 1372336442947
Will attempt to backup from directory /home/ubuntu/bkp_test/
Found backup directory /home/ubuntu/bkp_test//1372336442947
Restoring backup for keyspace homestead_provisioning...
xss = -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms826M -Xmx826M -Xmn100M -XX:+HeapDumpOnOutOfMemoryError -Xss180k
Clearing commitlog...
filter_criteria: Deleting old .db files...
filter_criteria: Restoring from backup: 1372336442947
private_ids: Deleting old .db files...
private_ids: Restoring from backup: 1372336442947
public_ids: Deleting old .db files...
public_ids: Restoring from backup: 1372336442947
sip_digests: Deleting old .db files...
sip_digests: Restoring from backup: 1372336442947
On every Vellum node in turn, across all GR sites, restart Cassandra using the following command:
sudo monit monitor -g cassandra
On every Vellum node in turn, across all GR sites:
- Wait until the Cassandra process has restarted by running
sudo monit summary and verifying that the
cassandra_process is marked as Running.
- Resynchronize the node's data by running
sudo cw-run_in_signaling_namespace nodetool repair -par
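The "wait until restarted" step can be scripted as a polling loop. A sketch - the `wait_for_cassandra` helper and the `STATUS_CMD` override are ours, and the grep pattern assumes monit's summary shows the cassandra_process line with a Running status:

```shell
# Poll `monit summary` until cassandra_process is reported as Running.
# STATUS_CMD defaults to the real command but can be overridden for testing.
STATUS_CMD=${STATUS_CMD:-"sudo monit summary"}

wait_for_cassandra() {
  local tries=${1:-120}   # ~10 minutes at 5 seconds per attempt
  while (( tries-- > 0 )); do
    if $STATUS_CMD | grep "cassandra_process" | grep -q "Running"; then
      return 0
    fi
    sleep 5
  done
  echo "cassandra_process did not reach Running" >&2
  return 1
}
```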
At this point, the backups have been restored.
It is possible (and likely) that when backups are taken on different boxes the data will be out of sync, e.g. Ellis will know about a subscriber, but there will be no digest in Vellum. To restore the system to a consistent state, there is a synchronization tool within Ellis which can be run over a deployment to get the databases in sync. To run it, log into an Ellis box and execute:
cd /usr/share/clearwater/ellis
sudo env/bin/python src/metaswitch/ellis/tools/sync_databases.py
This will:
- Run through all the lines on Ellis that have an owner and verify that there is a private identity associated with the public identity stored in Ellis. If successful, it will verify that a digest exists in Vellum for that private identity. If either of these checks fails, the line is considered lost and is removed from Ellis. If both checks pass, it will check that there is a valid iFC - if this is missing, it will be replaced with the default iFC.
- Run through all the lines on Ellis without an owner and make sure there is no orphaned data in Vellum, i.e. delete the simservs, iFC and digest for those lines.