LucidDbBackupRestore
Overview
This page explains how to back up and restore an instance of LucidDB.
Before reading the procedures, note a few important points:
- Starting from release 0.8.0, LucidDB supports hot backup. This means that you do not have to shut down the server before backing up. You can run queries, updates, and even DDL while the backup is in progress. An older, cold backup procedure required a server shutdown.
- LucidDB currently only supports physical backup. This means copying binary data files rather than exporting data and metadata in a platform-neutral format (e.g. DDL scripts and INSERT statements). For logical data-level export, see the EXPORT_SCHEMA_TO_FILE system procedure. Metadata export in DDL form is not currently available. (A lot of the underlying components needed are available, so putting them together into a system procedure would be a great contribution.)
- Binary data files are architecture-specific, so physical backup/restore is only possible between installations on the same architecture (Linux 64-bit to Linux 64-bit will work, but Linux 32-bit to Linux 64-bit will NOT work). If you need to move between architectures, use the logical data-level export outlined above.
- Reliable backup/restore procedures are defined here, but packaged builds do not yet include automation scripts (another opportunity for a good contribution). Developer source builds include the ant targets backupCatalog and restoreCatalog, which carry out these procedures (skipping the verification steps).
- LucidDB hot backups will avoid backing up free data pages. Using the LucidDB preallocation utility, it's possible to pre-allocate space in your db.dat file. Regardless of the size of the db.dat file, the backup will bypass all free pages. Therefore, it generally is a good idea to run ALTER SYSTEM DEALLOCATE OLD before doing a backup to avoid backing up pages that are no longer in use, but haven't yet been freed.
Backup
To create a backup, two system procedures are provided:
Creating a backup is simply a matter of calling one of these two procedures.
Both procedures take as a parameter the pathname of a directory where the backups will be written. The directories must be writable. They do not need to be empty, but they cannot contain files with the following reserved names, as these are the files that will be written out by the backup.
- backup.properties - text file containing a description of the backup
- FarragoCatalogDump (or FarragoCatalogDump.gz if using compression) - an XMI dump of the system catalogs
- FennelDataDump.dat (or FennelDataDump.dat.gz if using compression) - a copy of the data pages in use at the start of the backup and system-level metadata pages
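Before calling a backup procedure, it can be useful to check the target directory for leftover files with reserved names. The following is a minimal Python sketch, not part of LucidDB; the function name `reserved_name_conflicts` is hypothetical, and the check is deliberately conservative in that it flags both the compressed and uncompressed names regardless of which compression mode will be used.

```python
from pathlib import Path

# Reserved names taken from the list above; the .gz variants apply when
# compression is used. This sketch flags all of them, whichever mode is chosen.
RESERVED_NAMES = (
    "backup.properties",
    "FarragoCatalogDump",
    "FarragoCatalogDump.gz",
    "FennelDataDump.dat",
    "FennelDataDump.dat.gz",
)


def reserved_name_conflicts(archive_dir):
    """Return the reserved file names already present in archive_dir, sorted."""
    directory = Path(archive_dir)
    return sorted(name for name in RESERVED_NAMES if (directory / name).exists())
```

An empty result means none of the reserved names is present; it does not check that the directory is writable.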
The second procedure shown above estimates whether there is sufficient space to create the backup before it proceeds to back up the data pages. However, because LucidDB can only estimate the space requirements, the backup may still fail for lack of space.
Only one backup can be running at any given time.
A backup cannot be issued if the session has a label setting.
If a backup fails, partially written files in the archive directory are not cleaned up.
Types of Backup
In addition to supporting full backups, which back up all data pages, LucidDB also supports incremental and differential backups.
Backup terminology, copied from Wikipedia:
- Incremental backup: A Full + Incremental repository aims to make storing several copies of the source data more feasible. At first, a full backup (of all files) is taken. After that an incremental backup (of only the files that have changed since the previous full or incremental backup) can be taken. Restoring whole systems to a certain point in time would require locating the full backup taken previous to that time and all the incremental backups taken between that full backup and the particular point in time to which the system is supposed to be restored. This model offers a high level of security that something can be restored and can be used with removable media such as tapes and optical disks. The downside is dealing with a long series of incrementals and the high storage requirements.
- Differential backup: A full + differential backup differs from a full + incremental in that after the full backup is taken, each partial backup captures all files created or changed since the full backup, even though some may have been included in a previous partial backup. Its advantage is that a restore involves recovering only the last full backup and then overlaying it with the last differential backup.
An error will be returned if you attempt to create either a differential or incremental backup without first having created some other backup.
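To make the difference between the two models concrete, here is a small Python sketch that works out which backups must be restored, and in what order, to reach a given backup. It is illustrative only: the type labels and the function name `restore_chain` are hypothetical, and it assumes that an incremental backup depends on the backup taken immediately before it (of any type), while a differential backup depends on the most recent full backup.

```python
def restore_chain(backup_types, target_index):
    """Return indexes of the backups to restore, oldest first.

    backup_types lists the backups in chronological order, each entry being
    "FULL", "INCREMENTAL", or "DIFFERENTIAL" (labels chosen for this sketch).
    """
    chain = []
    i = target_index
    while True:
        kind = backup_types[i]
        chain.append(i)
        if kind == "FULL":
            break
        if kind == "INCREMENTAL":
            # Assumed to build on the immediately preceding backup.
            i -= 1
            if i < 0:
                raise ValueError("incremental backup has no earlier backup")
        elif kind == "DIFFERENTIAL":
            # Builds on the most recent full backup before it.
            fulls = [j for j in range(i) if backup_types[j] == "FULL"]
            if not fulls:
                raise ValueError("differential backup has no earlier full backup")
            i = fulls[-1]
        else:
            raise ValueError("unknown backup type: %r" % (kind,))
    chain.reverse()
    return chain
```

With one full backup followed by two incrementals, reaching the last backup requires all three; with one full followed by two differentials, only the full and the last differential are needed.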
Compression
The files containing the catalog dump and data pages can optionally be compressed using gzip. The backup procedures try to locate the gzip executable by first searching /bin, then /usr/bin. If the executable is in neither of those directories, it must be reachable through the user's PATH environment variable.
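The lookup order described above can be reproduced outside LucidDB to check in advance whether gzip will be found. This Python sketch mirrors the documented order only; it is not LucidDB's actual implementation, and the function name `locate_gzip` and its parameters are hypothetical.

```python
import os
import shutil


def locate_gzip(preferred_dirs=("/bin", "/usr/bin"), name="gzip"):
    """Return the path of the executable, or None if it cannot be found.

    Checks the preferred directories in order, then falls back to PATH.
    """
    for directory in preferred_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    # Fallback: search the directories listed in the PATH environment variable.
    return shutil.which(name)
```

A result of None suggests that a compressed backup would not be able to find gzip for the user running the check.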
Note that compression is currently only supported on Linux.
Concurrency
As noted earlier, backups can run concurrently with other SQL statements. However, ALTER SYSTEM DEALLOCATE OLD is a no-op if it's run during a backup. This ensures that pages that need to be backed up aren't freed in the middle of the backup.
During the initial phase of the backup while creating the catalog dump and backing up system-level metadata, locks are placed on the system catalog, which will cause other SQL statements to wait. This is a short duration lock, so the concurrency impact is minimized. During the lengthier portion of the backup when data pages are being backed up, no locks are held. It is possible to cancel the backup.
See LucidDbConcurrencyControl for background on concurrency control in LucidDB.
Restore
Restore is also invoked via SQL, using the system procedure RESTORE_DATABASE. The procedure takes as a parameter the pathname of the archive directory written by an earlier backup. The directory must exist and be readable by LucidDB.
There are two variations of the restore UDR. One restores both data pages and catalog data, while the other restores only data pages. If a sequence of backups is required to restore LucidDB, then the very last restore in the sequence must restore both data pages and catalog data. Otherwise, LucidDB will be inaccessible until the catalog data is restored. Skipping the catalog restore for the earlier backups in the sequence reduces the total time required to complete the entire restore.
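The rule about which restore variation to use at each step can be sketched as a small Python helper. It only pairs each archive directory with a flag saying whether the catalog should be restored at that step; it does not call LucidDB, and the function name `restore_steps` and the example paths are hypothetical.

```python
def restore_steps(archive_dirs):
    """Pair each archive directory with a restore-catalog flag.

    archive_dirs must be in restore order (oldest backup first). Only the
    last step restores catalog data; earlier steps restore data pages only.
    """
    if not archive_dirs:
        raise ValueError("at least one archive directory is required")
    last = len(archive_dirs) - 1
    return [(path, index == last) for index, path in enumerate(archive_dirs)]
```

For example, `restore_steps(["/backups/full", "/backups/incr1"])` marks only `/backups/incr1` for a restore that includes catalog data.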
Restore will grow db.dat as necessary to bring it back to the same size as of backup time, but better performance can be achieved by using the LucidDB storage preallocation utility to do this before bringing the server up for restore. Restore will never shrink db.dat, and does not precheck the available disk space if db.dat needs to grow.
No other users can be connected to LucidDB while a restore is in progress, and the restore session cannot have a label setting.
The restore will fail (leaving the database unmodified) if the expected reserved filenames do not exist in the archive directory.
For a restore from an incremental or differential backup, the restore will fail (leaving the database unmodified) if the commit sequence number recorded with the backup does not match the commit sequence number recorded as "previous" or "previous full" (respectively) at the time the backup was taken. Therefore, it's important that in between any two restores in a sequence, no SQL statements other than the restore procedure calls (or read-only queries) are issued.
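The commit sequence number check can be illustrated with a simplified model. The Python sketch below reflects one reading of the rule, stated here as assumptions: each partial backup records a base CSN (its "previous" or "previous full" value), a restore leaves the database at the CSN of the backup just restored, and a partial backup can only be restored when its base CSN equals the database's current CSN. The `Backup` class and `first_mismatch` function are hypothetical names, not LucidDB structures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Backup:
    name: str
    csn: int                        # CSN recorded when the backup was taken
    base_csn: Optional[int] = None  # "previous"/"previous full" CSN; None for a full backup


def first_mismatch(sequence):
    """Return the name of the first backup that would fail the check, or None."""
    current = None  # CSN of the database after the restores applied so far
    for backup in sequence:
        if backup.base_csn is not None and backup.base_csn != current:
            return backup.name
        # Assumption: after a restore, the database is at the backup's CSN.
        current = backup.csn
    return None
```

In this model, skipping an incremental backup in the middle of a sequence, or committing any change between two restores, makes the next restore fail the check.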
If a restore request fails in the middle (e.g. due to an I/O error or due to a user cancel request), the database may be left in an unusable state, in which case, it is necessary to start over from the beginning of a full restore sequence.
LucidDB must be restarted after each restore completes.