Online Backup Functional/Design Spec

 

Overview

 

Support real-time online backup mechanism that does not block database operations when backup is in progress. When backup is running, currently Derby only permits reads from the database. By adding support to permit writes, it will be easier for the users to schedule backups without being concerned about the impact of backup operation on the applications that are accessing the database.

 

Introduction

 

New online backup mechanism will not change the way database can be restored from the backup. In the current system, if there any unlogged operations when backup starts, they will be rolled back on restore because database is frozen at the start of the backup. To support real-time online backups that do not block user operations, it is necessary to have the transaction log for all the operations that happen after the backup starts to bring the database to the consistent state during restore.  Because a transaction that has unlogged operations can commit when backup is in progress. To make a consistent backup of the database, there should not be any pending transactions with unlogged operations.

 

JDBC

 

Backup operations are support through system procedures. They can be called using JDBC program or through IJ.

Existing Backup Procedures:

After this enhancement, existing backup procedures (SYSCS_UTIL.SYSCS_BACKUP_DATABASE  ...etc) will allow writes to the database. Information about existing online backup procedures can be found in the Derby admin guide. Derby Admin Guide .    These backup procedures will throw error, if there are any unlogged operations already executed in the same transaction as backup procedure.  If there are any unlogged operations in progress in other transactions in the system when backup starts, these procedures will block until those transactions are completed before performing the backup.  Derby automatically converts unlogged operations to logged mode if they are started when the backup is in progress except operations that are used to maintain application jar files in the database. Procedures to install, replace, and remove jar files in a database are blocked when the backup is in progress.

 

Two new procedures are added to make backup return error immediately if there are any transactions in progress with unlogged operations at the start of the backup instead of waiting for those transactions to complete.  

 

 

Full backup:

SYSCS_UTIL.SYSCS_ONLINE_BACKUP_DATABASE_NOWAIT(IN BACKUPDIR VARCHAR())

 
Full backup with log archive mode:

SYSCS_UTIL.SYSCS_BACKUP_DATABASE_AND_ENABLE_LOG_ARCHIVE_MODE_NOWAIT
     (IN BACKUPDIR VARCHAR(32672),

      IN SMALLINT DELETE_ARCHIVED_LOG_FILES)  


Arguments

 

BACKUPDIR:

An input argument of type VARCHAR(32672) that specifies the full system path to the database directory to be backed up.

 

DELETE_ARCHIVED_LOG_FILES:

A non-zero input argument of type SMALLINT will make backup procedures to delete existing online archived log files that were created before this backup will be deleted. The log files are deleted only after a successful backup.

 

 

Design

 

Current Online backup copies all the database files (log, seg0  ...Etc) to the specified location when the database is frozen (writes are blocked).

In the new scheme, goal is to copy the database information to the backup location, without blocking any user operation for the whole duration of the backup.  Following sections discuss how the information is copied into the backup.

 

Containers:

 

At the start of the backup, all the containers that need to be backed up are found by finding all the files in the data segment (seg0). Then each container is copied to the backup by reading through the page cache instead of the direct file copy.

 

·        Individual pages are latched, when they are being copied to the backup to prevent modifications to the page.

·        Any new containers created when backup is in progress will be recreated on restore using the information from the transaction log.

·        Containers that were in the process of creation will not be backed up, if the creation is not complete. Container cache will make sure that backup can not get handle to  the containers that are being created. 

·        If committed drop occurs on a container that is being backed up, backup will wait for the committed drop to complete and then copy the stub to the backup.

·        Committed container stubs that are created after the checkpoint for the backup will not be deleted until the backup is complete. A checkpoint that occurs after the backup garbage collects them.

·         All requests to open containers in unlogged mode will be automatically converted to logged mode after the backup starts.

  

Log files:

Transaction log is used to bring the database to the consistent state on restore from the backup. A checkpoint is made at the start of the backup; this checkpoint will be starting point for the recovery from the backup. Log files are copied to the backup using regular file copy. 

 

  • Backup Checkpoint information in the log control files is copied to the backup location at the start.
  • All  the  transaction log  files  starting  from  the log  file  that  contains Undo Low Water Mark of  the backup checkpoint to the log file that is active at the end of the backup are copied into the backup.
  • All the transaction log files starting from the backup checkpoint to the log file that is active at the end of the backup are copied into the backup.
  • State of the database in the backup depends on last log file that goes into the back up.  After all the containers are copied to the backup, last log file that goes into the backup is determined. 
  • Database engine wills not delete the log files that are to yet to be backed up, when backup is running even if they are no longer required for regular crash recovery. 
  • Checkpoints can occur in parallel when the backup is progress.  

 

Jar files:

 

Storing Jar files in the database is not a logged operation and they don’t go through the page cache, they will be copied to the backup directly using the file copy.  Jar files are not logged, but the system catalogs  updates are logged  when a jar file is added/replaced. If jar file operations are allowed during the backup, it is possible that backup system catalog (sys.sysfiles) will have a reference to a file that does not exist in the backup database.  And also backup should not copy partial written jar files.  To get consistent backup copy:

  • Backup operation will wait/fail for all the jar file activity in progress to complete.
  • Jar file operations are blocked when backup is in progress.

 

Other files:

  • Database bootstrap file Service.Properties will be to be copied to the backup.  Need to check if any synchronization is required to copy this file.
  • Copy the BACKUP.HISTORY file at the end of the backup.
  •  Lock files should not be copied to the backup.
  • Tmp directory should not be backed up.

 

 

Testing:

Most of the current backup tests are single threaded. I will add new functional tests case that will execute read/writes on the database, when backup is running.  Some of the testing scenarios:

  1. Test the new procedure calls.          
  2. Checkpoint in parallel to the backup.
  3. DML operations in parallel to the backup.
  4. Grow the container after it is backed up / parallel.
  5. Unlogged operations in parallel to the backup and pending when backup starts. 
  6. DDL operations in parallel to the backup.
  7. In-Place compress in parallel to the backup.
  8. ….etc.

 

 

  References:

 

·         Derby Admin Guide Backup Section

·         Derby-Dev Thread

 

 

 

·        Version 1. 10/5/2005 :  suresh.thalamati@gmail.com

·        Version 2. 11/30/2005 :  suresh.thalamati@gmail.com

·        Version 3. 01/31/2005 :  suresh.thalamati@gmail.com