

Performing Backups and Restoring Files

@UNREVISED

GNU tar is distributed along with the scripts which the Free Software Foundation uses for performing backups. There are no corresponding scripts available yet for restoring files. Although there is a good chance these scripts will suit you, they are not the only scripts or methods available for backing up and restoring files. You may well create your own, or use more sophisticated packages dedicated to that purpose.

Some users are enthusiastic about Amanda (The Advanced Maryland Automatic Network Disk Archiver), a backup system developed by James da Silva and available on many Unix systems. This is free software, and it is available at these places:

http://www.cs.umd.edu/projects/amanda/amanda.html
ftp://ftp.cs.umd.edu/pub/amanda

Here is a possible outline for future documentation of the backup scripts provided with the GNU tar distribution.

.* dumps
. + what are dumps

. + different levels of dumps
.  - full dump = dump everything
.  - level 1, level 2 dumps etc, -
	A level n dump dumps everything changed since the last level
	n-1 dump (?)

. + how to use scripts for dumps  (ie, the concept)
.  - scripts to run after editing backup specs (details)

. + Backup Specs, what is it.
.  - how to customize
.  - actual text of script  [/sp/dump/backup-specs]

. + Problems
.  - rsh doesn't work
.  - rtape isn't installed
.  - (others?)

. + the --incremental option of tar

. + tapes
.  - write protection
.  - types of media
.   : different sizes and types, useful for different things
.  - files and tape marks
     one tape mark between files, two at end.
.  - positioning the tape
     MT writes two at end of write,
       backspaces over one when writing again.

This chapter documents both the provided FSF scripts and tar options which are more specific to usage as a backup tool.

To back up a file system means to create archives that contain all the files in that file system. Those archives can then be used to restore any or all of those files (for instance if a disk crashes or a file is accidentally deleted). File system backups are also called dumps.

Using tar to Perform Full Dumps

@UNREVISED

Full dumps should only be made when no other people or programs are modifying files in the filesystem. If files are modified while tar is making the backup, they may not be stored properly in the archive, in which case you won't be able to restore them if you have to. (Files not being modified are written with no trouble, and do not corrupt the entire archive.)

You will want to use the --label=archive-label (-V archive-label) option to give the archive a volume label, so you can tell what this archive is even if the label falls off the tape, or anything like that.

Unless the filesystem you are dumping is guaranteed to fit on one volume, you will need to use the --multi-volume (-M) option. Make sure you have enough tapes on hand to complete the backup.

If you want to dump each filesystem separately you will need to use the --one-file-system (-l) option to prevent tar from crossing filesystem boundaries when storing (sub)directories.

The --incremental (-G) option is not needed, since this is a complete copy of everything in the filesystem, and a full restore from this backup would only be done onto a completely empty disk.

Unless you are in a hurry, and trust the tar program (and your tapes), it is a good idea to use the --verify (-W) option, to make sure your files really made it onto the dump properly. This will also detect cases where the file was modified while (or just after) it was being archived. Not all media (notably cartridge tapes) are capable of being verified, unfortunately.
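The options discussed above can be combined roughly as follows. This is only a sketch: it writes the dump to an ordinary file in a scratch directory rather than to a tape, and every path and file name in it is hypothetical. --multi-volume is left out because the archive trivially fits on one volume, and long option names are used throughout.

```shell
#!/bin/sh
# Sketch of a full dump, written to an ordinary file instead of a tape
# so that it can be tried safely. All paths below are hypothetical.
set -eu

work=$(mktemp -d)                 # scratch area standing in for tape and disk
mkdir "$work/fs"                  # stands in for the file system to dump
echo "some data" > "$work/fs/file1"
archive="$work/full.tar"          # on a real system: the tape device

# Archive relative names from inside the directory, so that --verify can
# find the original files again when it re-reads the archive.
(
  cd "$work/fs"
  tar --create \
      --file="$archive" \
      --label="Full dump of fs, $(date)" \
      --one-file-system \
      --verify \
      .
)

# List what ended up in the archive, then clean up the scratch area.
listing=$(tar --list --file="$archive")
echo "$listing"
rm -rf "$work"
```

Archiving `.' from inside the directory keeps the member names relative, which is what lets the verification pass locate the original files.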

--listed-incremental=snapshot-file (-g snapshot-file) always takes a file name argument. If the file doesn't exist, tar runs a level zero dump and creates the file. If the file exists, tar uses it to see what has changed.

--incremental (-G) @FIXME{look it up}

--incremental (-G) handles old GNU-format incremental backups.

This option should only be used when creating an incremental backup of a filesystem. When the --incremental (-G) option is used, tar writes, at the beginning of the archive, an entry for each of the directories that will be operated on. The entry for a directory includes a list of all the files in the directory at the time the dump was done, and a flag for each file indicating whether the file is going to be put in the archive. This information is used when doing a complete incremental restore.

Note that this option causes tar to create a non-standard archive that may not be readable by non-GNU versions of the tar program.

The --incremental (-G) option means the archive is an incremental backup. Its meaning depends on the command that it modifies.

If the --incremental (-G) option is used with --list (-t), tar will list, for each directory in the archive, the list of files in that directory at the time the archive was created. This information is put out in a format that is not easy for humans to read, but which is unambiguous for a program: each file name is preceded by either a `Y' if the file is present in the archive, an `N' if the file is not included in the archive, or a `D' if the file is a directory (and is included in the archive). Each file name is terminated by a null character. The last file is followed by an additional null and a newline to indicate the end of the data.

If the --incremental (-G) option is used with --extract (--get, -x), then when the entry for a directory is found, all files that currently exist in that directory but are not listed in the archive are deleted from the directory.

This behavior is convenient when you are restoring a damaged file system from a succession of incremental backups: it restores the entire state of the file system to that which obtained when the backup was made. If you don't use --incremental (-G), the file system will probably fill up with files that shouldn't exist any more.

--listed-incremental=snapshot-file (-g snapshot-file) handles new GNU-format incremental backups. It has much the same effect as --incremental (-G), but the time when the dump is done and the list of directories dumped are also written to the given file. When restoring, only files newer than the saved time are restored, and the directory list is used to speed up operations.

--listed-incremental=snapshot-file (-g snapshot-file) acts like --incremental (-G), but when used in conjunction with --create (-c) will also cause tar to use the file snapshot-file, which contains information about the state of the filesystem at the time of the last backup, to decide which files to include in the archive being created. That file will then be updated by tar. If snapshot-file does not exist when this option is specified, tar will create it, and include all appropriate files in the archive.

The snapshot file, which is archive independent, contains the date it was last modified and a list of devices, inode numbers and directory names. tar will archive files with newer modification dates or inode change times, and directories with an unchanged inode number and device but a changed directory name. The file is updated after the files to be archived are determined, but before the new archive is actually created.

GNU tar actually writes the file twice: once before the data is written, and once after.
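The way the snapshot file drives --listed-incremental can be tried out with a sketch like the one below. The scratch directory, the file names and the snapshot name `fs.snar' are all made up for the illustration, and the one-second pause is only there so that the edited file gets a clearly later timestamp.

```shell
#!/bin/sh
# Sketch: two runs of tar against the same (hypothetical) snapshot file.
set -eu

work=$(mktemp -d)
mkdir "$work/fs"
echo one > "$work/fs/static.txt"
echo two > "$work/fs/edited.txt"
snapshot="$work/fs.snar"          # hypothetical snapshot file name

# First run: the snapshot file does not exist yet, so tar creates it and
# archives everything (a level zero dump).
tar --create --file="$work/level0.tar" \
    --listed-incremental="$snapshot" \
    --directory="$work/fs" .

sleep 1                           # ensure a later modification time
echo "two, edited" > "$work/fs/edited.txt"

# Second run: the snapshot file exists, so tar consults it and archives
# only what changed since the first run, then updates the snapshot.
tar --create --file="$work/incr.tar" \
    --listed-incremental="$snapshot" \
    --directory="$work/fs" .

incr_listing=$(tar --list --file="$work/incr.tar")
echo "$incr_listing"
rm -rf "$work"
```

The second archive should list the directory and `edited.txt', but not `static.txt'.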

Using tar to Perform Incremental Dumps

@UNREVISED

Performing incremental dumps is similar to performing full dumps, although a few more options will usually be needed.

You will need to use the `-N date' option to tell tar to only store files that have been modified since date. date should be the date and time of the last full/incremental dump.

A standard scheme is to do a full dump once a month, a weekly dump of everything changed since the last monthly dump, and a daily dump of everything changed since the last (weekly or monthly) dump.
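As a rough illustration of the `-N date' approach, the sketch below keeps the time of the previous dump in a file and passes it to tar through the long form of the option, --newer. The scratch directory and the file name `last-dump-date' are hypothetical, and the pauses exist only so that the timestamps fall clearly on either side of the recorded date.

```shell
#!/bin/sh
# Sketch: dump only the files changed since a recorded date.
set -eu

work=$(mktemp -d)
mkdir "$work/fs"
echo old > "$work/fs/old.txt"     # exists before the "previous dump"

sleep 1
# Hypothetical record of when the previous dump was made (in UTC).
date -u '+%Y-%m-%d %H:%M:%S UTC' > "$work/last-dump-date"
sleep 1

echo new > "$work/fs/new.txt"     # created after the previous dump

last_dump=$(cat "$work/last-dump-date")

# --newer is the long form of -N. Directories are always stored, but
# ordinary files are stored only if they changed after the given date.
tar --create --file="$work/since.tar" \
    --newer="$last_dump" \
    --directory="$work/fs" .

since_listing=$(tar --list --file="$work/since.tar")
echo "$since_listing"

# Record the time of this dump for the next run.
date -u '+%Y-%m-%d %H:%M:%S UTC' > "$work/last-dump-date"
rm -rf "$work"
```

A production script would record the date before starting the dump rather than after it, so that files changed while the dump is running are picked up by the next one.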

Here is a copy of the script used to dump the filesystems of the machines here at the Free Software Foundation. This script is run via cron late at night when people are least likely to be using the machines. This script dumps several filesystems from several machines at once (via NFS). The operator is responsible for ensuring that all the machines will be up at the time the dump happens. If a machine is not running, its files will not be dumped, and the next day's incremental dump will not store files that would have gone onto that dump.

#!/bin/csh
# Dump thingie
set now = `date`
set then = `cat date.nfs.dump`
/u/hack/bin/tar -c -G -v\
 -f /dev/rtu20\
 -b 126\
 -N "$then"\
 -V "Dump from $then to $now"\
 /alpha-bits/gp\
 /gnu/hack\
 /hobbes/u\
 /spiff/u\
 /sugar-bombs/u
echo $now > date.nfs.dump
mt -f /dev/rtu20 rew

Output from this script is stored in a file, for the operator to read later.

This script uses the file `date.nfs.dump' to store the date/time of the last dump.

Since this is a streaming tape drive, no attempt to verify the archive is done. This is also why the high blocking factor (126) is used. The tape drive must also be rewound by the mt command after the dump is made.

The Incremental Options

@UNREVISED

--incremental (-G) is used in conjunction with --create (-c), --extract (--get, -x) or --list (-t) when backing up and restoring file systems. An archive cannot be extracted or listed with the --incremental (-G) option specified unless it was created with the option specified. This option should only be used by a script, not by the user, and is usually disregarded in favor of --listed-incremental=snapshot-file (-g snapshot-file), which is described below.

--incremental (-G) in conjunction with --create (-c) causes tar to write, at the beginning of the archive, an entry for each of the directories that will be archived. The entry for a directory includes a list of all the files in the directory at the time the archive was created and a flag for each file indicating whether or not the file is going to be put in the archive.

Note that this option causes tar to create a non-standard archive that may not be readable by non-GNU versions of the tar program.

--incremental (-G) in conjunction with --extract (--get, -x) causes tar to read the lists of directory contents previously stored in the archive, delete files in the file system that did not exist in their directories when the archive was created, and then extract the files in the archive.

This behavior is convenient when restoring a damaged file system from a succession of incremental backups: it restores the entire state of the file system to that which obtained when the backup was made. If --incremental (-G) isn't specified, the file system will probably fill up with files that shouldn't exist any more.
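The restore behavior described above can be observed with a small experiment. The following sketch uses a scratch directory and made-up file names; it creates a level zero and a later incremental archive with --listed-incremental, then extracts both, oldest first, with --incremental.

```shell
#!/bin/sh
# Sketch: restoring from a succession of incremental archives.
set -eu

work=$(mktemp -d)
mkdir "$work/fs" "$work/restore"
echo keep > "$work/fs/kept.txt"
echo gone > "$work/fs/removed.txt"
snapshot="$work/fs.snar"          # hypothetical snapshot file name

# Level zero dump.
tar --create --file="$work/level0.tar" \
    --listed-incremental="$snapshot" --directory="$work/fs" .

sleep 1                           # ensure later timestamps
rm "$work/fs/removed.txt"         # a file disappears ...
echo fresh > "$work/fs/added.txt" # ... and another one appears

# Incremental dump of the changes.
tar --create --file="$work/level1.tar" \
    --listed-incremental="$snapshot" --directory="$work/fs" .

# Restore into an empty directory, oldest archive first. With
# --incremental, files missing from an archive's directory lists are
# deleted from the restored tree.
tar --extract --incremental --file="$work/level0.tar" \
    --directory="$work/restore"
tar --extract --incremental --file="$work/level1.tar" \
    --directory="$work/restore"

restored=$(ls "$work/restore" | tr '\n' ' ')
echo "$restored"
rm -rf "$work"
```

After the second extraction the restored directory should hold `added.txt' and `kept.txt' only; `removed.txt', which the first archive brought back, is deleted again.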

--incremental (-G) in conjunction with --list (-t), causes tar to print, for each directory in the archive, the list of files in that directory at the time the archive was created. This information is put out in a format that is not easy for humans to read, but which is unambiguous for a program: each file name is preceded by either a `Y' if the file is present in the archive, an `N' if the file is not included in the archive, or a `D' if the file is a directory (and is included in the archive). Each file name is terminated by a null character. The last file is followed by an additional null and a newline to indicate the end of the data.

--listed-incremental=snapshot-file (-g snapshot-file) acts like --incremental (-G), but when used in conjunction with --create (-c) will also cause tar to use the file snapshot-file, which contains information about the state of the file system at the time of the last backup, to decide which files to include in the archive being created. That file will then be updated by tar. If snapshot-file does not exist when this option is specified, tar will create it, and include all appropriate files in the archive.

The snapshot file, which is archive independent, contains the date it was last modified and a list of devices, inode numbers and directory names. tar will archive files with newer modification dates or inode change times, and directories with an unchanged inode number and device but a changed directory name. The file is updated after the files to be archived are determined, but before the new archive is actually created.

Although one would expect a device number to stay the same over time, NFS device numbers are not dependable when an automounter gets in the picture. This led to a great deal of spurious redumping in incremental dumps, so it is somewhat useless to compare two NFS device numbers over time. tar therefore considers all NFS devices as being equal when it comes to comparing directories; this is fairly gross, but there does not seem to be a better way to go.

@FIXME{this section needs to be written}

Levels of Backups

@UNREVISED

An archive containing all the files in the file system is called a full backup or full dump. You could protect your data by creating a full dump every day. This strategy, however, would waste a substantial amount of archive media and user time, because unchanged files are re-archived every day.

It is more efficient to do a full dump only occasionally. To back up files between full dumps, you can use an incremental dump. A level one dump archives all the files that have changed since the last full dump.

A typical dump strategy would be to perform a full dump once a week, and a level one dump once a day. This means some versions of files will in fact be archived more than once, but this dump strategy makes it possible to restore a file system to within one day of accuracy by only extracting two archives--the last weekly (full) dump and the last daily (level one) dump. The only information lost would be in files changed or created since the last daily backup. (Doing dumps more than once a day is usually not worth the trouble).
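One way to obtain level one dumps that are always relative to the last full dump is sketched below. It relies on the --listed-incremental option described in this chapter and runs each daily dump against a fresh copy of the snapshot file left by the full dump, so the original snapshot is never updated. This is an assumption about how one might arrange things by hand, not a description of the provided scripts; all file names are hypothetical.

```shell
#!/bin/sh
# Sketch: weekly full dump plus daily level one dumps.
set -eu

work=$(mktemp -d)
mkdir "$work/fs"
echo base > "$work/fs/base.txt"
full_snapshot="$work/level0.snar" # hypothetical snapshot file name

# Weekly: full dump, creating a fresh snapshot file.
tar --create --file="$work/full.tar" \
    --listed-incremental="$full_snapshot" --directory="$work/fs" .

# Daily: dump against a copy of the full dump's snapshot, so that every
# level one dump contains everything changed since the full dump.
level_one () {                    # $1 = archive to write
  cp "$full_snapshot" "$work/level1.snar"
  tar --create --file="$1" \
      --listed-incremental="$work/level1.snar" --directory="$work/fs" .
}

sleep 1                           # ensure later timestamps
echo monday > "$work/fs/monday.txt"
level_one "$work/monday.tar"

sleep 1
echo tuesday > "$work/fs/tuesday.txt"
level_one "$work/tuesday.tar"

tuesday_listing=$(tar --list --file="$work/tuesday.tar")
echo "$tuesday_listing"
rm -rf "$work"
```

The second daily archive should contain both `monday.txt' and `tuesday.txt', but not `base.txt', so the full archive and the latest daily archive are enough for a restore.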

GNU tar comes with scripts you can use to do full and level-one dumps. Using scripts (shell programs) to perform backups and restoration is a convenient and reliable alternative to typing out file name lists and tar commands by hand.

Before you use these scripts, you need to edit the file `backup-specs', which specifies parameters used by the backup scripts and by the restore script. @FIXME{There is no such restore script!}. @FIXME-xref{Script Syntax}. Once the backup parameters are set, you can perform backups or restoration by running the appropriate script.

The name of the restore script is restore. @FIXME{There is no such restore script!}. The names of the level one and full backup scripts are, respectively, level-1 and level-0. The level-0 script also exists under the name weekly, and the level-1 under the name daily---these additional names can be changed according to your backup schedule. @FIXME-xref{Scripted Restoration}, for more information on running the restoration script. @FIXME-xref{Scripted Backups}, for more information on running the backup scripts.

Please Note: The backup scripts and the restoration scripts are designed to be used together. While it is possible to restore files by hand from an archive which was created using a backup script, and to create an archive by hand which could then be extracted using the restore script, it is easier to use the scripts. @FIXME{There is no such restore script!}. See section Using tar to Perform Incremental Dumps, before making such an attempt.

@FIXME{shorten node names}

Setting Parameters for Backups and Restoration

@UNREVISED

The file `backup-specs' specifies backup parameters for the backup and restoration scripts provided with tar. You must edit `backup-specs' to fit your system configuration and schedule before using these scripts.

@FIXME{This about backup scripts needs to be written: BS is a shell script .... thus ... `backup-specs' is in shell script syntax.}

@FIXME-xref{Script Syntax}, for an explanation of this syntax.

@FIXME{Whats a parameter .... looked at by the backup scripts ... which will be expecting to find ... now syntax ... value is linked to lame ... `backup-specs' specifies the following parameters:}

`ADMINISTRATOR'
The user name of the backup administrator.
`BACKUP_HOUR'
The hour at which the backups are done. This can be a number from 0 to 23, or the string `now'.
`TAPE_FILE'
The device tar writes the archive to. This device should be attached to the host on which the dump scripts are run. @FIXME{examples for all ...}
`TAPE_STATUS'
The command to use to obtain the status of the archive device, including error count. On some tape drives there may not be such a command; in that case, simply use `TAPE_STATUS=false'.
`BLOCKING'
The blocking factor tar will use when writing the dump archive. See section The Blocking Factor of an Archive.
`BACKUP_DIRS'
A list of file systems to be dumped. You can include any directory name in the list--subdirectories on that file system will be included, regardless of how they may look to other networked machines. Subdirectories on other file systems will be ignored. The host name specifies which host to run tar on, and should normally be the host that actually contains the file system. However, the host machine must have GNU tar installed, and must be able to access the directory containing the backup scripts and their support files using the same file name that is used on the machine where the scripts are run (i.e., what pwd will print when in that directory on that machine). If the host that contains the file system does not have this capability, you can specify another host as long as it can access the file system through NFS.
`BACKUP_FILES'
A list of individual files to be dumped. These should be accessible from the machine on which the backup script is run. @FIXME{Same file name, be specific. Through NFS ...}

An Example Text of `Backup-specs'

@UNREVISED

The following is the text of `backup-specs' as it appears at FSF:

# site-specific parameters for file system backup.

ADMINISTRATOR=friedman
BACKUP_HOUR=1
TAPE_FILE=/dev/nrsmt0
TAPE_STATUS="mts -t $TAPE_FILE"
BLOCKING=124
BACKUP_DIRS="
	albert:/fs/fsf
	apple-gunkies:/gd
	albert:/fs/gd2
	albert:/fs/gp
	geech:/usr/jla
	churchy:/usr/roland
	albert:/
	albert:/usr
	apple-gunkies:/
	apple-gunkies:/usr
	gnu:/hack
	gnu:/u
	apple-gunkies:/com/mailer/gnu
	apple-gunkies:/com/archive/gnu"

BACKUP_FILES="/com/mailer/aliases /com/mailer/league*[a-z]"

Syntax for `Backup-specs'

@UNREVISED

`backup-specs' is in shell script syntax. The following conventions should be considered when editing the script: @FIXME{"conventions?"}

A quoted string is considered to be contiguous, even if it is on more than one line. Therefore, you cannot include commented-out lines within a multi-line quoted string. BACKUP_FILES and BACKUP_DIRS are the two most likely parameters to be multi-line.

A quoted string typically cannot contain wildcards. In `backup-specs', however, the parameters BACKUP_DIRS and BACKUP_FILES can contain wildcards.
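Because the file is in shell syntax, a script can read it by sourcing it and then splitting the multi-line values itself. The sketch below shows the idea with a cut-down, hypothetical `backup-specs' written to a scratch directory. It is not the code of the distributed scripts, and the variable `plan' exists only for the illustration.

```shell
#!/bin/sh
# Sketch: reading a backup-specs style file from a script.
set -eu

work=$(mktemp -d)

# A cut-down, hypothetical backup-specs. The quoted value spans lines,
# so no comment lines may appear inside it.
cat > "$work/backup-specs" <<'EOF'
ADMINISTRATOR=friedman
BACKUP_DIRS="
    albert:/fs/fsf
    apple-gunkies:/gd"
EOF

# Sourcing the file defines its parameters as shell variables.
. "$work/backup-specs"

plan=""
# The unquoted expansion is split on whitespace (and any wildcards in it
# would be expanded at this point, not when the file was read).
for entry in $BACKUP_DIRS; do
  host=${entry%%:*}               # text before the first colon
  dir=${entry#*:}                 # text after the first colon
  plan="$plan$host dumps $dir;"
done

echo "$plan"
rm -rf "$work"
```

The deferred, unquoted expansion is what allows a quoted parameter such as BACKUP_FILES to carry wildcards.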

Using the Backup Scripts

@UNREVISED

The syntax for running a backup script is:

`script-name' [time-to-be-run]

where time-to-be-run can be a specific system time, or can be now. If you do not specify a time, the script runs at the time specified in `backup-specs' (@FIXME-pxref{Script Syntax}).

You should start a script with a tape or disk mounted. Once you start a script, it prompts you for new tapes or disks as it needs them. Media volumes don't have to correspond to archive files--a multi-volume archive can be started in the middle of a tape that already contains the end of another multi-volume archive. The restore script prompts for media by its archive volume, so to avoid an error message you should keep track of which tape (or disk) contains which volume of the archive. @FIXME{There is no such restore script!}. @FIXME-xref{Scripted Restoration}. @FIXME{Have file names changed?}

The backup scripts write two files on the file system. The first is a record file in `/etc/tar-backup/', which is used by the scripts to store and retrieve information about which files were dumped. This file is not meant to be read by humans, and should not be deleted by them. @FIXME-xref{incremental and listed-incremental}, for a more detailed explanation of this file.

The second file is a log file containing the names of the file systems and files dumped, what time the backup was made, and any error messages that were generated, as well as how much space was left in the media volume after the last volume of the archive was written. You should check this log file after every backup. The file name is `log-mmm-ddd-yyyy-level-1' or `log-mmm-ddd-yyyy-full'.

The script also prints the name of each system being dumped to the standard output.

Using the Restore Script

@UNREVISED

Warning: The GNU tar distribution does not provide any such restore script yet. This section is only listed here for documentation maintenance purposes. In any case, all content is subject to change as things develop.

@FIXME{A section on non-scripted restore may be a good idea.}

To restore files that were archived using a scripted backup, use the restore script. The syntax for the script is:

where ***** are the file systems to restore from, and ***** is a regular expression which specifies which files to restore. If you specify --all, the script restores all the files in the file system.

You should start the restore script with the media containing the first volume of the archive mounted. The script will prompt for other volumes as they are needed. If the archive is on tape, you don't need to rewind the tape to its beginning--if the tape head is positioned past the beginning of the archive, the script will rewind the tape as needed. @FIXME-xref{Media}, for a discussion of tape positioning.

If you specify `--all' as the files argument, the restore script extracts all the files in the archived file system into the active file system.

Warning: The script will delete files from the active file system if they were not in the file system when the archive was made.

See section Using tar to Perform Incremental Dumps, for an explanation of how the script makes that determination.

@FIXME{this may be an option, not a given}

