It is 3 a.m. Do you know where your data is?
The answer depends on the type of business that your Oracle RDBMS supports. Some systems, such as data warehouses and other types of decision support systems (DSS), sustain little overall activity during the day but perform database-intensive table
reloading operations during off-peak hours. OLTP systems typically reserve long-running query reports or resource-intensive batch jobs for the night hours.
Of course, there is not a right or wrong answer. Every site is a unique environment and has its own business requirements. However, it is important to know what is happening on an Oracle database during typical operations so that you know how to react
when the telephone rings in the middle of the night and a voice on the other end informs you that a disaster has occurred. The disaster can be anything from a database instance that has abnormally terminated to a physical disk drive failure. If you have
not already planned a formal backup and recovery procedure, it is too late to start.
Backup and recovery are among the most important considerations on any system, yet they generally receive less than one percent of the total planning, developing, and testing effort. Imagine that you are a project leader. Would you allow a developer who is
responsible for mission-critical corporate applications to read a few articles, kludge together a functional program, and deploy it in a production environment with only cursory testing? This is precisely the approach that most DBAs take with backup and
recovery procedures.
In their defense, though, backups for the most part seem to work. Scripts are run, copies are made to tape or disk, and someone dutifully checks to make sure that nothing abnormal has happened. No one knows whether this works until the system is down
and the backup is needed to restore production operations. In many cases, there are political conflicts over whose responsibility it is to make backups. At some sites, the system administrator is responsible. Other sites make each individual administrator
responsible for his own database or system. A few sites rely on an operations group that assumes responsibility for backups.
In the past, when the economy was centered primarily on agriculture or manufacturing, businesses were concerned especially about production and materials. As the Information Age becomes a reality, backups become important because they safeguard the only
thing that a business cannot replace: data. Data takes many different forms: customer names, part numbers, purchase orders, and so on. When businesses were mostly independent, regional companies, the loss of data was less significant. For a
multinational corporation today, however, the loss of even a single hour's data can result in thousands of dollars of damage.
In the event of a catastrophe, it is imperative that the data be recovered. Two factors are critical for success: accuracy and timeliness.
If a backup does not properly record all the information necessary to recover the database, it is worthless. A backup that fails to provide a critical piece of information, such as a required database file, cannot be used to restore operations. It is
essential, therefore, that all the necessary database components be part of scheduled backups. Likewise, recovery of a mission-critical database that takes several days is, in most cases, unacceptable. A backup must expeditiously and completely restore the
database after a failure occurs.
At the functional level, Oracle is simply a collection of physical data files that reside on one or more hard drives, such as the hard drive on a PC, a midrange system, or a mainframe. Thus, backing up the system becomes a matter of knowing which data
files to copy to tape or disk through standard copy utilities.
Because of the large amount of processing done by the background processes that comprise an Oracle database instance, making copies of these files gets complicated. To satisfy data consistency requirements, Oracle constantly updates its database files.
Most operating systems copy files in blocks, so the snapshot of a file taken when the copy begins may not match the state of the file when the copy finishes (see Figure 14.1).
Figure 14.1. Operating system backup.
In Figure 14.1, the file FOOBAR is 20,000 bytes (a little less than 20K). Assuming that the operating system copies in 2,048-byte (2K) blocks, it can successfully copy the file in 10 packets. Compare this with Figure 14.2.
Figure 14.2. Operating system backup of Oracle data file.
In Figure 14.2, the USERS tablespace is composed of two physical data files, each of which is 2M (2,048K) in size. It takes the same operating system utility 1,024 separate packets to copy each file. During this time, the Oracle RDBMS continues to update
the data files. Because of this, the snapshot of the database file taken by the utility is corrupt, incorrect, and unusable. Obviously, more is involved in making backups than just performing a copy with an operating system utility.
Oracle maintains information concerning changes, or transactions, in online redo logs. These online redo logs, often called simply redo logs, resemble audit trails of information. They specify what changes were made to
information within the database and in what order. In the event of an abnormal condition, such as the database instance being terminated instead of being properly shut down, the RDBMS uses information stored within the redo logs to return the database to
its previous, consistent state. It uses the redo logs cyclically, as shown in Figure 14.3.
Figure 14.3. Oracle redo logs.
Figure 14.3 shows four 10M redo logs. This means that the database instance can write up to 10M of information to each redo log, at which point it begins writing to the next online redo log in the sequence. When it reaches the final redo log, the
database instance removes all the information from the first redo log and begins writing information back to it. In this way, Oracle cycles through the redo logs and keeps the most current transactions online and accessible. A recovery from an instance
failure that uses only redo logs is called an online recovery.
Consider this disaster scenario: You encounter an internal error that causes a running Oracle RDBMS instance to terminate abnormally. While attempting to bring up the database, you discover that a media failure has occurred on the disk that contains the
SYSTEM tablespace. The last backup of the system occurred over four days ago, and the redo logs have cycled several times since then. Oracle cannot do the necessary recovery from its redo logs to mount and open the database. What do you do?
Although extreme, scenarios like this one aided Oracle in developing online redo log archiving. All Oracle instances, by default or by design, run in ARCHIVELOG mode or in NOARCHIVELOG mode. (See Figure 14.4.) Each mode has its advantages and
disadvantages.
A database running in ARCHIVELOG mode functions exactly like one in NOARCHIVELOG mode. The only exception occurs when the database completely fills one redo log and begins to write information to the next. If the redo log contains information from a
previous cycle (which is true in all cases except the first cycle), an offline copy of the redo log is made. Once this copy is made, the redo log is emptied and the database resumes its normal processing by writing information to the redo log. This
enables you to recover a database by using the last backup even if it precedes the earliest information in the redo logs. Note, however, that the offline redo logs must be physically accessible to the RDBMS.
Offline redo logs, commonly called archive logs, are simple in concept, but they involve many overhead considerations, which the administrator must decide prior to implementation. Some concerns, such as the volume of transactions processed by the
database, affect others, such as how large to make the archive destination and whether to use automatic or manual archiving.
By default, the Oracle RDBMS instance runs in NOARCHIVELOG mode. To determine what mode a database instance is currently running in, issue the archive log list command from within Oracle Server*Manager. For example,
% svrmgrl
SVRMGR> connect internal
Connected.
SVRMGR> archive log list
Database log mode              No Archive Mode
Automatic Archival             Disabled
Automatic Destination          $ORACLE_BASE/admin//norm/arch/arch.log
Oldest online log sequence     2088
Current log sequence           2093
To set the archive mode of a database, the database must be mounted but not open. Once the database is in this state, the DBA needs only to issue the alter database archivelog or alter database noarchivelog command from Oracle Server*Manager. In the
following code, the DBA mounts a database that has been shutdown normally, changes the database to ARCHIVELOG mode, and completes the startup by opening the database:
% svrmgrl
SVRMGR> connect internal
Connected to an idle instance.
SVRMGR> startup mount
ORACLE instance started.
Total System Global Area       95243632 bytes
Fixed Size                        46384 bytes
Variable Size                  70588480 bytes
Database Buffers               24576000 bytes
Redo Buffers                      32768 bytes
Database mounted.
SVRMGR> alter database archivelog;
Statement processed.
SVRMGR> alter database open;
Statement processed.
The database runs in ARCHIVELOG mode until the DBA disables it. Even abnormal termination of the database instance or an instance shutdown or startup does not take the database out of ARCHIVELOG mode. This is because the information about whether the
database is in ARCHIVELOG mode is stored in the instance's control files, along with other crucial database information.
Parameters within the INIT.ORA parameter file control various aspects of the archive process. They are
LOG_ARCHIVE_BUFFER_SIZE
LOG_ARCHIVE_BUFFERS
LOG_ARCHIVE_DEST
LOG_ARCHIVE_FORMAT
LOG_ARCHIVE_START
The LOG_ARCHIVE_BUFFER_SIZE and LOG_ARCHIVE_BUFFERS parameters are useful primarily in database tuning.
The LOG_ARCHIVE_DEST parameter specifies the output location (such as /var/offline) or device (such as /dev/rmt/0hc) where the archive logs will be written. It must include a filename as part of the parameter; for example, /u10/admin/norm/arch/redo.
LOG_ARCHIVE_FORMAT specifies the format, or mask, used when writing archive logs to the location specified in LOG_ARCHIVE_DEST. Here is a brief list:
%s     Log sequence number
%S     Log sequence number (zero padded)
%t     Thread number
%T     Thread number (zero padded)
The value of LOG_ARCHIVE_START is either TRUE or FALSE. A value of TRUE indicates that automatic archiving should be used, while FALSE indicates manual archiving.
The values of these parameters vary among operating systems and environments. The DBA should configure the settings so that they best suit the database environment.
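Here is a minimal INIT.ORA fragment that turns on automatic archiving; the destination path and format mask are only illustrative and should be adapted to your own directory layout and naming convention:

log_archive_start  = true
log_archive_dest   = /u10/admin/norm/arch/redo
log_archive_format = "_%t_%s.arc"

With settings like these, archived logs appear in /u10/admin/norm/arch with names built from the thread and sequence numbers, such as redo_1_1760.arc; the exact concatenation of destination and format varies slightly by platform.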
If archiving is not successful, the database suspends further operations, including SELECT and CONNECT, until the DBA takes corrective action. The rationale for this is simple: Because all the data is needed for recovery in ARCHIVELOG mode, Oracle stops
operating until it can successfully retain the data. It is important, therefore, to address the issues of the size and availability of the archive destination before you place a database in ARCHIVELOG mode.
If users report errors from the database that deal with archiving, one of the quickest and best sources of information on the error is the ALERT.LOG file, whose location is indicated by the BACKGROUND_DUMP_DEST parameter in the INIT.ORA parameter file.
The ALERT.LOG file records all major activity within the database. The following code contains several lines extracted from an actual ALERT.LOG file that a DBA might see in the event of a problem:
Beginning database checkpoint by background
Thread 1 advanced to log sequence 1760
  Current log# 1 seq# 1760 mem# 0: /u09/oradata/norm/redo1b.log
  Current log# 1 seq# 1760 mem# 1: /u16/oradata/norm/redo1c.log
Thu Jun 8 10:21:57 1995
ARCH: Archival stopped, error occurred. Will continue retrying
Thu Jun 8 10:21:57 1995
ORACLE instance norm - Archival Error
Thu Jun 8 10:21:57 1995
ORA-00255: error archiving log 2 of thread 1, sequence # 1759
ORA-00312: online log 2 thread 1: '/u09/oradata/norm/redo02a.log'
ORA-00312: online log 2 thread 1: '/u16/oradata/norm/redo02b.log'
ORA-00272: error writing archive log
ARCH: ORA-00255: error archiving log 2 of thread 1, sequence # 1759
ORA-00312: online log 2 thread 1: '/u09/oradata/norm/redo2a.log'
ORA-00312: online log 2 thread 1: '/u16/oradata/norm/redo2b.log'
ORA-00272: error writing archive log
Thu Jun 8 10:22:07 1995
Completed database checkpoint by background
Thu Jun 8 10:24:45 1995
Beginning database checkpoint by background
Common problems that you might encounter when you work with archiving include
Conflicting archive log names are rare. They generally occur when two or more instances write to the same file system and directory with the same naming convention.
If you set the value of the LOG_ARCHIVE_START parameter to FALSE (the default value), the Oracle RDBMS instance will use manual archiving. Manual archiving places the control of, and the responsibility for, archiving redo logs in the DBA's hands.
When a database runs under manual archiving, it runs unfettered until a redo log must be archived. When that occurs, the database performs no further activity until the DBA intervenes and issues the alter system archive log all command from Oracle
Server*Manager or SQL*Plus. For example,
% svrmgrl SVRMGR> connect internal Connected. SVRMGR> alter system archive log all; Statement processed.
Users cannot perform any operations until the DBA takes the steps necessary to archive the redo logs manually. Therefore, adequate planning is a must for database instances that use manual archiving.
Automatic archiving works in the same way as manual archiving, except that the database takes full responsibility for copying the archive logs to their appropriate destinations. If an error occurs during the copy (for example, a device might fail or the file system might become full), the database stops all operations until the problem is rectified. To place the database in automatic archive mode, set the value of the LOG_ARCHIVE_START parameter to TRUE.
Automatic archiving is recommended for high-transaction, high-availability systems, such as OLTP systems, that cannot afford to have a DBA poised and ready to archive redo logs manually at a moment's notice.
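If the instance is already running with LOG_ARCHIVE_START set to FALSE, you do not have to wait for the next restart to get automatic archiving. The following sketch enables the archiver for the life of the current instance; it assumes the archive destination parameters are already set in INIT.ORA:

% svrmgrl
SVRMGR> connect internal
Connected.
SVRMGR> alter system archive log start;
Statement processed.

The change lasts only until the instance is shut down; set LOG_ARCHIVE_START to TRUE in INIT.ORA to make it permanent.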
In manual archiving,
In automatic archiving,
NOARCHIVELOG mode is the default mode used by an Oracle RDBMS instance. In this mode, no archival of the redo logs is made, and no special handling is required. However, the DBA needs to guard against disaster scenarios like the one described earlier.
Usually, the only databases run in NOARCHIVELOG mode are those that do not have a high volume of transactions, such as decision support databases that contain only summarized information from other applications systems. It is important to ensure that
backups are made more often than the redo logs are cycled.
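One rough way to gauge how quickly the online redo logs cycle is to compare the FIRST_TIME values in the V$LOG view; the following query is a simple sketch and assumes you are connected as a privileged user:

select group#, sequence#, bytes, status, first_time
from v$log
order by sequence#;

If the oldest FIRST_TIME shown is more recent than your last backup, the redo needed to roll that backup forward has already been overwritten.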
In ARCHIVELOG mode,
In NOARCHIVELOG mode,
In addition to ensuring that the required transactional information is available for recovery, you must make routine backups of the database. The backup procedure for an RDBMS is more complicated than simply making copies at the operating system level.
In The Wizard of Oz, the Wizard tells Dorothy, "Pay no attention to that man behind the curtain." The same admonishment applies to the overall functionality of the Oracle RDBMS instance.
A database is simply a collection of physical data files. The RDBMS provides a sophisticated set of programs that hide the details of the processing from the world and that enable programmers and users to view this data as tables, views, indexes, and
clusters. The truth, though, is that Oracle, like the early flat-file databases that preceded it, does nothing more extravagant than store information in files. The man behind the curtain stands revealed.
Well, almost.
Although all the information stored by the Oracle RDBMS resides in physical data files, the information is accessible only through the tools and utilities provided by Oracle. To attempt otherwise is to risk corrupting the data within the file. There are
also various kinds of files, each of which serves a different purpose and must be backed up in a specific manner. It is important to ensure that all the right files are backed up when you implement a backup strategy. Make sure that you understand the
reasons why each file is backed up.
Figure 14.5 shows the physical files that an Oracle instance uses. They are control files, database files, and redo log files. They must be included in the backup strategy.
Figure 14.5. The physical files that make up an Oracle instance.
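To confirm exactly which physical files a given instance uses, you can query the V$ views from Server*Manager or SQL*Plus. This is only a sketch; if V$CONTROLFILE is not present in your release, the CONTROL_FILES parameter in INIT.ORA lists the control files instead:

rem database files
select name from v$datafile;
rem online redo log files
select member from v$logfile;
rem control files
select name from v$controlfile;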
Of all files used by the Oracle RDBMS, the control file most closely resembles the flat file of earlier databases, and each Oracle instance must contain at least one. The control file is a treasure trove of information, a great deal of which is
proprietary and quite illegible. The following information "officially" exists within the control file:
Oracle uses the SCN (system change number) to maintain the consistency and integrity of the database.
The value of the control file is obvious. Without it, the database instance does not know what SCN to use, which makes it unable to tell whether all the information is in sync. The database cannot tell which database files and redo log files to mount
and open as part of the database. It even suffers "database amnesia" because it does not know its own name. The control file is continuously updated while the database operates.
The database files are the heart and soul of the database instance. They are the physical files that make up tablespaces, the logical constructs on which tables, indexes, and the like reside. Each tablespace can be made up of one or more separate
physical database files. Every database instance must contain at least one database file, for the SYSTEM tablespace.
Database files tend to be the hardest type of files to make copies of. This is because the data within a database object (which lies within a tablespace and, ultimately, within a database file) is constantly updated and changed. If a database
has an active user community, many INSERT, UPDATE, or DELETE operations might be running that change the data even as it is being copied. Despite the use of the SCN, the data changes and is inconsistent with the information that the database believes is
stored within the database file (refer to Figure 14.2).
The redo log files hold transactions that have been applied against data within the database, including INSERT, DELETE, UPDATE, CREATE TABLE, DROP TABLE, CREATE INDEX, and DROP INDEX. Redo log files encompass online redo logs and offline redo logs, also
known as archive logs. Because archive logs are simply copies of redo logs, the two are identical except for the data that they contain.
Every Oracle database instance must contain at least two online redo logs. The existence of archive logs depends on the mode in which the database is running: ARCHIVELOG or NOARCHIVELOG. Even in ARCHIVELOG mode, archive logs do not have to exist for
the database to function.
Each type of file that makes up an Oracle database instance has a different requirement for its individual backup. Some files can be backed up simply by using a standard operating system utility, such as the UNIX cp or cpio command. Others require you
to interface directly with the database to carry out the backup. Still others require a sophisticated blend of database and operating system interface.
As important as knowing what to back up is knowing how to back up correctly. If a backup is not correctly made, a database recovery might not be possible.
Control files resemble flat files, which means that compared to database files, they are easy to copy. Control files tend to be small, and the effort needed to make copies is negligible. Control file backups are critical to being able to restore a database.
Without a control file, a database cannot be brought online.
One option for making copies of database control files is to use operating system utilities. This works well, but it is not an adequate or reliable backup if the database has not first been shut down. Because an active database (especially one with a
high transaction volume) constantly updates the control file, there is the risk that the copy might not reflect the true state of the database. Some people argue that this risk is small, but it still exists.
Copying a control file at the operating system level should be done only in conjunction with a cold backup.
A recommended method for backing up an Oracle database control file is to mirror the control files. Unlike true disk mirroring, which is implemented at the operating system level by using technology such as RAID 0/1, this method merely designates
multiple copies of the control file to which the database will write. This is done by means of the CONTROL_FILES parameter in the INIT.ORA parameter file. For example,
control_files = (/u03/oradata/norm/control.ctl,
                 /u05/oradata/norm/control.ctl,
                 /u07/oradata/norm/control.ctl)
You can include additional control files by editing the INIT.ORA file, shutting down, and restarting the database instance.
When you mirror the control files, place each control file on a separate physical disk to prevent all the copies from being lost in the event of a disk failure; that is the concept behind mirroring. Although an Oracle instance can function with only one
control file, Oracle recommends at least two. However, you can have as many control files as there are physical disk drives. Control files are extremely cheap in terms of storage and performance cost (writing the information to the control file).
Although mirroring control files is useful, it is still a good idea to make actual backups of control files. In a worst case disaster scenario, all the control files could be lost and the database might need to be restored on an alternate hardware
system. Mirroring is not of much use then.
Because a reliable backup of a control file is not available from the operating system, Oracle provides a method that makes a reliable backup copy of the control file while the database is running. Use the alter database command from Oracle
Server*Manager or SQL*Plus. For example,
alter database backup controlfile to '/u10/admin/norm/arch/bk_control.ctl';
This command makes a backup control file and places it in the destination directory under the filename specified in the command. In this case, the directory is the path /u10/admin/norm/arch, and the name of the backup control file is bk_control.ctl.
Here is a full script that makes backup copies of control files:
#!/bin/ksh
$ORACLE_HOME/bin/svrmgrl << EOF
connect internal
alter database backup controlfile to '/u10/admin/norm/arch/control.ctl';
exit
EOF
You can back up this control file to tape along with the other files. That way, you make a full recovery possible.
Suppose that during routine maintenance to extend a tablespace, you encounter an error indicating that the maximum number of database files has been reached. A low default value had been used when the instance was created. The only way to change this
value is to recreate the control file. To do this, however, you must know all the data files, redo log files, and so on, that make up the instance.
With a fair amount of research, any DBA could cobble together the necessary information to perform this task. For most DBAs, though, time is a premium resource; the more easily an activity can be done, the better. Oracle provides a facility that enables you
to back up a control file to trace by issuing an alter database command from Oracle Server*Manager or SQL*Plus:
alter database backup controlfile to trace;
When you invoke this command, you create a SQL script that is capable of recreating the current control file of the Oracle instance. The destination of the script is the directory specified in the USER_DUMP_DEST parameter of the INIT.ORA parameter file.
For example,
#!/bin/ksh
$ORACLE_HOME/bin/svrmgrl << EOF
connect internal
alter database backup controlfile to trace;
exit
EOF
When you execute this command, the editable SQL script shown here is produced. This script can be quickly changed and used, which ensures that all the parameters are correct. The DBA needs to worry only about scheduling, not the daunting task of
checking and double-checking parameters.
Dump file /u01/app/oracle/admin/norm/udmp/ora_25132.trc
Oracle7 Server Release 7.1.6.2.0 - Production Release
With the distributed option
PL/SQL Release 2.1.6.2.0 - Production
ORACLE_HOME = /u07/app/oracle/product/7.1.6
ORACLE_SID = norm
Oracle process number: 9   Unix process id: 25132
System name:    HP-UX
Node name:      testdev
Release:        A.09.00
Version:        U
Machine:        8999/867
Sat Sep 9 14:26:39 1995
Sat Sep 9 14:26:39 1995
*** SESSION ID:(6.21)
# The following commands will create a new control file and use it
# to open the database.
# No data other than log history will be lost. Additional logs may
# be required for media recovery of offline data files. Use this
# only if the current version of all online logs are available.
STARTUP NOMOUNT
CREATE CONTROLFILE REUSE DATABASE "NORM" NORESETLOGS NOARCHIVELOG
    MAXLOGFILES 64
    MAXLOGMEMBERS 5
    MAXDATAFILES 1022
    MAXINSTANCES 10
    MAXLOGHISTORY 100
LOGFILE
  GROUP 1 (
    '/u07/oradata/norm/redo101.log',
    '/u09/oradata/norm/redo102.log'
  ) SIZE 10M,
  GROUP 2 (
    '/u07/oradata/norm/redo201.log',
    '/u09/oradata/norm/redo202.log'
  ) SIZE 10M
DATAFILE
  '/u02/oradata/norm/system01.dbf' SIZE 80M,
  '/u04/oradata/norm/rbs01.dbf' SIZE 300M,
  '/u11/oradata/norm/temp01.dbf' SIZE 150M,
  '/u08/oradata/norm/tools01.dbf' SIZE 20M,
  '/u05/oradata/norm/users01.dbf' SIZE 10M
;
# Recovery is required if any of the datafiles are restored backups,
# or if the last shutdown was not normal or immediate.
RECOVER DATABASE
# Database can now be opened normally.
ALTER DATABASE OPEN;
You should periodically make a control file backup to trace, perhaps as often as you make a standard control file backup.
Although Oracle provides a concise method for making backups of control files, no such method exists for redo logs. The only option for making backups of the redo logs is the very one that you have been cautioned against: using operating system copy
utilities.
Assuming that four redo logs are located on a single disk, the following code segment demonstrates various techniques for making copies:
cp /u01/oradata/norm/redo101.log /DB1/oradata/norm
cp /u01/oradata/norm/redo* /DB1/oradata/norm
ls /u01/oradata/norm/redo101* | cpio -ocvB > /dev/rmt/0hc
ls /u01/oradata/norm/redo* | cpio -ocvB > /dev/rmt/0hc
find /u*/oradata/norm/*.log | cpio -ocvB > /dev/rmt/0hc
Some DBAs like to force the redo logs to perform a logfile switch (usually forcing an archive log to be written) before initiating a copy of the redo logs. This is done from Oracle Server*Manager or SQL*Plus by using the alter system command.
For example,
alter system switch logfile;
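Putting the two steps together, a short script along the following lines forces the switch and then copies the online redo logs to a backup area. It is a sketch only; the source and destination paths are the same hypothetical ones used earlier:

#!/bin/ksh
# force a log switch, then copy the online redo logs
$ORACLE_HOME/bin/svrmgrl << EOF
connect internal
alter system switch logfile;
exit
EOF
cp /u01/oradata/norm/redo*.log /DB1/oradata/norm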
In an attempt to preserve the data within the online redo logs in much the same way as control files, Oracle7 introduces redo log groups. They enable redo logs to be mirrored across multiple disks (see Figure 14.6).
Figure 14.6. Mirrored online redo logs.
Instead of having individual redo logs, each of which contains a distinct series of transactions, the redo logs are broken down into groups and members. Each group is a set of one or more redo logs that contain the same transactions. Each
member is a single redo log file within the group. The Oracle RDBMS treats all the redo logs as groups even if they contain only a single member.
You can think of redo log groups as a single redo log. There are at least two redo log groups in a database instance. Each group contains multiple members (the redo log files), which should be located on separate physical disks to benefit from the
mirroring. As the Oracle RDBMS writes information to the redo log, it writes the information into each member of the group. In this way, if a single member is damaged, at least one of the other members will enable the database to continue to function. This
greatly reduces the number of database failures caused by problems with redo logs.
The initial online redo log groups are created at the time of database creation. However, any number of factors can come into play that might prompt the DBA to want to add additional logfile groups. Adding a group of redo log members to the database is
done by issuing the alter database command from within Oracle Server*Manager or SQL*Plus. For example,
alter database add logfile group 4 ('/u02/oradata/norm/redo4a.log', '/u03/oradata/norm/redo4b.log') size 512K;
This command causes the database to create a new logfile group (group 4), which assumes that groups 1, 2, and 3 already exist and that group 4 does not. The two members of this group are 512K files named redo4a.log and redo4b.log, which are located in
/u02/oradata/norm and /u03/oradata/norm.
A DBA who is not familiar with Oracle7 might have created online redo logs without mirroring the members. It would benefit him to mirror the redo logs by adding additional members to existing redo log groups. The command syntax for this is
alter database add logfile member '/u04/oradata/norm/redo2b.log' to group 2;
This causes a logfile member named redo2b.log to be placed in the /u04/oradata/norm directory path and to be annotated as a member of redo log group 2. The new logfile member has the same size as the existing logfile members in the group to which it is
added.
A DBA might experience the situation in which a disk drive needs to be removed because it keeps encountering errors. Perhaps a new disk drive has been added to the system, and some of the redo log members need to be placed on the drive. Whatever the
case, redo log members can be quickly renamed or moved by using alter database from within Oracle Server*Manager or SQL*Plus.
To move the file physically from one location to another, specify the name of the file to move and its destination. In the following example, redo1a.log is moved from its present location, /u03/oradata/norm, to its new location, /u06/oradata/norm:
alter database rename file '/u03/oradata/norm/redo1a.log' to '/u06/oradata/norm/redo1a.log';
The same syntax is used to rename a file. Instead of specifying a new path, you specify a new filename. In the following example, redo2a.log in /u03/oradata/norm is renamed to redo3a.log. The directory path is the same.
alter database rename file '/u03/oradata/norm/redo2a.log' to '/u03/oradata/norm/redo3a.log';
To remove an entire group of redo log members, simply remove the group itself. After you remove the group, all the corresponding redo log members are dropped. To do this, issue the alter database command from within Oracle Server*Manager or SQL*Plus.
The following code removes all the redo log members for redo log group 5, but it does not affect any other groups:
alter database drop logfile group 5;
If redo log files are corrupted or if you must conserve disk space, you might have to remove redo log members from a redo log group. To do this, issue the following alter database command from within Oracle Server*Manager or SQL*Plus:
alter database drop logfile member '/u07/oradata/norm/redo6a.log';
This command removes the specified redo log member from its associated redo log group. It does not affect the group or any other redo members.
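Before dropping a group or a member, it is worth checking which group the database is currently writing to, because Oracle will not drop the current group. A quick check from Server*Manager or SQL*Plus might look like this:

rem show each group's status (CURRENT, ACTIVE, or INACTIVE)
select group#, sequence#, status from v$log;
rem show the members that belong to each group
select group#, member from v$logfile order by group#;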
One of the simplest backup methods, but also one of the most difficult to implement, is the cold backup. In a cold backup, the database has been totally shut down and all the physical files associated with the database are copied by means of normal
operating system utilities. Because the database is not in operation, changes are not made to the physical files, and there are no conflicting integrity or consistency problems.
The difficulties in implementing this type of backup are mostly political, due largely to the amount of time required. Depending on the size of the database and the speed of the copy utility used (copies to disk are faster than copies to tape), a
cold backup can take anywhere from a few minutes to several hours or even several days. Thus, a cold backup is not always an option.
Many sites supplement weekly or monthly cold backups with other backups on a nightly basis. Some sites think that they have 24/7 operations when, in reality, large windows of time are available in the evening for a cold backup. This is, of course,
site-specific. It is up to the DBA to evaluate the needs of the user community versus the needs of the operations staff.
To accomplish a cold backup, you must first shut down the database instance. There are three shutdown methods for an Oracle database: normal, immediate, and abort. Here is a sample shutdown:
% svrmgrl
SVRMGR> connect internal
Connected.
SVRMGR> shutdown
Database closed.
Database dismounted.
ORACLE instance shut down.
When you issue a shutdown, also called a shutdown normal, from within Oracle Server*Manager, the Oracle RDBMS is very patient. In this mode, the database instance ignores further attempts to log into the database and waits for all the currently active
sessions to disconnect from the database.
Using shutdown normal is not always the best option for a backup, even though it provides for the cleanest type of shutdown. If a user leaves the office with his workstation still logged in, the backup must wait until he logs out. The next morning, no
one can log into the database because Oracle is still waiting to perform a shutdown, and the backup has not yet occurred.
A shutdown in the immediate mode is almost identical to a shutdown in the normal mode, with one exception: Oracle has no patience. When you issue a shutdown immediate command, Oracle immediately terminates all the database connections and performs
rollback operations on all the outstanding transactions. Checkpoints and buffer flushing are done, and the database is brought down.
For backup operations, shutdown immediate works best, for it deals with users who fail to log off their workstations. Because Oracle performs all the rollback and checkpointing, the database is in a consistent, stable state when the termination occurs.
The shutdown abort command should be used only as a last resort, and then only when all the other shutdown options have failed. By using the shutdown abort command, the DBA immediately terminates all the background processes that make up the Oracle
database instance, but no rollback, checkpoint, or buffer flushing operations occur before the shutdown. In rare cases, this can lead to corruption of some of the data within the database.
A shutdown abort should not be used to stop the database before backup operations. If shutdown abort is required, the database should be restarted and shut down again in either immediate or normal mode to ensure a stable, consistent view of the database
that is acceptable for backup operations.
A cold backup is the simplest of all backup operations. The steps required are
Whereas a cold backup takes a backup of a database in a shutdown state, a hot backup enables you to take a backup of a database that has not been shut down. This is the most tedious backup method, but it is also the most flexible. It enables you to take
backups of an active database. It ensures resource availability to end users and enables the DBA and the operations staff to recover the database.
Cold backups concentrate on copying all the physical files associated with a database instance. Hot backups, on the other hand, concentrate on the tablespace level. To do a hot backup, you must place every individual tablespace into a backup mode (by
using the alter tablespace command), copy the physical database files that make up the tablespace, and take the tablespace out of backup mode (by using the alter tablespace command). You can issue these commands from Oracle Server*Manager or SQL*Plus. For
example,
alter tablespace system begin backup;
alter tablespace system end backup;
When you place a tablespace in backup mode, the Oracle instance notes that a backup is being performed and internally compensates for it. As you know, it is impossible to make an authentic copy of a database file that is being written to. On receipt of
the command to begin the backup, however, Oracle ceases to make direct changes to the database file. It uses a complex combination of rollback segments, buffers, redo logs, and archive logs to store the data until the end backup command is received and the
database files are brought back in sync.
Simplifying a hot backup in this way is tantamount to classifying the USS Nimitz as a boat. The complexity of the actions taken by the Oracle RDBMS under a hot backup could consume an entire chapter and is beyond the scope of this guide. What you
should understand is that the trade-off for taking a hot backup is increased use of rollback segments, redo logs, archive logs, and internal buffer areas within the SGA.
When you run a hot backup, you can restore the data files that compose a single tablespace and apply all the associated redo and archive logs to bring the tablespace back in sync with the database.
A hot backup is a complex operation, because each tablespace involves a complete backup operation. It makes sense to break the hot backup into its component parts. The steps in a hot backup are
Cold and hot backups are not the only options available to the DBA. Other backup methods exist, but they often are unreliable and do not permit the level of recoverability that is available from cold and hot backups. These alternative backup methods are
useful as supplemental backups within a backup strategy. They are not designed to replace cold and hot backups.
Of the supplemental backup methods, the tablespace offline copy method is the only one that can feasibly be used in production. It is something of a hybrid between a cold backup and a hot backup. It enables you to do essentially a cold backup of a
tablespace while the database remains online.
In the tablespace offline copy method, each individual tablespace is taken offline by an alter tablespace command issued from Oracle Server*Manager or SQL*Plus:
alter tablespace users offline;
alter tablespace users online;
The steps in a tablespace offline copy backup are
This method permits a complete hot-style backup of the individual tablespaces but without the additional activity within the redo logs, rollback segments, and so on. By using this method, you can take a backup of a tablespace and use archive logs to
recover any transactions that occurred after the backup.
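A tablespace offline copy can be scripted in the same style as the other examples in this chapter. The following sketch uses the USERS tablespace; the data file and backup paths are taken from the sample hot backup script later in the chapter and should be adjusted to your own layout:

#!/bin/ksh
# take the USERS tablespace offline, copy its data file, bring it back online
$ORACLE_HOME/bin/svrmgrl << EOF
connect internal
alter tablespace users offline;
!cp /u01/oradata/norm/users01.dbf /b01/oradata/norm/users01.dbf
alter tablespace users online;
exit
EOF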
The tablespace offline copy backup has some drawbacks. Database objects on the tablespace are unavailable while it is being copied. This is in direct contrast to the hot backup, in which a tablespace and its objects remain online and accessible.
Likewise, you cannot back up the SYSTEM tablespace with this method because it cannot be taken offline. You must use another backup method to do that.
A popular method for supplemental database backup is the exp utility, which performs exports of data within the Oracle database. The exp utility can perform three types of data exports:
There are a number of options available with the exp utility. They are described in Chapter 9, "Import/Export."
If an export is used for recovery, everything in the database that has been added, deleted, or otherwise changed since the last export is lost. For this reason, exports are used only to facilitate quick, point-in-time recoveries, such as when a static
reference table is truncated. They do not provide the level of recovery that most mission-critical operations require.
Here is a partial screen listing for an export:
% exp file=/tmp/exp.log full=y

Export: Release 7.1.6.2.0 - Production on Mon Sep 11 03:29:09 1995

Oracle Corporation 1979, 1994.

Username: system
Password: .......

Connected to: Oracle7 Server Release 7.1.6.2.0 - Production Release
With the distributed option
PL/SQL Release 2.1.6.2.0 - Production

About to export the entire database ...
. exporting tablespace definitions
. exporting profiles
. exporting user definitions
. exporting roles
. exporting resource costs
. exporting rollback segment definitions
. exporting database links
. exporting sequence numbers
To reclaim the exported data, use the Oracle imp utility, which imports the data. Both imp and exp are found in the $ORACLE_HOME/bin directory. For a more detailed discussion of exports, refer to the Oracle7 Server Utilities User's Guide.
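For scheduled backups, exp can also run without prompts if you supply the connection and file parameters on the command line. The username, password, and paths in this sketch are placeholders only:

% exp userid=system/manager full=y file=/b01/export/norm_full.dmp log=/b01/export/norm_full.log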
Another strategy for backups is a result of the Oracle7 direct-load path in SQL*Loader. By using SQL scripts, PL/SQL programs, or 3GL interface programs, you can create a file for each database table in which each row is in a SQL*Loader-readable format,
such as comma-delimited. With this method, you can recreate and repopulate tables quickly after a failure.
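As a sketch of this approach, the following SQL*Plus commands unload a hypothetical PARTS table into a comma-delimited file, and the accompanying SQL*Loader control file reloads it. The table, columns, and paths are invented for illustration:

-- unload.sql (run from SQL*Plus)
set heading off
set feedback off
set pagesize 0
spool /b01/unload/parts.dat
select part_no || ',' || part_name || ',' || unit_cost from parts;
spool off

-- parts.ctl (SQL*Loader control file)
LOAD DATA
INFILE '/b01/unload/parts.dat'
INTO TABLE parts
FIELDS TERMINATED BY ','
(part_no, part_name, unit_cost)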
The time required to administer this method neutralizes whatever benefit you might gain by it. Whenever a change is made to a database table, the change must be reflected in the appropriate program, or the backup it produces is no longer correct. Likewise, this backup method
has many of the same limitations as exp without any of its simplicity or benefits. Because of the time and physical disk storage space required, this type of backup is a heavy undertaking. It must be policed almost constantly.
For some sites, however, this type of backup is practical despite the obvious constraints. For a more detailed discussion of the implementation of SQL*Loader, refer to the Oracle7 Server Utilities User's Guide.
Every DBA experiences a database failure at some point. It might be a minor failure in which the users never even know that they lost service, or it might be a severe loss that lasts for several days. Most failures fall somewhere in the middle.
Most failures result primarily from the loss or corruption of a physical data file. Of course, many other factors can cause database problems. Indeed, problems can occur in the memory structures (the SGA), the system hardware, or even the Oracle
software that prevent the DBA from starting up the database. The following sections describe the most common types of failures.
If a loss or corruption takes place in any of the database files that make up a tablespace, media recovery is required. The extent of the media recovery needed depends largely on the extent of the data file loss or corruption. The three types of media recovery available are database recovery, tablespace recovery, and data file recovery.
The database recovery method is generally chosen if the SYSTEM tablespace has been damaged; it syncs all the data files within the database during the recovery procedure. The tablespace recovery method is used if recovery is needed for
multiple tablespaces that have become damaged, such as from the loss of a disk drive. The data file recovery method is used if only a single database file has been damaged. The commands used to implement these methods are
recover database;
recover tablespace users;
recover datafile '/u03/oradata/norm/rbs01.dbf';
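As a sketch of the data file case, the following sequence takes the damaged file offline so that the rest of the database can remain open, restores it from a backup copy, recovers it, and brings it back online. It assumes the database is running in ARCHIVELOG mode, and the backup path is hypothetical:

SVRMGR> alter database datafile '/u05/oradata/norm/users01.dbf' offline;
SVRMGR> !cp /b05/oradata/norm/users01.dbf /u05/oradata/norm/users01.dbf
SVRMGR> recover datafile '/u05/oradata/norm/users01.dbf';
SVRMGR> alter database datafile '/u05/oradata/norm/users01.dbf' online;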
Whenever a database loses a control file, there is generally little impact on the database itself as long as the DBA has mirrored the control files. To recover the control file, follow these steps (which assume that the control file has been mirrored):
The database will bring the control file in sync with the database, and the users will experience no loss of service or downtime.
If a control file has been lost and there is no backup, Oracle continues to run until it attempts to access the control file. At that point, the Oracle instance aborts. Two options are available to the DBA: recreate the control file from a script, or recover by using a backup control file.
To create a control file, you must first create a SQL script that will adequately recreate the existing control file. If a backup to trace is part of regular backups, the script already exists in the USER_DUMP_DEST directory. Use ls -lt in UNIX to find
the most recent one. Use view to make sure that it creates a control file and is not simply SQL*Trace output. Perform the following steps:
If you choose to use a backup control file, issue the following recover command in place of the standard recover command:
recover database using backup controlfile;
As with control files, there are two possible scenarios: loss of mirrored redo logs and loss of nonmirrored redo logs. If at least one member in each redo log group is usable and not corrupted, the database continues to function normally. You should
determine what caused the failure or corruption of the redo log member. Then you should rectify the problem by dropping and recreating the log member.
If all the members of a redo log group became corrupted or were lost, the scenario is entirely different. Dealing with the loss of an entire redo log group is the same as dealing with an unmirrored redo log. The two possibilities are
If the redo log group was not the active group, the corrupt group and its members eventually cause the database to shut down. The trick is to recognize that damage has been done and to react before the database shuts down. Restore the online redo log
from tape, or copy it from an existing redo log group if they are the same size. If the disk itself is corrupt and unavailable, rename the redo log group. If you are lucky enough to catch the database at this time, this is the best alternative. Otherwise,
if the database attempts to access the corrupted redo log, the redo log must be recovered as if the active redo log was lost (see below).
The more likely scenario is that the database aborted because it lost an inactive online redo log. The recovery steps are basically the same, but they are done in an offline fashion. Recover the offending redo log group, or make a copy of an existing
group if they are the same physical size. From Oracle Server*Manager, do connect internal and start up the database. The downtime involved should be minimal.
A loss of the current online redo log requires a limited recovery scenario. Although a full database recovery is not actually applied, you must make the database think that one has occurred. Only then can processing continue. The steps are
Operations that require restarting an aborted Oracle database instance can be quite complex. The complications that can arise during an operation as sensitive as a recovery are numerous. If the recovery process does not seem to work properly, stop and
contact Oracle technical support immediately.
You have been forced to tinker with startups, shutdowns, and renaming and recovering physical database files. At least losing archive logs does not affect the continued operations of the database.
Well, almost.
Unlike losing a database file, a control file, or a redo log (which ultimately causes an Oracle database instance to abort), losing an archive log has no visible effect on the database. After all, the logs are retained offline and are accessed
only when they are created as archives of the online redo logs and when they are used for database recovery.
Even though the loss of an archive log does not affect the continued operation of the database (which is why NOARCHIVELOG mode is available), if anything occurs that requires database recovery before the next backup, it will be impossible to
recover the database.
Because archive logs facilitate recovery, their loss is often realized only when it is too late. It is a difficult position for a DBA, and there is no clear right or wrong solution. Much depends on the backup schedule: it is one thing to wait a few hours
until the next hot backup, but quite another to wait several days until the next cold backup.
We recommend that you immediately initiate a hot backup of the database. It will slow things down and cause the system to choke a little on processing, especially during peak usage times. It is far better, though, than waiting and hoping that nothing
will happen.
There are several methods for performing database recovery. Each method offers a trade-off between speed and simplicity. The following sections describe the major types of recovery available through the Oracle RDBMS.
In a cold restore, all the database files, control files, and redo logs are restored from tape or disk, and the database is restarted. It is the simplest, most complete recovery operation to perform. The primary drawback is that anything done to the
database since the last backup is lost.
The steps in a cold restore are
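In outline, a cold restore can be scripted much like the cold backup shown at the end of this chapter, only in reverse. This sketch assumes the files were written to tape with cpio and that the database is down:

#!/bin/sh
# restore all database files, control files, and redo logs from tape
cd /
cpio -icvdmB < /dev/rmt/0hc
# restart the database
$ORACLE_HOME/bin/svrmgrl << EOF
connect internal
startup
exit
EOF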
In a full database recovery, also called a complete recovery, data changed since the last backup can be restored. One or more database files are restored from backup. Archive logs are then applied to them until they are in sync with the rest of
the database.
The steps in a full database recovery are
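A typical session, sketched here, assumes that the damaged files have already been restored from the last backup and that every archive log written since that backup is available in the archive destination:

% svrmgrl
SVRMGR> connect internal
Connected to an idle instance.
SVRMGR> startup mount
SVRMGR> recover database;
(apply the archive logs as prompted, or enter AUTO)
SVRMGR> alter database open;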
Sometimes a recovery is required, but not everything in the archive logs is necessary. Suppose, for example, that an overzealous developer deploys a job that deletes every other row in a transaction processing table. In this case, a full recovery will
not work. Because the transactions that corrupted the table are in the archive logs, a full recovery simply restores from the last backup and processes all the transactions, including the haphazard delete. If you know that the job ran at 2:30 p.m., you can
use time-based recovery to recover until 2:29 p.m. That way, the table is exactly as it appeared before the job ran. This is also called an incomplete recovery.
A time-based recovery is performed exactly like a full recovery, with the exception of the recover database command. The steps are
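The only change from a full recovery is the until time clause on the recover command. For the 2:30 p.m. example, the final commands would look something like this sketch; the timestamp is illustrative and uses the format Oracle expects:

SVRMGR> recover database until time '1995-06-08:14:29:00';
SVRMGR> alter database open resetlogs;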
Even if you do not know the exact time when an error occurred, you might feel reasonably certain that you can isolate when to terminate the recovery based on the thread/sequence number. Perhaps there was a break in the archive logs because you had the
database out of ARCHIVELOG mode for a short time, or perhaps you want more control over what archive logs are applied as part of the recovery. The solution is cancel-based recovery.
Under cancel-based recovery, you are prompted after each archive log is applied. The recovery process continues until either the recovery is complete or you enter cancel at the prompt. The prompt appears within Oracle Server*Manager as
Specify log: [<RET> for suggested | AUTO | FROM logsource | CANCEL]
Once you enter cancel at the prompt, the recovery stops.
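The commands follow the same pattern as the other recovery types. This sketch stops after whichever archive logs you choose to apply:

SVRMGR> recover database until cancel;
(apply archive logs at each prompt; enter cancel to stop)
SVRMGR> alter database open resetlogs;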
The steps in a cancel-based recovery are
The code examples in the following sections show you how to set up and execute hot and cold backup schemes. These are not highly sophisticated processing modules, and there are certainly ways to improve them. For example, you could query the
Oracle data dictionary to determine which files to back up. Figure 14.7 shows the sample database that the scripts back up.
Figure 14.7. Sample Oracle database layout.
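One way to pursue the data dictionary idea mentioned above is to generate the hot backup commands from DBA_DATA_FILES instead of hard-coding them. The following sketch, run from SQL*Plus as a DBA user, spools a list of begin backup, copy, and end backup commands; the backup destination is only a placeholder:

-- gen_hotbackup.sql
set heading off
set feedback off
set pagesize 0
spool /tmp/hot_backup.sql
select 'alter tablespace ' || tablespace_name || ' begin backup;' || chr(10) ||
       '!cp ' || file_name || ' /b01/oradata/norm' || chr(10) ||
       'alter tablespace ' || tablespace_name || ' end backup;'
from dba_data_files
order by tablespace_name;
spool off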
This cold backup script issues a shutdown immediate command to terminate database operations. It then performs a mass copy of all the database files from the operating system to tape. When it is finished, it restarts the Oracle database instance.
#!/bin/sh
# Oracle RDBMS Cold Backup

# shutdown the database
$ORACLE_HOME/bin/svrmgrl << EOF
connect internal
shutdown immediate
exit
EOF

# make copies of all database files, control files, and redo logs
find /u*/oradata/norm/*.dbf /u*/oradata/norm/*.ctl /u*/oradata/norm/*.log | cpio -ocvB > /dev/rmt/0hc

# startup the database
$ORACLE_HOME/bin/svrmgrl << EOF
connect internal
startup
exit
EOF
This hot backup script shows a backup that occurs to disk instead of to tape, as in the cold backup. Whereas the cold backup shuts down the database and does a mass file copy, the hot backup tediously copies the database files for each tablespace. The
cold backup is more dynamic than the hot backup because it uses wildcards and the OFA. Whenever a new database file is added or changed, the hot backup script must be changed. If it is not changed, an adequate backup is not taken.
Unlike the cold backup script, which makes its copies to tape, the hot backup script makes copies of the Oracle files to disk. Either type of copy is acceptable for either backup method.
#!/bin/sh
# Oracle Hot Backup Script
$ORACLE_HOME/bin/svrmgrl << EOF
connect internal

REM ** Backup SYSTEM Tablespace
alter tablespace system begin backup;
!cp /u01/oradata/norm/system01.dbf /b01/oradata/norm/system01.dbf
!cp /u03/oradata/norm/system02.dbf /b03/oradata/norm/system02.dbf
alter tablespace system end backup;

REM ** Backup TOOLS Tablespace
alter tablespace tools begin backup;
!cp /u01/oradata/norm/tools01.dbf /b01/oradata/norm/tools01.dbf
alter tablespace tools end backup;

REM ** Backup RBS Tablespace
alter tablespace rbs begin backup;
!cp /u01/oradata/norm/rbs01.dbf /b01/oradata/norm/rbs01.dbf
alter tablespace rbs end backup;

REM ** Backup USERS Tablespace
alter tablespace users begin backup;
!cp /u01/oradata/norm/users01.dbf /b01/oradata/norm/users01.dbf
alter tablespace users end backup;

REM ** Backup PROD Tablespace
alter tablespace prod begin backup;
!cp /u03/oradata/norm/prod01.dbf /b03/oradata/norm/prod01.dbf
!cp /u05/oradata/norm/prod02.dbf /b03/oradata/norm/prod02.dbf
alter tablespace prod end backup;

REM ** Perform Control file backup
alter database backup controlfile to '/b01/oradata/norm/control.ctl';
alter database backup controlfile to trace;

REM ** Backup Online Redo Logs
!cp /u03/oradata/norm/redo*.log /b03/oradata/norm
!cp /u05/oradata/norm/redo*.log /b05/oradata/norm

exit
EOF
This chapter discusses the resources available to an Oracle RDBMS to ensure database integrity and consistency. You learned how to implement them in a real-world backup strategy for mission-critical systems.
The information in this chapter barely scratches the surface of the backup and recovery functionality of the Oracle RDBMS. Keep in mind that an Oracle RDBMS is basically a collection of physical database files. Backup and recovery problems are most
likely to occur at this level. Three types of files must be backed up: database files, control files, and online redo log files. If you omit any of these files, you have not made a successful backup of the database.
Cold backups shut down the database. Hot backups take backups while the database is functioning. There are also supplemental backup methods, such as exports. Each type of backup has its advantages and disadvantages. The major types of instance recovery
are cold restore, full database recovery, time-based recovery, and cancel-based recovery.
This chapter also contains sample scripts that you can use to build your own backup scripts.