Troubleshooting duplication and restore performance with MSDP in NetBackup 7.6 and 2.6 appliances

Large restores and non-optimized duplication jobs (to tape, for example) from a media server deduplication pool (MSDP) perform at a slower rate than from other, non-deduplicated storage units. In some instances these jobs may perform unacceptably slowly and can cause:

  • Increased tape drive wear for duplications written to tape
  • Large SLP backlogs
  • Large number of queued jobs in the Activity Monitor
  • Missed SLAs

With deduplication, image files are broken down into segment objects (SOs) which are then stored within containers. As similar segments are sent to the storage pool, they are referenced inside of a database with existing segments rather than written to storage. This philosophy is also known as single instance storage. Because of the nature of the deduplication process, read operations such as restores and duplications require the reassembling of these segment objects (SOs) within images to bring the data to its pre-deduplicated state. Several factors affect the investigation and improvement of the performance of these operations; they are discussed in this document.

1. Host resources

The first area to consider is the hardware configuration of the MSDP storage server. The recommended minimum specifications are listed below.

a. Memory: at least 1 GB per TB of storage + 4 GB for OS + 4 GB for NBU

b. Storage IO: at least 130 MB/sec reads/writes, 250 MB/sec or more for enterprise class performance. It is not recommended to use iSCSI or thin provisioned storage (see TECH214907)

c. Processor: at least four cores at 2.2 GHz

d. Operating system: 64 bit operating system
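The memory guideline in item a is simple arithmetic, so it can be checked quickly for a planned pool size. The following is a minimal sketch; the function name is hypothetical and it only restates the rule above (1 GB per TB of storage, plus 4 GB for the OS and 4 GB for NetBackup).

```python
def recommended_memory_gb(storage_tb: float) -> float:
    """Minimum RAM in GB: 1 GB per TB of storage + 4 GB for OS + 4 GB for NBU."""
    if storage_tb < 0:
        raise ValueError("storage_tb must be non-negative")
    return storage_tb * 1.0 + 4.0 + 4.0


# Print the minimum for a few example pool sizes.
for tb in (8, 32, 64):
    print(f"{tb} TB pool: at least {recommended_memory_gb(tb):.0f} GB RAM")
```

This is a floor for the MSDP role only; additional roles on the same host need memory on top of it.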

If the storage server is to accept data from several clients or other storage servers, more hardware resources may be needed. Deduplication is by nature a very CPU, memory, and IO intensive process, so an optimal build is essential for both performance and reliability. If using a Windows based storage server, the MSDP file system must be excluded from antivirus (AV) scanning to prevent quarantining of database containers, which often results in data loss. Other applications which scan or alter file systems must be configured so that they do not change the MSDP data and database file systems.

Note: for customers considering Microsoft Windows operating system for their MSDP media server, please see ‘HOWTO: Special considerations when deploying NetBackup Media Server Deduplication on Windows systems’ HOWTO61249.

2. Roles of the storage server

The best performing and most reliable configurations have MSDP built on a dedicated media server. Adding roles such as Master Server, OpsCenter, VMware backup host, Fibre Transport Media Server (FTMS), or others such as Active Directory, File Services, SQL Server, etc. requires additional capacity planning, as the requirements in part 1 above cover the MSDP role only.

3. Storage performance

To assess performance of the storage partition, stop all NetBackup services on the storage server host, along with any other processes that could be using resources on the storage. On Unix, the lsof command can be used to verify that there are no open files on the storage. This ensures that the IO tests are as accurate as possible, because the test will not be competing with other processes. It is recommended to use the nbperfchk utility to obtain average statistics for read and write performance. For the most accurate results, use a file size larger than the memory installed in the storage server host or, if using an external storage array, larger than the array cache. See HOWTO72940 for more information on this utility. Sample syntax is shown below to show how statistics are returned.

Write performance test:

# nbperfchk -i zero: -o /disk/nbperfchk.tmp -s 150g
…snip…
153600 MB @ 777.6 MB/sec

Read performance test:

# nbperfchk -i /disk/nbperfchk.tmp -o /dev/null
…snip…
153600 MB @ 570.9 MB/sec 

In this example, sequential reads and writes perform at 571 and 778 MB/sec respectively. This is important to establish because it is the maximum rate at which MSDP will be able to rehydrate an image, before CPU, memory, and other overheads are taken into account. If the MSDP database and storage partitions are on different LUNs, then the IO tests should be performed on both.
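When the test is repeated across several LUNs, it can help to pull the final throughput figure out of the captured output and compare it with the guideline values from part 1 (130 MB/sec minimum, 250 MB/sec for enterprise class). The sketch below assumes only the summary line format shown in the sample output above; the function names and the classification labels are hypothetical.

```python
import re

MIN_MB_S = 130.0         # minimum storage IO from part 1
ENTERPRISE_MB_S = 250.0  # enterprise class guideline from part 1

# Matches summary lines such as "153600 MB @ 777.6 MB/sec"
SUMMARY = re.compile(r"^\s*(\d+)\s+MB\s+@\s+([\d.]+)\s+MB/sec")


def parse_rate(output: str) -> float:
    """Return the MB/sec value from the last summary line in the output."""
    rate = None
    for line in output.splitlines():
        match = SUMMARY.match(line)
        if match:
            rate = float(match.group(2))
    if rate is None:
        raise ValueError("no summary line found")
    return rate


def classify(rate: float) -> str:
    """Compare a measured rate with the guideline values."""
    if rate >= ENTERPRISE_MB_S:
        return "enterprise class"
    if rate >= MIN_MB_S:
        return "meets minimum"
    return "below minimum"


read_output = "...snip...\n153600 MB @ 570.9 MB/sec\n"
rate = parse_rate(read_output)
print(f"read: {rate} MB/sec ({classify(rate)})")
```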

4. Intended load

The number of clients, the daily data volume, the policy types, and the number of concurrent backup, duplication, and replication jobs that will use the storage pool also need to be considered. In medium and large environments, SLP operations should be tuned to keep duplication and replication streams (IO reads) separate from backup streams (IO writes). This makes the most efficient use of caching on the RAID controller and file system. NetBackup 7.6 has the capability to control SLP processing using start windows to help separate read and write IO traffic. More information on this can be found in TECH211111.

5. Nature of the data

During backups, new and unique data segments are stored within the data containers. With the initial backup, the data segments will all be in close proximity to each other as it was written sequentially. After the first backup, only unique data segments are stored at the end of the containers. This in turn causes fragmentation and poor segment locality. Given that client data changes over time, images are less likely to have common data segments with the previous backups. All of this changed data is appended to the last container and new containers are created as needed in 7.6. The deduplication rate may be high but over time small changes add up, causing the data to be scattered across the containers and underlying file system. Client and policy combinations with a high change rate and/or low deduplication rates will be impacted more quickly with a loss of locality. For these reasons unstructured data types tend to be impacted more significantly but loss of locality will occur on all data types over time.

Upon configuration of a new storage/deduplication pool, it is recommended to run a full backup of each client’s base operating system: “/” for Unix/Linux and “C:\” for Windows systems. Doing this stores the core OS data together, which is similar across the client platforms.

6. Rebasing

The process of creating better segment locality within the deduplication pool is referred to as rebasing. Over time the data for a specific policy and client combination will become spread amongst many data containers and across the disk itself. Server side rebasing works at the storage pool/server level to move segment objects between containers so they are closer together. This is similar in concept to defragmenting a file system, as objects are stored more contiguously to improve read performance. Server side rebasing is a low priority process and therefore will not always occur under load. Because of this, busier storage pools might not rebase as often as they should. The process can also be limited by the number of segments which can be moved.

Client side rebasing looks at the locality of a client and policy combination when the backup begins and if a particular threshold is not met, client side rebasing occurs. Client side rebasing will resend these segments to the storage pool, creating a fresh copy to improve locality. This may cause a short term increase in storage usage but has not been found to have an impact on day-to-day operations. In NetBackup 7.6 and higher, client side rebasing is enabled by default. Client side rebasing can be enabled manually in previous versions by following the steps provided in HOWTO70652. Due to the nature of the Accelerator feature, client side rebasing cannot occur on backups with it enabled.

NOTE: Images created with Accelerator enabled will experience poorer locality because client side rebasing is unavailable, and will therefore experience poorer rehydration performance.

7. Segment locality

If it is suspected that SOs are not stored optimally or more evidence is needed to explain rehydration performance problems, a locality check can be performed. Please contact Symantec Technical Support for more information on this step and advanced troubleshooting.

8. MSDP configuration file tuning

Depending on the server hardware, operating system configuration and system load, changes can be made to the MSDP configuration files to improve performance further. Care should be taken, as making many or large changes at once can compromise system stability. It should also be noted that changes intended to improve rehydration performance may negatively affect backup performance, particularly parameters related to rebasing, as they can create additional overhead on the storage server. It is strongly recommended to make a backup copy of the configuration before editing it so that changes can be rolled back, to make one small change at a time, and to test with backups and restores between changes.

a. Contentrouter.cfg

Unix: /$MSDPstorage/etc/puredisk/contentrouter.cfg

Windows: \$MSDPstorage\etc\puredisk\contentrouter.cfg

  • Change PrefetchThreadNum=1 to PrefetchThreadNum=8 to speed up prefetching. If results are satisfactory, change PrefetchThreadNum=8 to PrefetchThreadNum=16. This parameter specifies how many threads to use to preload segments during rehydration.
  • Verify MaxNumCaches=1024. This allows more containers to be opened simultaneously to avoid frequent opening and closing of the same containers.
  • Verify RebaseScatterThreshold=64MiB. This parameter specifies the average data size threshold per container for a given backup image to be considered for rebasing.
  • Change RebaseMaxPercentage=5 to RebaseMaxPercentage=50.
  • Change RebaseMaxTime=150 to RebaseMaxTime=600.
  • Windows storage servers only: Change ReadBufferSize=65536 to ReadBufferSize=262144.
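The changes listed above can be applied by hand in an editor. For repeatable testing, a small helper that rewrites only keys that already exist, and reports the old and new values, makes it easier to apply one change at a time. This is a sketch under the assumption that each parameter sits on its own Key=Value line as written above; the helper name and the sample text are hypothetical, and the function works on text only, so backing up and writing contentrouter.cfg is left to the operator.

```python
def set_params(cfg_text: str, changes: dict) -> tuple:
    """Rewrite existing Key=Value lines; return (new_text, applied).

    applied maps each key that was found to (old_value, new_value).
    Keys that are not present in the text are left out, never added.
    """
    applied = {}
    out = []
    for line in cfg_text.splitlines():
        key, sep, old = line.partition("=")
        name = key.strip()
        if sep and name in changes:
            applied[name] = (old.strip(), str(changes[name]))
            line = f"{name}={changes[name]}"
        out.append(line)
    return "\n".join(out) + "\n", applied


# Hypothetical excerpt containing only parameters named in this section.
sample = "PrefetchThreadNum=1\nMaxNumCaches=1024\nRebaseMaxPercentage=5\n"
new_text, applied = set_params(sample, {"PrefetchThreadNum": 8})
for name, (old, new) in applied.items():
    print(f"{name}: {old} -> {new}")
```

Applying a single key per run matches the advice in this section to make one small change and test before making the next.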

b. PD.conf

Unix: /usr/openv/lib/ost-plugins/pd.conf

Windows: \$InstallPath\bin\ost-plugins\pd.conf

Change “PREFETCH_SIZE = 33554432” to “PREFETCH_SIZE = 67108864”. This is the size, in bytes, of the buffer used for restore operations.

c. If necessary, make changes to the below files on the NetBackup side. Below are the starting points that work best in most environments.

  • SIZE_DATA_BUFFERS = 262144
  • SIZE_DATA_BUFFERS_DISK = 1048576
  • NUMBER_DATA_BUFFERS = 256
  • NUMBER_DATA_BUFFERS_DISK = 512
  • NET_BUFFER_SZ = 1048576
  • NET_BUFFER_SZ_REST = 1048576
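The byte values in items b and c are easier to reason about in binary units. The sketch below converts them and multiplies buffer size by buffer count to estimate the buffer memory for one stream. That product is an assumption about how the two values combine, not something stated in this document, so treat the totals as rough estimates.

```python
def mib(n_bytes: int) -> float:
    """Convert bytes to MiB."""
    return n_bytes / (1024 * 1024)


def buffer_mib(size_bytes: int, count: int) -> float:
    """Estimated buffer memory for one stream: size x count (assumption)."""
    return mib(size_bytes * count)


# PREFETCH_SIZE before and after the change in item b
print(f"PREFETCH_SIZE: {mib(33554432):.0f} MiB -> {mib(67108864):.0f} MiB")

# Starting points from item c, as (size in bytes, number of buffers)
settings = {
    "tape": (262144, 256),    # SIZE_DATA_BUFFERS, NUMBER_DATA_BUFFERS
    "disk": (1048576, 512),   # SIZE_DATA_BUFFERS_DISK, NUMBER_DATA_BUFFERS_DISK
}
for kind, (size, count) in settings.items():
    print(f"{kind}: {count} x {size // 1024} KiB = {buffer_mib(size, count):.0f} MiB")
```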


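Introduction to new features in NetBackup 7.6

As far as I can tell, NetBackup 7.6 brings two main enhancements.

First, the Media Server Deduplication Pool (MSDP) feature has been greatly improved, mainly thanks to the replacement of PostgreSQL with refdb. According to the official documentation, the main points are:

1. A higher backup job success rate when MSDP is under very heavy load.

2. Reduced maintenance window requirements.

3. The new MSDP needs considerably less memory; memory no longer has to be allocated on the old 1:1 model (memory to disk capacity). That is, 64 TB of storage no longer requires 64 GB of memory.

4. The new Deduplication Multi-Threaded feature improves deduplication efficiency on both the media server and the client.

Second, Storage Lifecycle Policy (SLP) enhancements.

1. An SLP can now have an execution window, just like a policy. Duplication jobs used to be launched under the control of parameters; they can now be launched according to job load or requirements.

2. Suspend and resume for traditional duplication jobs. This mainly allows a traditional duplication job to be suspended when the SLP window closes.

3. SLP parameters used to be adjusted through configuration files; they can now all be adjusted in the administration console.

4. In older versions it was hard to follow the progress of duplication jobs; 7.6 shows it directly.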


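A case of the NetBackup vmd process failing to start

Platform:
Windows 2003, NetBackup 6.5.6 media server
Symptom:
After the system came up, a check of the NetBackup processes showed that the vmd process had not started. The inetd service could not be started either.
Troubleshooting steps:
1. Restarted NetBackup; the problem persisted.
2. Tried to start the inetd service manually; the system reported a user name error.
3. Asked the administrator about recent changes; it turned out that the superuser password had recently been changed.
4. Checked the inetd service, which is started under the superuser account. Updated the password used to start the service.
5. Restarted the NetBackup processes; the vmd service started normally.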

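The NetBackup nbdevquery command in detail

Purpose of the nbdevquery command:
This command checks the state of NetBackup disk-type media, such as OpenStorage, PureDisk, and AdvancedDisk disk pools and storage servers.

Command options:
[-listdp]
[-listdv]
[-liststs]
[-listmediaid]
[-listmounts]
[-listglobals]
[-listconfig]
[-listreptargets]

Option descriptions:
-listdp shows information about all disk pools on the system.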

[root@nbu1 staging]# nbdevquery -listdp
V7.5 dp_nbu1 1 7.55 7.55 1 98 80 -1 nbu1
V7.5 ad_nbu1 1 3.94 3.94 1 98 80 -1 nbu1
This shows that there are two disk pools under NetBackup 7.5. Add the -U option for detailed information.
-listdv shows disk pool status, mainly whether the disk pool is online or offline.

[root@nbu1 staging]# nbdevquery -listdv -stype PureDisk -U
Disk Pool Name : dp_nbu1
Disk Type : PureDisk
Disk Volume Name : PureDiskVolume
Disk Media ID : @aaaax
Total Capacity (GB) : 7.55
Free Space (GB) : 6.66
Use% : 11
Status : DOWN
Flag : ReadOnWrite
Flag : AdminUp
Flag : InternalDown
Num Read Mounts : 0
Num Write Mounts : 1
Cur Read Streams : 0
Cur Write Streams : 0
Num Repl Sources : 0
Num Repl Targets : 0
The output shows that the disk pool is currently DOWN.
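When there are many disk pools, the -U listing is easier to scan after it has been split into records. The following sketch parses the "Key : Value" layout shown in the sample above and reports volumes whose Status is DOWN. It assumes that every record starts with a "Disk Pool Name" line, as in the sample; the function name is hypothetical.

```python
def parse_listdv(text: str) -> list:
    """Split 'Key : Value' lines into one dict per disk volume.

    Repeated 'Flag' lines are collected into a list under the key 'Flag'.
    """
    volumes = []
    current = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Disk Pool Name":
            current = {"Flag": []}
            volumes.append(current)
        if current is None:
            continue
        if key == "Flag":
            current["Flag"].append(value)
        else:
            current[key] = value
    return volumes


# Excerpt of the sample output above
SAMPLE = """\
Disk Pool Name : dp_nbu1
Disk Type : PureDisk
Disk Volume Name : PureDiskVolume
Status : DOWN
Flag : ReadOnWrite
Flag : AdminUp
Flag : InternalDown
"""

for vol in parse_listdv(SAMPLE):
    if vol.get("Status") == "DOWN":
        print(vol["Disk Pool Name"], vol["Disk Volume Name"], vol["Flag"])
```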

-liststs lists information about all current storage servers.

[root@nbu1 staging]# nbdevquery -liststs
V7.5 nbu1 PureDisk 9
V7.5 nbu1 AdvancedDisk 5
The output shows two storage servers on this system, of types PureDisk and AdvancedDisk.

-listmediaid shows the disk volume information for a media ID.

[root@nbu1 staging]# nbdevquery -listmediaid @aaaax
V7.5 dp_nbu1 PureDisk PureDiskVolume @aaaax 7.55 6.66 11 0 0 1 0 0 6

-listmounts shows the disk pool mount points.

[root@nbu1 staging]# nbdevquery -listmounts
Disk Pool dp_nbu1 has 1 Mount Points
PureDiskVolume @ nbu1 (mounted)
Disk Pool ad_nbu1 has 1 Mount Points
/ad @ nbu1 (mounted)
Each disk pool has one mount point.
-listglobals shows the SCSI Persistent Reservation setting.

[root@nbu1 staging]# nbdevquery -listglobals
SCSI Persistent Reservation: 0

-listconfig shows the storage server configuration.

[root@nbu1 staging]# nbdevquery -listconfig -stype PureDisk -storage_server nbu1
V7.5 "storagepath" "/dp" string
V7.5 "spalogpath" "/dp/log" string
V7.5 "dbpath" "/dp" string
V7.5 "required_interface" "nbu1" string
V7.5 "spalogretention" "7" int
V7.5 "verboselevel" "3" int
V7.5 "replication_target(s)" "none" string
V7.5 "Storage Pool Raw Size" "7.9GB" string
V7.5 "Storage Pool Reserved Space" "322.5MB" string
V7.5 "Storage Pool Size" "7.6GB" string
V7.5 "Storage Pool Used Space" "908.1MB" string
V7.5 "Storage Pool Available Space" "6.7GB" string
V7.5 "Catalog Logical Size" "110Bytes" string
V7.5 "Catalog files Count" "2" string
V7.5 "Space Used Within Containers" "156Bytes" string
V7.5 "Deduplication Ratio" "0.7" string

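-listreptargets shows cross-domain replication information.
nbdevquery -listreptargets -stunit <label> [-U]
# Replication is not configured in my environment, so there is no sample output for now. It will be added later.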


A case of a NetBackup disk pool deletion failure

Symptom:
Deleting the PureDisk pool through the GUI reports the following error.
failed to delete disk pool, invalid command parameter
The debug log shows the following errors:

22:33:17.555 [13642] <2> dsm_update_diskgroup_state: Calling dsm->updateDiskGroupState()
22:33:17.577 [13642] <16> dsm_update_diskgroup_state: DSM has encountered the following busy resource: dp_nbu1, mount point = PureDiskVolume
22:33:17.577 [13642] <16> dsm_update_diskgroup_state: ServiceException: method=updateDiskGroupState():7512 service=DiskService host=nbu1 errorDomain=DSM errorCode=2050027 errorText=dp_nbu1(PureDisk)@nbu1
22:33:17.577 [13642] <16> modify_disk_group_state: dsm_update_diskgroup_state call failed, bp_status = 20
22:33:17.577 [13642] <16> deletedg: failed to DOWN the disk pool (bp_status = 20), so can’t delete it, returning
22:33:17.577 [13642] <2> nbdevconfig: operation returned status = 20
22:33:17.577 [13642] <16> DevConfigCLI::analyzeOp: failed to delete disk pool, invalid command parameter
22:33:17.579 [13642] <2> nbdevconfig: Exiting, status = 20
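In a longer debug log the useful lines are easy to miss. The sketch below splits each line into timestamp, process ID, severity, function, and message, and keeps only lines at or above a chosen severity. It assumes the line layout shown in the excerpt above, and that a larger number in angle brackets means a more severe message (the failure lines above are all marked <16>); the function names are hypothetical.

```python
import re

# Layout assumed from the excerpt: HH:MM:SS.mmm [pid] <severity> function: message
LINE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3}) \[(\d+)\] <(\d+)> (\S+?): (.*)$")


def parse_line(line: str):
    """Return a dict for a recognized log line, or None otherwise."""
    match = LINE.match(line.strip())
    if not match:
        return None
    ts, pid, severity, function, message = match.groups()
    return {
        "time": ts,
        "pid": int(pid),
        "severity": int(severity),
        "function": function,
        "message": message,
    }


def errors(log_text: str, min_severity: int = 16) -> list:
    """Keep only parsed lines at or above min_severity."""
    parsed = (parse_line(line) for line in log_text.splitlines())
    return [p for p in parsed if p and p["severity"] >= min_severity]


LOG = """\
22:33:17.555 [13642] <2> dsm_update_diskgroup_state: Calling dsm->updateDiskGroupState()
22:33:17.577 [13642] <16> dsm_update_diskgroup_state: DSM has encountered the following busy resource: dp_nbu1, mount point = PureDiskVolume
22:33:17.577 [13642] <16> DevConfigCLI::analyzeOp: failed to delete disk pool, invalid command parameter
"""

for entry in errors(LOG):
    print(entry["time"], entry["function"], "->", entry["message"])
```

Filtered this way, the "busy resource" message stands out as the first failure, ahead of the more generic "invalid command parameter" text that the GUI reports.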

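Troubleshooting steps:
1. Make sure the related storage units (STUs) have been deleted.
2. Make sure all images on the STU have expired.
For the second point, the following methods can be used:
a) Search the catalog and confirm that all images have expired.
b) nbstlutil list -U  # confirm that all images referenced by SLPs have expired.
3. Make sure no SLP uses the disk pool.

On my system all of these checks came back clean, yet the disk pool still could not be deleted. Since no image information could be found in the catalog, I suspected that image cleanup had failed at some point, and tried deleting the expired information manually.
#nbdelete -allvolumes
After running this command the deletion still did not seem to work. According to information found online, the force option is needed.
#nbdelete -allvolumes -force
The command completed. Deleting the disk pool from the command line then succeeded.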

[root@nbu1 nbu1]# nbdevconfig -deletedp -dp dp_nbu1 -stype PureDisk
Disk pool dp_nbu1 has been deleted successfully

Copyright 快备份
When reposting, please credit www.keifen.com

STATUS CODE 5: Attempts to restore the SQL master database to a new server fail with a NetBackup Status Code 5 (the restore failed to recover the requested files).

Problem

STATUS CODE 5: Attempts to restore the SQL master database to a new server fail with a NetBackup Status Code 5 (the restore failed to recover the requested files).

Solution

Overview:  Attempts to restore the SQL master database to a new server fail with a NetBackup Status Code 5 (the restore failed to recover the requested files).

Troubleshooting: Enable the dbclient log file on the SQL server.

Log files:
The dbclient log file shows the following error message:
16:23:14.443 [2832.5432] <16> CODBCaccess::LogODBCerr: DBMS MSG - ODBC return code <-1>, SQL State <37000>, SQL Message <3168><[Microsoft][ODBC SQL Server Driver][SQL Server]The backup of the system database on device VNBU0-2832-5432-1179865295 cannot be restored because it was created by a different version of the server (134218488) than this server (134219767).>.

Resolution:
As detailed on the Microsoft website in knowledge base article 264474 (link below) it is not possible to restore a system database to a server with a different build level from the original source server.

http://support.microsoft.com/kb/264474
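The two server version numbers in the log message are easier to compare once decoded. The sketch below assumes the integer packs the major version in the top byte, the minor version in the next byte, and the build number in the low 16 bits. That layout is an assumption, not something stated in this article, but it decodes both values from the log into plausible SQL Server builds with the same major version.

```python
def decode_sql_version(value: int) -> tuple:
    """Split a packed version integer into (major, minor, build).

    Assumed layout: major = bits 24-31, minor = bits 16-23, build = bits 0-15.
    """
    return (value >> 24, (value >> 16) & 0xFF, value & 0xFFFF)


backup_version = 134218488   # "created by a different version of the server"
target_version = 134219767   # "than this server"

for label, value in (("backup", backup_version), ("target", target_version)):
    major, minor, build = decode_sql_version(value)
    print(f"{label}: {major}.{minor:02d}.{build}")
```

Under that assumption the backup was taken on build 760 and the restore target runs build 2039 of the same major release, which fits the resolution above: the target must be brought to the same build level as the source before the master database can be restored.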

A comprehensive list of solutions for the most common NetBackup for Microsoft SQL Server database agent backup and restore issues

Problem

A comprehensive list of solutions for the most common NetBackup for Microsoft SQL Server database agent backup and restore issues

Solution

1. How to restore to an alternate client, same client, with a different DB name, with a move script, to a Cluster, SQL Transaction Logs.  Refer to the following TechNotes to address these issues:

2. How to perform a backup of SQL, SQL in a Cluster, SQL Transaction Logs, and perform a cold backup.  
Refer to the following TechNotes to resolve these issues:

3. Backup or restore failed with error 1, 2, 5, 236, 239, or 58; SQL DB in Loading state after restore due to typing mistakes in the Backup or Restore Script.  
Refer to the following TechNotes for details and resolutions:
Other possible causes of backup or restore failures are as follows:
  • SQLHOST keyword in the script is pointing to the wrong host
  • Wrong master name in the script
  • BROWSE CLIENT = <virtual name> instead of the node name
  • SQLHOST specified in capital letters
  • Wrong DB name
  • Wrong SQLINSTANCE keyword in the script
  • BROWSE CLIENT in upper\lower case, must reflect the name in the Master if the Master is Unix
  • Database in loading state after restore: RECOVEREDSTATE was set to NOTRECOVERED
  • An ordinary restore script was used for an alternate client restore, the move script should be used instead
  • BROWSECLIENT keyword is missing
4. Backup or restore fails with error 2, 5, 23, 58, or 48, or the backup hangs, due to incorrect name resolution or network issues.  Refer to the following for troubleshooting and most frequent causes:

Explanation of bpclntcmd command options, the system calls being used, and recommended troubleshooting when the commands return errors:  http://symantec.com/docs/TECH50198
Status Code 23 during client backups or restores, or when loading client properties:  http://symantec.com/docs/TECH57100
Use the bpclntcmd to troubleshoot the following problems:
  • Incorrect DNS settings
  • Incorrect reverse lookup
  • Missing or incorrect IP address in the host file of the Client, Media or Master server.
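Forward and reverse lookup mismatches can also be checked with the operating system resolver, independently of NetBackup. The sketch below does not replace bpclntcmd; it is a generic check built on Python's standard socket module. The function name and the host names in the demonstration are hypothetical, and the demonstration uses stub resolvers so that it runs without a network.

```python
import socket


def check_name_resolution(hostname, forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Resolve hostname to an IP, resolve the IP back, and compare the names."""
    result = {"hostname": hostname, "ip": None, "reverse_name": None,
              "consistent": False, "error": None}
    try:
        ip = forward(hostname)
        result["ip"] = ip
        name, aliases, _addresses = reverse(ip)
        result["reverse_name"] = name
        # Compare short names so "host" and "host.example.com" count as a match.
        short_names = {n.lower().split(".")[0] for n in [name, *aliases]}
        result["consistent"] = hostname.lower().split(".")[0] in short_names
    except OSError as exc:  # covers socket.gaierror and socket.herror
        result["error"] = str(exc)
    return result


# Demonstration with stub resolvers (hypothetical names and address).
def stub_forward(host):
    return "10.0.0.5"


def stub_reverse(ip):
    return ("sqlhost.example.com", [], [ip])


print(check_name_resolution("sqlhost", stub_forward, stub_reverse))
```

Dropping the two resolver arguments makes the function use the system resolver. Running it on the client, media server, and master server for each other's names shows where a host file or DNS entry disagrees.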
5. Backup or restore fails with error 1, 2, 25; restore fails with error 5, or Error “Exclusive access could not be obtained because the database is in use” due to 3rd party application problems
Refer to the following TechNotes to address these issues:
  • Getting error “Exclusive access could not be obtained because the database is in use” when attempting to restore database over different database using a move template:  http://symantec.com/docs/TECH59128
  • Attempts to restore a SQL database fail with a Status Code 5.  The NetBackup MS SQL Client “View Status” window shows the following message: “Exclusive access could not be obtained because the database is in use”:  http://symantec.com/docs/TECH44445
  • SQL 2000 or SQL 2005 user database restore fails with the error “Exclusive access could not be obtained because the database is in use” when single user mode is already set on the database that is being restored:  http://symantec.com/docs/TECH18466
  • “Exclusive access could not be obtained because the database is in use” when performing a Microsoft SQL 2000 or SQL 2005 restore:  http://symantec.com/docs/TECH16063
  • NetBackup for Microsoft SQL Server database backup exits with Status Code 6, and a status 995 is reported in the SQL Server errorlog:  http://symantec.com/docs/TECH5970
  • After adding a new client to an MS-SQL-Server policy, the new client fails with a Status Code 2 and, in the dbclient log file, the message “The requested name is valid, but no data of the requested type was found” is shown:  http://symantec.com/docs/TECH44647
Other possible causes for failure:
  • Wrong SA user account password specified in the SQL Agent properties
  • No disk space
6. Backup error 1, 2, 240, 199, or “Error 2: USER – Operation inhibited by NetBackup for Microsoft SQL Server: Only a full backup can be performed on the master database”, due to incorrect policy configuration
Refer to the following TechNotes for details and resolutions to these errors:
7. Backup failed with error 2, 167, backup or restore error 25, and bplist command error 133 due to parameters that can be changed via a GUI
Refer to the following TechNotes for details and resolutions:

Most frequent causes:
Changed NetBackup (NBU) client service account to a working one, or one with the correct SQL rights
Selected “allow client browse” via Host Properties\Master
Added media server host name to client servers list
Corrected wrong master name in the client Registry
Corrected wrong client name in BAR GUI

During an alternate client restore, the error “ERROR Initializing NetBackup Catalog” occurs, launching the SQL Backup History Options GUI:  http://symantec.com/docs/TECH27039

8. Backup or restore error 41 or restore failed with error 5 due to needed tuning

Refer to the following TechNotes for details and resolutions:
  • With some SQL issues, increasing the Client Read Timeout on the SQL client up to 36000 seconds will help.
  • Performance tuning for NetBackup for Microsoft SQL Server backups:  http://symantec.com/docs/TECH33423
  • How to back up multiple Microsoft SQL Server databases in parallel using more than one tape drive: http://symantec.com/docs/TECH18392
  • Restores of large Microsoft SQL server databases using the NetBackup for Microsoft SQL Server database extension fail before jobs start reading data from tape:  http://symantec.com/docs/TECH14997
  • How to troubleshoot Microsoft SQL Server database restore issues:   http://symantec.com/docs/TECH39006
  • Is it possible for SQL databases backed up with more than one stripe to be restored using fewer stripes when using the NetBackup for Microsoft SQL Server database agent?  http://symantec.com/docs/TECH48409
  • Changes to the NetBackup for SQL Microsoft SQL Server database agent allow a multi-striped image to be restored with a single stripe:  http://symantec.com/docs/TECH49125
9. Restore failed with error 25 or 13 because the BAR GUI was used to launch the restore instead of SQL Agent GUI
Refer to the following TechNotes for details and resolutions:

Legacy ID

331936

Article URL http://www.symantec.com/docs/TECH74475


Considerations when replacing libobk/orasbt when updating NetBackup for Oracle

Problem

Special coordination may be required to ensure that the NetBackup Client and NetBackup for Oracle libraries are properly updated when upgrading or applying a hotfix.

Solution

Overview:

Oracle RMAN uses the System Backup to Tape (SBT) API to perform backups to tape devices.  The NetBackup Oracle extension is an implementation of the SBT API.

Upgrading the SBT API implementation can present some challenges for an application that runs 24 x 7.  The information below should be reviewed and well understood before planning the installation or upgrade of NetBackup on an Oracle host.

The nature of running processes is that, by default, external references are resolved and the relevant shared object libraries read from disk and mapped into the running process space only once during the life of a process.  Thus a process that runs continuously and performs a backup every day typically does not reload libraries before each backup.  Consequently, the only way to force the process to load an updated copy of a library is by stopping and restarting the process.  Hence the challenge to a 24 x 7 application.

Recommendations:

Follow these steps to perform a successful upgrade of NetBackup on an Oracle client host.  This applies to upgrading the NetBackup Oracle extension and the NetBackup Client whose libraries are used by the extension.  Prior to NetBackup 7.0, these are separately installed components and both should always be upgraded at the same time and to the same maintenance pack or release update level.  Starting with NetBackup 7.0, the NetBackup Client install automatically includes the NetBackup for Oracle extension.

Please note that all references to ‘sbt operations’ encompass backup, restore, and catalog maintenance operations.

1) Stop all processes for the Oracle instances on the host.  Some may have the old libraries mapped into process space.  If there is more than one instance and all are using NetBackup, then all should be stopped.

2) Stop the Oracle listener process if sbt operations have been performed using TNS aliases since NetBackup was last installed or upgraded.  In that configuration, the Oracle listener spawns the process that will do the sbt operation and it too will likely have the old libraries mapped into process space.

3) On HP-UX, the files on disk are the backing store for the running process and may be locked, causing any attempt to overwrite the files to fail.  Check if the files are in use and terminate any processes that are using them prior to updating the libraries.

$ fuser /usr/openv/lib/libxbsa*
$ fuser /usr/openv/netbackup/bin/libobk*

4) On AIX, the old library may already be in the library cache.  New or existing processes will look in the library cache first and may not load the new libraries from disk when resolving external references.  If all the Oracle processes noted above have been halted, clear the cache.

$ /usr/sbin/slibclean

5) On Windows, locate all ‘*xbsa*.dll’ and ‘orasbt.dll’ files and delete them.  The install will reinstall the new copies in the appropriate places and the older ones will no longer be inadvertently found higher in the search PATH when resolving external references.

6) Perform the install or upgrade per the NetBackup software distribution instructions.

7) After the install, inspect the output from the following commands to confirm that the expected versions of the files are installed.

$ cd /usr/openv
$ cat netbackup/bin/version
$ cat share/*oebu*
$ ls -1 lib/libxbsa* netbackup/bin/libobk* \
 | while read -r fn ; do
   netbackup/bin/goodies/support/versioninfo -f "$fn"
 done

Note that the versioninfo program has been included in the NetBackup server distribution since NetBackup 6.0, but was not added to the client distribution until the 6.5.4 release update.  It can be copied from a server of the same platform type as the client.

On Windows, locate the files and check their properties.

8) Following the install, ensure that Oracle is properly using the newly installed libraries by following the steps in TECH72307 in the Related Articles section.

Final Notes:

Newer versions of Oracle (9i and above) should dynamically load and unload the SBT library as needed, but in rare instances reportedly do not.  The following recommendations have been found to be useful in the past.

A) Consistently use SBT_LIBRARY for all SBT operations.  This will cause an explicit dlopen system call to locate and read the library file when the channel is allocated.  Then, when the channel is released, an explicit dlclose system call will unload the library from the process space so that it can be reloaded from disk when the channel is allocated for the next backup or restore.  For example:

ALLOCATE CHANNEL … TYPE SBT_TAPE PARMS='SBT_LIBRARY=/usr/openv/netbackup/bin/<appropriate_libobk>';

On AIX, be aware that the old library will still be referenced by the library cache.  But if all sbt operations specified SBT_LIBRARY and are complete, the use count will be 0 so slibclean will remove it from the cache.

B) Avoid using a TNS alias to connect to the target database when the database is local to the host that is running RMAN.  Using an alias causes the Oracle listener to create the Oracle server process.  The listener may be running as a different user than the instance to backup or restore, which may have a different $ORACLE_HOME, which will cause a different path to be searched for libobk, which may cause an unexpected libobk to be loaded and used.  See the Related Articles for details regarding Oracle 11g.

How to confirm that Oracle is loading the correct NBU Oracle extension library files for use

Problem

How to confirm that Oracle is loading the correct NBU Oracle extension library files for use?

Solution

When the Oracle RMAN program performs a backup, restore, or catalog maintenance operation using the SBT API, it will utilize a libobk* shared object library or orasbt.dll.  This library is provided by third-party backup software, including the NetBackup (NBU) Oracle extension.  The NBU Oracle extension is dependent upon the xbsa library provided with the NBU Client.

Below is a process for confirming if the correct library files are installed and being utilized.  These examples are for Unix, but the process is the same for Windows and the details are at the bottom of this document.

1) Shut down the Oracle instance.

2) If TNS aliases have been or will be used by RMAN to connect to the instance(s), then also shut down the listener.

3) Confirm all Oracle processes are down.

$ ps -ef | grep -i ora

4) On AIX, also clear the library cache.  This will only work if all Oracle processes that utilize the libobk are down and the library use counter has decremented to 0.

$ /usr/sbin/slibclean

Note: Confirming that the Oracle processes are down is significant!  Once the Oracle instance or listener starts and attempts an SBT API operation, it loads the then current library files from disk into memory.  The running process should unload the library when not needed and then load a new copy when needed, but in rare instances may not.  If that happens, the Oracle instance may remain ignorant of updated copies on disk and continue to use the older copy already loaded into process space.  See TECH72419 in the Related Articles section for additional details.

5) Capture the last access times on the library files defined by Oracle, NBU Oracle, and NBU Client.

(
ls -lu $ORACLE_HOME/lib*/libobk*
ls -lu /usr/openv/netbackup/bin/libobk*
ls -lu /usr/openv/lib/libxbsa*
) > /tmp/nbu-lib-access.before.out

6) Restart the Oracle instance.

7) Perform a backup, restore or catalog maintenance operation using RMAN.

8) Capture the last access times on the library files again.

(
ls -lu $ORACLE_HOME/lib*/libobk*
ls -lu /usr/openv/netbackup/bin/libobk*
ls -lu /usr/openv/lib/libxbsa*
) > /tmp/nbu-lib-access.after.out

9) Compare the output files from steps 5 and 8.

The access time on one of the NBU libobk.* files and one of the NBU libxbsa files should have been updated.  The access time on the libobk.* in one of the lib, lib32, or lib64 subdirectories below $ORACLE_HOME may also have been updated.
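The comparison in step 9 can also be done programmatically instead of reading two ls -lu listings side by side. The sketch below records the last access time of each library file and reports which files were read between two snapshots. The file patterns are the ones used in steps 5 and 8; the function names are hypothetical. Note that file systems mounted with options such as noatime do not record access times, in which case neither ls -lu nor this approach will show a change.

```python
import glob
import os


def library_paths() -> list:
    """Expand the library locations checked in steps 5 and 8."""
    patterns = [
        "/usr/openv/netbackup/bin/libobk*",
        "/usr/openv/lib/libxbsa*",
    ]
    oracle_home = os.environ.get("ORACLE_HOME")
    if oracle_home:
        patterns.append(os.path.join(oracle_home, "lib*", "libobk*"))
    return sorted(p for pattern in patterns for p in glob.glob(pattern))


def snapshot_atimes(paths) -> dict:
    """Map each existing path to its last access time (seconds since epoch)."""
    return {p: os.stat(p).st_atime for p in paths if os.path.exists(p)}


def changed_atimes(before: dict, after: dict) -> list:
    """Return the paths whose access time moved forward between snapshots."""
    return sorted(p for p, t in after.items() if p in before and t > before[p])


before = snapshot_atimes(library_paths())
# ... restart the instance and run an RMAN operation here (steps 6 and 7) ...
after = snapshot_atimes(library_paths())
print("libraries read since first snapshot:", changed_atimes(before, after))
```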

If the access times did not update on the expected library files then either the Oracle instance configuration or the RMAN PARMS statement in the backup/restore/maintenance script specifies an alternate location for SBT_LIBRARY or another libobk* file exists higher on the LD_LIBRARY_PATH (Solaris & Linux), SHLIB_PATH (HP-UX), or LIBPATH (AIX).  The DBA will be familiar with the Oracle library load search process and can make the adjustments so that it uses the correct file.

If the access times updated on unexpected files, then the DBA will also need to correct the Oracle library load search process.  This may involve specifying SBT_LIBRARY, deleting older libraries that are higher on the search path, or symbolically linking the instance to the appropriate NBU libobk.  To build the correct symbolic links, use this script.

$ /usr/openv/netbackup/bin/oracle_link

If the access times did not update on any of the libobk files in the Oracle or NBU directories, then it is likely that RMAN is connecting to an Oracle instance running on another host instead of on this host.  The DBA should check whether the target instance is being accessed via a TNS alias and whether the alias is resolving correctly.

Note: Starting with NetBackup 6.5.6 (or any hotfix to support Oracle 11g R2), the NBU libobk is statically linked with the xbsa library and the access time on the xbsa library will not change during the operations above.

10) After the symbolic links have been corrected, repeat steps 1-9 to ensure the correct library file is in use.

11) If the correct files are being accessed, but the libraries still will not load, then check the NetBackup Database Compatibility matrix to ensure that the platform, architecture (32 or 64 bit), and version of Oracle in use are supported by the version of NetBackup that is installed.

12) For the next few days or weeks, enable the dbclient debug log and monitor for these lines.  The build dates for the xbsa library and NetBackup for Oracle should be the same or differ by only a few days.  If they differ significantly, investigate whether a mismatched older library is present and in use by some process.

$ cd /usr/openv/netbackup/logs/dbclient
$ egrep 'NetBackup XBSA Interface|NetBackup for Oracle' log.??????
08:30:33.508 [11019] <4> VxBSAInit: Veritas NetBackup XBSA Interface – 7.1  2011020313
Veritas NetBackup for Oracle – Release 7.1 (2011020313)
08:34:13.996 [12828] <4> VxBSAInit: Veritas NetBackup XBSA Interface – 7.1  2011020313
Veritas NetBackup for Oracle – Release 7.1 (2011020313)
09:04:48.405 [26337] <4> VxBSAInit: Veritas NetBackup XBSA Interface – 7.1  2011020313
Veritas NetBackup for Oracle – Release 7.1 (2011020313)
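
When many operations have been logged, it can be quicker to summarize the build stamps than to read every line.  The following is a minimal sketch, assuming the build stamp is the first 10-digit number on each matching line as in the sample output above; the function name build_stamps is hypothetical.

```shell
# Hypothetical helper: read dbclient log lines on stdin and print each
# distinct 10-digit build stamp followed by the number of lines it
# appeared on.
build_stamps() {
    awk '
        /NetBackup XBSA Interface|NetBackup for Oracle/ {
            # Ten explicit digit classes keep this portable to older awks
            # that lack interval expressions such as {10}.
            if (match($0, /[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]/))
                seen[substr($0, RSTART, RLENGTH)]++
        }
        END { for (s in seen) print s, seen[s] }
    ' | sort
}

# Example usage (run from /usr/openv/netbackup/logs/dbclient):
#   cat log.?????? | build_stamps
```

If more than one build stamp is printed, compare the values; stamps that are far apart suggest that a mismatched library is in use by some process.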

For comparison, these are the build dates for libxbsa and libobk for the NetBackup 6.5 and 7.x releases.

6.5.1
    Veritas NetBackup XBSA Interface – 6.5  2007111605
    Veritas NetBackup for Oracle – Release 6.5 (2007111606)
6.5.2
    Veritas NetBackup XBSA Interface – 6.5  2008052300
    Veritas NetBackup for Oracle – Release 6.5 (2008052301)
6.5.3
    Veritas NetBackup XBSA Interface – 6.5  2009120409
    (No NB Oracle fixes in 6.5.3, should be using NB_ORA_6.5.2.)
6.5.4
    Veritas NetBackup XBSA Interface – 6.5  2009050105
    Veritas NetBackup for Oracle – Release 6.5 (2009050106)
6.5.5
    Veritas NetBackup XBSA Interface – 6.5  2009110613
    (No NB Oracle fixes in 6.5.5, should be using NB_ORA_6.5.4.)
6.5.6
    Veritas NetBackup XBSA Interface – 6.5  2010042404
    Veritas NetBackup for Oracle – Release 6.5 (2010042405)
7.0
    Veritas NetBackup XBSA Interface – 7.0  2010010418
    Veritas NetBackup for Oracle – Release 7.0 (2010010418)
7.0.1
    Veritas NetBackup XBSA Interface – 7.0  2010070723
    Veritas NetBackup for Oracle – Release 7.0 (2010070723)
7.1
    Veritas NetBackup XBSA Interface – 7.1  2011020313
    Veritas NetBackup for Oracle – Release 7.1 (2011020313)
7.1.0.1
    Veritas NetBackup XBSA Interface – 7.1  2011061213
    Veritas NetBackup for Oracle – Release 7.1 (2011061213)
7.1.0.2
    Veritas NetBackup XBSA Interface – 7.1  2011082510
    Veritas NetBackup for Oracle – Release 7.1 (2011082510)

If the build dates are still not correct, then enable the RMAN Channel Trace and inspect the resulting trace file in the user dump destination to see where the library is being loaded from.  E.g.

$ cat udump/oracle9_ora_21811.trc
…snip…
try loading : libobk.so
Loaded (/app/oracle/lib/libobk.so)
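
If the trace file is long, the load location can be pulled out directly.  The following is a minimal sketch, assuming the trace reports each load on a line of the form "Loaded (path)" as in the excerpt above; the function name loaded_libs is hypothetical.

```shell
# Hypothetical helper: read an RMAN channel trace on stdin and print
# the distinct library paths reported on "Loaded (...)" lines.
loaded_libs() {
    sed -n 's/^[[:space:]]*Loaded (\(.*\))[[:space:]]*$/\1/p' | sort -u
}

# Example usage with the trace file shown above:
#   loaded_libs < udump/oracle9_ora_21811.trc
```

A path outside the NetBackup directories, or one that is not a symbolic link to the NBU libobk, points to the copy that needs to be corrected.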

Windows Specific Information:

  • The library files of interest are *xbsa*.dll and orasbt.dll.
  • Symbolic links are not used, so multiple copies of either file may exist on the host.  Find and remove all copies of the library files and then reinstall NetBackup so that only one copy remains.  The reinstall places the libraries in the correct location, so there is no need for an oracle_link script on Windows.
  • The library file access times can be viewed in the File/Computer Explorer, but the ‘Date Accessed’ column must be added to the display.