Large restores and non-optimized duplication jobs (to tape, for example) from a media server deduplication pool (MSDP) perform at a slower rate than from non-deduplicated storage units. In some instances these jobs may perform unacceptably slowly and can cause:
- Increased tape drive wear for duplications written to tape
- Large SLP backlogs
- Large number of queued jobs in the Activity Monitor
- Missed SLAs
With deduplication, image files are broken down into segment objects (SOs), which are then stored within containers. When a segment matching one already in the storage pool is sent, it is recorded in a database as a reference to the existing segment rather than written to storage again. This approach is also known as single instance storage. Because of the nature of the deduplication process, read operations such as restores and duplications must reassemble these SOs into images to return the data to its pre-deduplicated state. Several factors to consider when investigating and improving the performance of these operations are discussed in this document.
1. Host resources
The first area to consider is the hardware configuration of the MSDP storage server. The recommended minimum specifications are listed below.
a. Memory: at least 1 GB per TB of storage + 4 GB for OS + 4 GB for NBU
b. Storage IO: at least 130 MB/sec reads/writes, 250 MB/sec or more for enterprise class performance. It is not recommended to use iSCSI or thin provisioned storage (see TECH214907)
c. Processor: at least four cores at 2.2 GHz
d. Operating system: 64 bit operating system
If the storage server is to accept data from several clients or other storage servers, more hardware resources may be needed. Deduplication is by nature a very CPU, memory, and IO intensive process, so an optimal build is essential for both performance and reliability. If using a Windows based storage server, the MSDP file system must be excluded from antivirus (AV) scanning to prevent quarantining of database containers, which often results in data loss. Other applications which scan or alter file systems must be configured so that they do not change the MSDP data and database file systems.
Note: for customers considering Microsoft Windows operating system for their MSDP media server, please see ‘HOWTO: Special considerations when deploying NetBackup Media Server Deduplication on Windows systems’ HOWTO61249.
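The minimum specifications listed in this section can be turned into a quick pre-deployment check. The following is a minimal sketch in Python that applies only the figures given above (1 GB of memory per TB of storage plus 4 GB each for the OS and NetBackup, 130 MB/sec storage IO, four cores at 2.2 GHz, 64 bit OS). The function and parameter names are illustrative assumptions and are not part of NetBackup.

```python
def minimum_memory_gb(storage_tb, os_gb=4, netbackup_gb=4):
    """Minimum memory: 1 GB per TB of storage + 4 GB for OS + 4 GB for NBU."""
    if storage_tb < 0:
        raise ValueError("storage_tb must not be negative")
    return storage_tb * 1 + os_gb + netbackup_gb


def check_host(storage_tb, installed_memory_gb, io_mb_per_sec,
               cores, clock_ghz, is_64bit):
    """Return a list of minimum-specification shortfalls (empty if none)."""
    problems = []
    required = minimum_memory_gb(storage_tb)
    if installed_memory_gb < required:
        problems.append(
            "memory: %s GB installed, at least %s GB recommended"
            % (installed_memory_gb, required))
    if io_mb_per_sec < 130:
        problems.append(
            "storage IO: %s MB/sec measured, at least 130 MB/sec recommended"
            % io_mb_per_sec)
    if cores < 4 or clock_ghz < 2.2:
        problems.append("processor: at least four cores at 2.2 GHz recommended")
    if not is_64bit:
        problems.append("operating system: 64 bit operating system required")
    return problems


if __name__ == "__main__":
    # Hypothetical host: 32 TB pool, 32 GB RAM, 120 MB/sec storage.
    for problem in check_host(32, 32, 120, 4, 2.2, True):
        print(problem)
```

These are minimums for the MSDP role alone; a server that accepts data from many clients or carries additional roles needs headroom beyond what this check reports.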
2. Roles of the storage server
The best performing and most reliable configurations have MSDP built on a dedicated media server. Additional roles such as Master Server, OpsCenter, VMware backup host, Fibre Transport Media Server (FTMS), or others such as Active Directory, File Services, SQL Server, etc. need to be planned for separately, as the requirements in part 1 above are written for the MSDP role alone.
3. Storage performance
To assess the performance of the storage partition, stop all NetBackup services on the storage server host, along with any other services that could be using resources on the storage. On Unix, the lsof command can be used to verify that there are no open files on the storage. This ensures that the IO tests are as accurate as possible, because the test will not be competing with other processes. It is recommended to use the nbperfchk utility to obtain average statistics for read and write performance. For the most accurate results, use a file size larger than the memory installed in the storage server host or, if using an external storage array, larger than the array cache. See HOWTO72940 for more information on this utility. Sample syntax is shown below to show how statistics are returned.
Write performance test:
# nbperfchk -i zero: -o /disk/nbperfchk.tmp -s 150g
153600 MB @ 777.6 MB/sec
Read performance test:
# nbperfchk -i /disk/nbperfchk.tmp -o /dev/null
153600 MB @ 570.9 MB/sec
In this example, sequential reads and writes perform at 571 and 778 MB/sec respectively. This is important to establish because the read rate is the maximum rate at which MSDP will be able to rehydrate an image, before CPU, memory, and other overheads are taken into account. If the MSDP database and storage partitions are on different LUNs, then the IO tests should be performed on both.
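The measured read rate can be used to set expectations for a restore or duplication. The sketch below, in Python, parses a summary line in the form shown in the sample output above and computes the shortest possible read time for an image of a given size. It assumes the summary line always has the shape "<size> MB @ <rate> MB/sec"; other nbperfchk output is not handled, and the function names are illustrative.

```python
import re

# Matches a summary line such as "153600 MB @ 570.9 MB/sec".
_SUMMARY = re.compile(
    r"^\s*(\d+(?:\.\d+)?)\s*MB\s*@\s*(\d+(?:\.\d+)?)\s*MB/sec\s*$")


def parse_summary(line):
    """Return (megabytes_transferred, megabytes_per_second) from a summary line."""
    match = _SUMMARY.match(line)
    if match is None:
        raise ValueError("not a recognised summary line: %r" % line)
    return float(match.group(1)), float(match.group(2))


def min_read_seconds(image_mb, read_mb_per_sec):
    """Lower bound on the time to read image_mb at the measured sequential rate."""
    if read_mb_per_sec <= 0:
        raise ValueError("read rate must be positive")
    return image_mb / read_mb_per_sec


if __name__ == "__main__":
    _, read_rate = parse_summary("153600 MB @ 570.9 MB/sec")
    one_tb_in_mb = 1024 * 1024
    print("Best case for a 1 TB image: %.0f minutes"
          % (min_read_seconds(one_tb_in_mb, read_rate) / 60))
```

With the sample read rate, a 1 TB image cannot be read in much less than half an hour. Actual rehydration takes longer, because segments are not read sequentially and CPU and memory overheads apply.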
4. Intended load
The number of clients, daily data volume, policy types, concurrent backups, duplication jobs, and replication jobs that will be using the storage pool also need to be considered. In medium and large environments, SLP operations should be tuned to keep duplication and replication streams (IO reads) separate from backup streams (IO writes). This makes the most efficient use of caching on the RAID controller and file system. NetBackup 7.6 has the capability to control SLP processing using start windows to help separate read and write IO traffic. More information on this can be found in TECH211111.
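When planning windows to separate read and write traffic, it can help to confirm on paper that the proposed windows do not overlap. The following Python sketch compares two daily windows given as hypothetical "HH:MM" start and end times, including windows that run past midnight. It does not read or change any NetBackup configuration; it only illustrates the arithmetic.

```python
MINUTES_PER_DAY = 24 * 60


def _to_minutes(hhmm):
    """Convert 'HH:MM' to minutes after midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def _covered_minutes(start, end):
    """Set of minutes of the day covered by [start, end), wrapping past midnight."""
    first, last = _to_minutes(start), _to_minutes(end)
    if first == last:
        return set()
    if first < last:
        return set(range(first, last))
    return set(range(first, MINUTES_PER_DAY)) | set(range(0, last))


def overlap_minutes(window_a, window_b):
    """Number of minutes per day during which both windows are open."""
    return len(_covered_minutes(*window_a) & _covered_minutes(*window_b))


if __name__ == "__main__":
    backup_window = ("18:00", "06:00")  # hypothetical backup (write) window
    slp_window = ("05:00", "12:00")     # hypothetical SLP (read) window
    print(overlap_minutes(backup_window, slp_window), "minutes of overlap")
```

Any overlap reported here is time during which duplication reads and backup writes may compete for the same storage.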
5. Nature of the data
During backups, new and unique data segments are stored within the data containers. With the initial backup, the data segments will all be in close proximity to each other because they were written sequentially. After the first backup, only unique data segments are stored at the end of the containers. This in turn causes fragmentation and poor segment locality. Given that client data changes over time, images are less likely to have common data segments with the previous backups. All of this changed data is appended to the last container and new containers are created as needed in 7.6. The deduplication rate may be high, but over time small changes add up, causing the data to be scattered across the containers and the underlying file system. Client and policy combinations with a high change rate and/or low deduplication rates will be impacted more quickly by a loss of locality. For these reasons unstructured data types tend to be impacted more significantly, but loss of locality will occur on all data types over time.
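The loss of locality described above can be illustrated with a toy model. The Python sketch below is not the MSDP on-disk format: it assumes fixed-size containers that hold a set number of segments, stores each new segment at the end of the last container, and counts how many containers must be opened to read an image back. All class and method names are hypothetical.

```python
class ToyPool:
    """Toy single instance store: each unique segment is written exactly once."""

    def __init__(self, segments_per_container):
        self.capacity = segments_per_container
        self.location = {}  # segment fingerprint -> container index
        self.stored = 0     # total number of segments written so far

    def backup(self, fingerprints):
        """Store only segments not seen before; return the image's segment list."""
        for fingerprint in fingerprints:
            if fingerprint not in self.location:
                # New data is always appended after everything already stored.
                self.location[fingerprint] = self.stored // self.capacity
                self.stored += 1
        return list(fingerprints)

    def containers_touched(self, image):
        """Number of distinct containers that must be read to rebuild the image."""
        return len({self.location[fingerprint] for fingerprint in image})


if __name__ == "__main__":
    pool = ToyPool(segments_per_container=4)

    first = pool.backup(["a%d" % i for i in range(8)])  # client A, initial backup
    pool.backup(["b%d" % i for i in range(8)])          # another client writes

    changed = list(first)
    changed[1] = "a1-v2"  # two segments of client A change
    changed[5] = "a5-v2"
    second = pool.backup(changed)

    print("first image: ", pool.containers_touched(first), "containers")
    print("second image:", pool.containers_touched(second), "containers")
```

In this model the second image still deduplicates well (only two new segments are stored), yet reading it back touches more containers than the first image did, because the changed segments were appended after another client's data.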
Upon configuration of a new storage/deduplication pool, it is recommended to run a full backup of each client’s base operating system: “/” for Unix/Linux and “C:\” for Windows systems. Doing this stores the core OS data together which is similar across the client platforms.
The process of creating better segment locality within the deduplication pool is referred to as rebasing. Over time the data for a specific policy and client combination will become spread among many data containers and across the disk itself. Server side rebasing works at the storage pool/server level to move segment objects between containers so they are closer together. This is similar in concept to defragmenting a file system, as objects are stored more contiguously to improve read performance. Server side rebasing is a low priority process and therefore will not always occur, depending on load conditions. Because of this, busier storage pools might not rebase as often as they should. The process can also be limited by the number of segments which can be moved.
Client side rebasing looks at the locality of a client and policy combination when the backup begins and if a particular threshold is not met, client side rebasing occurs. Client side rebasing will resend these segments to the storage pool, creating a fresh copy to improve locality. This may cause a short term increase in storage usage but has not been found to have an impact on day-to-day operations. In NetBackup 7.6 and higher, client side rebasing is enabled by default. Client side rebasing can be enabled manually in previous versions by following the steps provided in HOWTO70652. Due to the nature of the Accelerator feature, client side rebasing cannot occur on backups with it enabled.
NOTE: Images created with Accelerator enabled will experience poorer locality because client side rebasing is unavailable, and will therefore experience poorer rehydration performance.
7. Segment locality
If it is suspected that SOs are not stored optimally or more evidence is needed to explain rehydration performance problems, a locality check can be performed. Please contact Symantec Technical Support for more information on this step and advanced troubleshooting.
8. MSDP configuration file tuning
Depending on the server hardware, operating system configuration and system load, changes can be made to the MSDP configuration files to improve performance further. Care should be taken, as making many or large changes at once can compromise system stability. It should also be noted that changes intended to improve rehydration performance may negatively affect backup performance, particularly parameters related to rebasing, as they can create additional overhead on the storage server. It is strongly recommended to make a backup copy of the configuration before editing it so that changes can be rolled back, to make small changes one at a time, and to test with backups and restores between changes.
- Change PrefetchThreadNum=1 to PrefetchThreadNum=8 to speed up prefetching. If results are satisfactory, change PrefetchThreadNum=8 to PrefetchThreadNum=16. This parameter specifies how many threads to use to preload segments during rehydration.
- Verify MaxNumCaches=1024. This allows more containers to be opened simultaneously to avoid frequent opening and closing of the same containers.
- Verify RebaseScatterThreshold=64MiB. This parameter specifies the average data size threshold per container for a given backup image to be considered for rebasing.
- Change RebaseMaxPercentage=5 to RebaseMaxPercentage=50.
- Change RebaseMaxTime=150 to RebaseMaxTime=600.
- Windows storage servers only: Change ReadBufferSize=65536 to ReadBufferSize=262144.
Change “PREFETCH_SIZE = 33554432” to “PREFETCH_SIZE = 67108864”. This is the buffer used for restore operations, in bytes.
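The parameter changes listed above are edits to "Key=Value" lines. One way to apply them one at a time, with a backup copy for rollback, is sketched below in Python. It assumes each parameter appears exactly as a "Key=Value" or "Key = Value" line; the file path passed to update_file is supplied by the caller and is hypothetical. The sketch does not restart any services or validate the values.

```python
import re
import shutil

_SETTING = re.compile(r"^(\s*)([A-Za-z_][A-Za-z0-9_]*)(\s*=\s*)(.*)$")


def apply_settings(config_text, changes):
    """Return config_text with the given keys set to new values.

    Spacing around '=' is preserved. Raises KeyError if a key is not present,
    so that a mistyped parameter name is not silently ignored.
    """
    remaining = dict(changes)
    lines = config_text.splitlines()
    for index, line in enumerate(lines):
        match = _SETTING.match(line)
        if match and match.group(2) in remaining:
            indent, key, separator = match.group(1), match.group(2), match.group(3)
            lines[index] = indent + key + separator + str(remaining.pop(key))
    if remaining:
        raise KeyError("not found in configuration: %s" % sorted(remaining))
    trailing_newline = "\n" if config_text.endswith("\n") else ""
    return "\n".join(lines) + trailing_newline


def update_file(path, changes):
    """Apply changes to the file at path, keeping the original as path + '.bak'."""
    with open(path) as handle:
        original = handle.read()
    updated = apply_settings(original, changes)  # fails before anything is written
    shutil.copy2(path, path + ".bak")            # backup copy for rollback
    with open(path, "w") as handle:
        handle.write(updated)


if __name__ == "__main__":
    sample = "PrefetchThreadNum=1\nMaxNumCaches=1024\n"
    print(apply_settings(sample, {"PrefetchThreadNum": "8"}))
```

Applying a single change per call, then testing backups and restores before the next change, keeps each adjustment easy to attribute and to roll back.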
c. If necessary, make changes to the files below on the NetBackup side. The values shown are starting points that work best in most environments.
- SIZE_DATA_BUFFERS = 262144
- SIZE_DATA_BUFFERS_DISK = 1048576
- NUMBER_DATA_BUFFERS = 256
- NUMBER_DATA_BUFFERS_DISK = 512
- NET_BUFFER_SZ = 1048576
- NET_BUFFER_SZ_REST = 1048576
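Larger and more numerous data buffers consume more memory on the media server, so it is worth estimating the footprint before applying these values. The Python sketch below multiplies buffer size by buffer count for the tape and disk settings above. It assumes that this product approximates the buffer memory used by a single stream, and that buffer sizes are expected to be multiples of 1024 bytes; both assumptions should be confirmed against the NetBackup documentation for the version in use.

```python
# Starting values from the list above.
SETTINGS = {
    "SIZE_DATA_BUFFERS": 262144,
    "SIZE_DATA_BUFFERS_DISK": 1048576,
    "NUMBER_DATA_BUFFERS": 256,
    "NUMBER_DATA_BUFFERS_DISK": 512,
}

MIB = 1024 * 1024


def buffer_memory_bytes(buffer_size, buffer_count):
    """Approximate buffer memory for one stream: size of each buffer x count."""
    if buffer_size % 1024 != 0:
        # Assumption: buffer sizes are expected to be multiples of 1024 bytes.
        raise ValueError("buffer size should be a multiple of 1024 bytes")
    return buffer_size * buffer_count


def total_for_streams(buffer_size, buffer_count, concurrent_streams):
    """Approximate buffer memory when several streams run at the same time."""
    return buffer_memory_bytes(buffer_size, buffer_count) * concurrent_streams


if __name__ == "__main__":
    tape = buffer_memory_bytes(SETTINGS["SIZE_DATA_BUFFERS"],
                               SETTINGS["NUMBER_DATA_BUFFERS"])
    disk = buffer_memory_bytes(SETTINGS["SIZE_DATA_BUFFERS_DISK"],
                               SETTINGS["NUMBER_DATA_BUFFERS_DISK"])
    print("tape buffers per stream: %d MiB" % (tape // MIB))
    print("disk buffers per stream: %d MiB" % (disk // MIB))
```

Multiplying the per-stream figure by the expected number of concurrent streams gives a rough total to weigh against the memory recommendations in part 1.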