Backup and Recovery for Microsoft SQL Server

Transcription

Backup and Recovery for Microsoft SQL Server
White Paper
Backup and Recovery for Microsoft SQL Server
Using EMC Data Domain Deduplication Storage Systems
Best Practices Planning Guide
Abstract
Users are faced with many options and tradeoffs when choosing
a backup strategy for Microsoft SQL Server databases. This
white paper maps out those choices and examines how EMC®
Data Domain® deduplication storage systems preserves data
integrity, meets stringent RTO/RPO objectives, and integrates
easily into a multitude of active SQL or third-party backup
environments.
February 2012
Copyright © 2012 EMC Corporation. All Rights Reserved.
EMC believes the information in this publication is accurate as
of its publication date. The information is subject to change
without notice.
The information in this publication is provided “as is.” EMC
Corporation makes no representations or warranties of any kind
with respect to the information in this publication, and
specifically disclaims implied warranties of merchantability or
fitness for a particular purpose.
Use, copying, and distribution of any EMC software described in
this publication requires an applicable software license.
For the most up-to-date listing of EMC product names, see EMC
Corporation Trademarks on EMC.com.
Microsoft and Microsoft SQL Server, Microsoft Exchange,
Microsoft SharePoint, and Microsoft Hyper-V are registered
trademarks or trademarks of Microsoft, Inc. in the United States
and/or other jurisdictions. All other trademarks used herein are
the property of their respective owners.
Part Number h8116.2
Backup and Recovery for Microsoft SQL Server
Using EMC Data Domain Deduplication Storage Systems
Best Practices Planning Guide
Page 2
Table of Contents
Executive summary.................................................................................................. 5
Audience ............................................................................................................................ 6
Introduction ............................................................................................................ 6
Data Domain Product Background ...................................................................................... 8
Advantages of Data Domain in a SQL Server Environment................................................... 8
EMC Data Domain Boost ..................................................................................................... 9
SQL Server Basics ................................................................................................. 10
Terminology .......................................................................................................... 11
Types of Backups ............................................................................................................. 11
Recovery Models .............................................................................................................. 12
Recovery Techniques ........................................................................................................ 12
Data Domain Integration Best Practices.................................................................. 14
Compression .................................................................................................................... 15
Multiplexing ..................................................................................................................... 15
Encryption ........................................................................................................................ 15
Backup Application Based Data Deduplication ................................................................. 16
Blocksize.......................................................................................................................... 16
Stripes ............................................................................................................................. 16
Backup Command ............................................................................................................ 18
Data Transfer Rates .......................................................................................................... 18
Integration ............................................................................................................ 19
Solution Planning ............................................................................................................. 20
Additional Considerations ..................................................................................... 20
Backup Types ................................................................................................................... 21
IP Network Considerations ............................................................................................... 21
Data Domain and Third-Party Backup Applications ........................................................... 21
Conclusion ............................................................................................................ 22
Appendix A: Index Fragmentation........................................................................... 23
Addressing the Challenge ................................................................................................. 23
Appendix B: Additional resources .......................................................................... 24
Microsoft Resource Links .................................................................................................. 24
EMC Data Domain links .................................................................................................... 25
Backup and Recovery for Microsoft SQL Server
Using EMC Data Domain Deduplication Storage Systems
Best Practices Planning Guide
Page 3
List of Figures
Figure 1: Native MS-SQL Database Backup Tool ......................................................................... 6
Figure 2: NetWorker – MS SQL Client Properties – VSS Snapshot Configuration ......................... 7
Figure 3: Dual MS SQL Database Backups – NetWorker and Native SQL Server Back Up ............ 8
Figure 4: EMC Data Domain Boost .............................................................................................. 9
Figure 5: Microsoft SQL Server Management Studio Databases................................................ 10
Figure 6: Selection Recovery Model .......................................................................................... 12
Figure 7: Restore Database Dialog Box ..................................................................................... 13
Figure 8: Restore Database Options ......................................................................................... 13
Figure 9: Restore the Initial Full Backup then the First Transaction Log Backup ........................ 14
Figure 10: NetWorker MS SQL Client Restore GUI Example ....................................................... 14
Figure 11: Native SQL Backup - Disable Compression .............................................................. 15
Figure 12: Multi-striped Database Backup – Eight Stripes ........................................................ 18
Figure 13: Database Backup to a Null Device ........................................................................... 18
Figure 14: Multiple Null Disk Devices ....................................................................................... 19
Figure 15: Nominal Database Backup Performance .................................................................. 19
Figure 16: NetWorker Management Console ............................................................................. 22
Figure 17: DBCC “showcontig” Command Output ..................................................................... 24
Backup and Recovery for Microsoft SQL Server
Using EMC Data Domain Deduplication Storage Systems
Best Practices Planning Guide
Page 4
Executive summary
Many database administrators prefer native Microsoft SQL Server backups directly to
disk compared to using third-party backup applications. When utilizing native SQL
Server backup, there is no reliance on the backup administrative team to perform
backups or play a role in database recovery. Additionally, there is no longer a need
for the database administrator to become proficient in deploying, configuring,
administering, or maintaining third-party backup applications.
Historically, native SQL backups have had some drawbacks for a couple of reasons:
•
Native SQL backup facilities do not provide automated media management
capabilities and therefore must write to disk devices. While backups performed to
disk media eliminated the challenge of manually managing tape cartridges, this
method also introduced the need for a considerable amount additional disk.
Conventional wisdom has traditionally been that the cost of disk versus
removable tape media was significantly higher.
•
Backup to disk did not meet the requirement of retaining an offsite copy of
database backups as part of a disaster recovery strategy. Native backup to disk
fell short of providing a viable solution for this requirement.
Deployed as database backup media, EMC® Data Domain® deduplication storage
systems address the legacy shortcomings of performing native database backups to
disk for the following reasons:
•
Data Domain deduplication storage systems optimize storage capacity, making
retention and replication of backup data exceptionally cost and network-efficient
by providing 10-30x data reduction
•
Data Domain systems are simple to integrate utilizing traditional backup software,
but also offer an alternative with high-speed, cost-effective backup directly to a
CIFS network share, utilizing native SQL Server backup. Users have the choice to
eliminate the need for third-party SQL Server backup application agents and their
associated operational costs and maintenance fees.
•
Data Domain Replicator provides up to 99% reduction in bandwidth required,
which enables users to send data offsite for faster “time-to-DR”
•
EMC NetWorker integration with EMC Data Domain Boost (DD Boost) significantly
increases performance by distributing parts of the deduplication process to
NetWorker storage nodes or applications hosts, and serves as a solid foundation
for additional integration between NetWorker and Data Domain systems
•
Data Domain systems benefit from the EMC Data Domain Data Invulnerability
Architecture – continuous recovery verification, fault detection and self healing,
and other resiliency features transparent to the backup application.
This white paper provides information about the use of Data Domain deduplication
storage as backup media for Microsoft SQL Server backups.
Backup and Recovery for Microsoft SQL Server
Using EMC Data Domain Deduplication Storage Systems
Best Practices Planning Guide
Page 5
Audience
This white paper is intended as a guide for data protection architects, SQL Server
database administrative staff, backup administrators and EMC partners seeking
information about integrating Data Domain deduplication storage systems as a key
component in a comprehensive backup and recovery strategy.
Introduction
Microsoft SQL Server backup
methodology falls into one of two
generic categories. The first
consists of native SQL Server
database backups. This backup
technique creates SQL database
backups using tools and utilities
native to Microsoft SQL Server and
does not rely on third-party backup
application software (see figure 1).
The native database backup tool
performs a full database backup to
disk through a CIFS network share.
The tool is easy to use and
provides a feature set that
addresses business requirements.
Benefits include the use of backup
Figure 1: Native MS-SQL Database Backup Tool
and recovery interfaces familiar to
the database administrative staff. This ability is included with Microsoft SQL Server,
and there are no additional third-party software license fees.
The second backup methodology uses backup application software that integrates
with Microsoft SQL Server to perform SQL database backups based on the Virtual
Device Interface (VDI). This solution is typically packaged as a database agent
specifically for Microsoft SQL Server and a particular backup application. When VDI is
used, the backup application allows setting customized backup and recovery
parameters similar to those that can be employed when using native Microsoft SQL
tools and utilities.
EMC NetWorker backup software has the capability to utilize available snapshot
technologies designed to provide application consistency for the backup and
recovery processes. The EMC NetWorker Module for Microsoft Applications (NMM)
delivers unified, online backup and recovery utilizing Microsoft Virtual Shadow Copy
Services (VSS) for Microsoft applications including SQL Server, Exchange, SharePoint,
and Hyper-V.
Backup and Recovery for Microsoft SQL Server
Using EMC Data Domain Deduplication Storage Systems
Best Practices Planning Guide
Page 6
The NetWorker graphical user interface (Figure 2) is an example of a backup
application that utilizes Microsoft VSS protection for SQL.
Figure 2: NetWorker – MS SQL Client Properties – VSS Snapshot Configuration
When the snapshot type is based on Microsoft VSS, the backup application is the
VSS requestor, the SQL Server is the VSS writer, and backup is coordinated with a
VSS provider. Advanced backup and recovery features such as disk staging and
instant recovery may be available with these implementations depending on the
backup application and agent being used.
Additional Concepts
Sometimes customers utilizing the native Microsoft SQL Server database backup
methodology, augment their solution with third-party backup client agents that
effectively protect the native backup data as a flat file. This two-phased methodology
is effectively “backing up a backup.” Among the perceived benefits of the augmented
solution is that it allows segregation of the SQL database administrative staff from
the data protection staff while providing means to retain database backups in
conformance with sound business practices and standardized corporate retention
policies.
There is another variant of the same methodology of an augmented backup solution
that utilizes two backup solutions in combination to satisfy business objectives.
Native SQL database backups are performed to a Data Domain system and
subsequently backup is performed by a third-party backup application and written to
the same Data Domain system (Figure 3).
Backup and Recovery for Microsoft SQL Server
Using EMC Data Domain Deduplication Storage Systems
Best Practices Planning Guide
Page 7
Figure 3: Dual MS SQL Database Backups – NetWorker and Native SQL Server Back Up
Data Domain Product Background
Data Domain deduplication storage systems minimize backup and recovery times,
storage and network bandwidth, and risk of data loss. EMC offers a range of Data
Domain systems to meet the backup and archive requirements for companies of all
sizes as they seek to reduce costs and simplify data management.
Data Domain systems also offer replication that is extremely easy to deploy. The
primary advantage of Data Domain system replication is that the data is deduplicated
and compressed prior to being sent over the network.
Advantages of Data Domain in a SQL Server Environment
Data Domain systems can be directly integrated into Microsoft SQL Server
environments as disk backup media. In addition, Data Domain systems support all
leading enterprise backup and archive applications for seamless integration into
existing IT infrastructures.
The use of different backup methodologies with Microsoft SQL Server and Data
Domain systems typically has a negligible effect on overall data deduplication ratios.
Backup and Recovery for Microsoft SQL Server
Using EMC Data Domain Deduplication Storage Systems
Best Practices Planning Guide
Page 8
This enables users to perform native database backups in conjunction with database
backups controlled by a third-party backup application without affecting
deduplication efficiency. This includes third-party backup applications that use a SQL
agent, with or without VSS snapshots. Additionally, the use of different numbers of
stripes or different blocksize values also has a negligible impact on deduplication
ratios.
Data Domain network-efficient replication can be used to create offsite copies of SQL
backups faster and more economically than legacy tape-based strategies. Data
Domain replication makes advanced disaster recovery preparedness for SQL Server a
reality.
EMC Data Domain Boost
EMC Data Domain Boost (DD Boost) distributes parts of the deduplication process
from the Data Domain system to the backup server or application client. In addition to
storage node support, NetWorker 7.6 SP2 or later supports DD Boost-based backup
from application hosts for Microsoft applications and databases. This is driving new
efficiency for users with NetWorker and Data Domain. By sending only unique data
from the NetWorker server or application client to the Data Domain system, less LAN
bandwidth is required, backups are 50 percent faster and the whole aggregate
system more manageable.
NetWorker provides operational capabilities for configuring, monitoring, and
reporting of backup and restores for Data Domain devices. This functionality is
provided through the NetWorker Management Console (NMC) portal. The NMC portal
is accessible from any supported remote Internet browser. The NMC Device
Configuration Wizard simplifies the configuration of storage devices, backup clients,
storage (target) pools, volume labeling, and save set cloning.
DD Boost dramatically increases the aggregate throughput, up to 50% faster than
NFS, and reduces the amount of data transferred over the network by 80 to 99
percent. These efficiencies can help eliminate future costs by leveraging existing
backup servers and Ethernet networks.
•
•
•
Increases backup speed up to 50%
faster
Reduces network traffic
Clone-controlled replication
-
•
Schedules replication
Catalog awareness
Ease of use
-
Wizard automated configuration
Monitoring and reporting
Figure 4: EMC Data Domain Boost
Backup and Recovery for Microsoft SQL Server
Using EMC Data Domain Deduplication Storage Systems
Best Practices Planning Guide
Page 9
With DD Boost, backup applications can control replication between multiple Data
Domain systems and provide backup administrators with a single point of
management for tracking all backups and duplicate copies. This paradigm allows
backup administrators to efficiently create DR copies of their backups over the WAN
using DD Replicator software and keep catalog consistency for easy disaster recovery.
This also provides the flexibility for administrators to manage different retention
periods for each copy of data.
With NetWorker, the Data Domain replication process is managed by standard
NetWorker cloning, ensuring that NetWorker can recognize and manage a replicated
(remote) copy of data and assign unique retention policies to it. The administrator
has the ability to schedule the cloning process to run at a time that is most
appropriate for the business.
SQL Server Basics
A Microsoft SQL server instance includes system
and user databases. As depicted in Figure 5,
system databases are created at installation
and include:
•
The “master” database, which records all
system-level information for a Microsoft SQL
server. It contains records for all login
accounts and all system configuration
settings. The master database records the
existence and location of all other
databases.
•
The “model” database, which is used as a
template that contains the default settings
for all databases created within the
Microsoft SQL Server instance
•
The “msdb” database, which is used for
scheduling, alerts, and jobs
•
Figure 5: Microsoft SQL Server
The “tempdb” database, which serves as a
Management Studio Databases
global resource that contains all temporary
tables and temporary stored procedures. It is
re-created every time the Microsoft SQL Server instance is started.
Data protection strategies for the system databases are dependent on the database
being protected. For instance, transaction log backups are not supported for the
master database.
The master database cannot be recovered if a functional version of it does not already
exist. Recovery procedures for the master database may include re-installing
Microsoft SQL Server such that a backup of the pre-disaster master database can
then be restored.
Backup and Recovery for Microsoft SQL Server
Using EMC Data Domain Deduplication Storage Systems
Best Practices Planning Guide
Page 10
The model and msdb databases can contain customized data such as user-specific
templates, scheduling information, as well as backup and restore history information.
Without a data protection strategy, these items will need to be manually
reconstructed in the event of a disaster.
The tempdb database is empty when the SQL instance is shut down, and does not
require protection as it is re-created at startup.
Terminology
Entire databases, specific database files, file groups, and transaction log backups are
among the supported backup types with Microsoft SQL Server. This section defines
the terminology associated with a given backup type.
Types of Backups
Database backups
• Database Backup – This is a full backup of an entire database and represents the
state of the database at the point when the backup is completed
•
Differential Database Backup – This is a backup of all the files within a database,
and contains only the extents modified since the most recent full backup of each
file. Restoring a database protected with full and differential backups to the most
recent point in time includes recovering the most recent full and differential
backup.
Partial backups
• Partial Backup – Partial backups provide flexibility for backing up databases that
contain some number of read-only file groups. This is a partial backup of all data
in the primary filegroup, each read/write filegroup, and any optionally specified
read-only files or filegroups.
•
Differential Partial Backup – This backup contains only the extents modified since
the prior partial backup of the same set of filegroups
File backups
• File Backup – This consists of a full backup of all data in one or more files or
filegroups
•
Differential File Backup – This is a backup of one or more files containing data
extents changed since the prior full backup of each file
Transaction log backups
• Regular transaction log backups are required when using the full or bulk-logged
recovery models. This backup contains all log records that have not been backed
up previously.
Backup and Recovery for Microsoft SQL Server
Using EMC Data Domain Deduplication Storage Systems
Best Practices Planning Guide
Page 11
Copy-Only backups
• Database backups usually change the database in some way, such as truncating a
transaction log in the case of a full database backup. Copy-Only backups can be
used in cases where a backup of a database is required without changing the
database.
Recovery Models
Microsoft SQL Server includes three recovery models: simple, bulk logged, and full
(see Figure 6). The desired recovery model can be deployed based on requirements.
Functionally, each recovery model differs with regard to how backup and recovery
strategies are executed.
•
The full recovery model
includes log backups. This
model typically has no
exposure to data loss. Point-intime recovery is possible, up to
including the last committed
transaction.
•
The bulk logged recovery
model requires log backups.
This model permits highperformance bulk copy
operations. Recovery to the
end of any backup is possible;
point-in-time recovery is not
supported.
•
Figure 6: Selection Recovery Model
The simple recovery model
consists of performing full backups only. Logs are not backed up. In the event
database recovery is required, the most recent full backup can be restored. Any
changes that occurred subsequent to the last full backup must be redone. From a
transactional perspective, the database can only be recovered to the point of the
prior full backup.
Recovery Techniques
The technique used to restore a database will vary based on the recovery model being
used as well as the backup types being performed. Figures 7-10 provide a brief look
at restoring a database that was protected using the full recovery model with full and
transaction log backups. A single full backup was performed, followed by five
transaction log backups. Figure 7 depicts the restore database dialog box and general
database restore attributes. By default the full backup and subsequent transaction
log backups are all selected. Clicking the “OK” button would initiate recovery to the
most recent possible point in time. Alternately, recovery to a specific point in time is
also possible.
Backup and Recovery for Microsoft SQL Server
Using EMC Data Domain Deduplication Storage Systems
Best Practices Planning Guide
Page 12
Figure 7: Restore Database Dialog Box
Figure 8 depicts restore
database options and
available database recovery
options. By default an
existing database will not
be overwritten. Also note
that by default the recovery
state is “RESTORE WITH
RECOVERY,” which leaves
the recovered database in
an online and unstable
state after the restore
process completes.
Figure 8: Restore Database Options
Backup and Recovery for Microsoft SQL Server
Using EMC Data Domain Deduplication Storage Systems
Best Practices Planning Guide
Page 13
Figure 9 is an example of
a recovery transaction
that restores the initial
full backup, followed by
the first transaction log
backup. The remaining
transaction logs were not
included in this query for
brevity.
Figure 9: Restore the Initial Full Backup then the First Transaction Log Backup
EMC NetWorker and third party backup applications will each have a unique recovery
interface for databases. Many automate and coordinate the recovery of full and
transaction log backups similar to the way native Microsoft SQL Server tools and
utilities do.
Figure 10 is an example
of the NetWorker MS
SQL client restore GUI
Figure 10: NetWorker MS SQL Client Restore GUI Example
Data Domain Integration Best Practices
Table 1 presents a summary of the suggested best practices settings for Microsoft
SQL Server backup to Data Domain deduplication storage systems.
Table 1: Recommended Backup Software Settings
PARAMETERS AFFECTING DEDUPLICATION
PERFORMANCE
SQL Server 2008 native compression
SETTING
NO_COMPRESSION
Third-party backup application SQL Server local
compression
Disabled
Third-party backup application multiplexing
Disabled
Third-party backup application encryption
Disabled
Third-party backup application deduplication
Disabled
Backup and Recovery for Microsoft SQL Server
Using EMC Data Domain Deduplication Storage Systems
Best Practices Planning Guide
Page 14
Compression
Specific to SQL Server 2008 Enterprise and later
versions, backup compression can be enabled
or disabled. The default product installation
does not compress backups. A server-level
compression setting can be applied that alters
default behavior. The use of the COMPRESSION
keyword within a backup SQL transaction
explicitly enables backup compression. The use
of the NO_COMPRESSION keyword within a
backup SQL transaction explicitly disables
backup compression.
Figure 11: Native SQL Backup - Disable
Figure 11 illustrates SQL Server 2008 properties Compression
for native compression; the “Compress backup”
service level property is used for backup jobs that do not explicitly enable or disable
compression.
Backup application software compression should be disabled because the Data
Domain system can fingerprint unique data segments more efficiently for
deduplication if the data segments sampled are not already compressed. Backup
windows can be extended and CPU performance can be impacted on the backup
client if the backup software is tasked with performing compression. Local
compression is provided for on the Data Domain storage system.
Multiplexing
When the Data Domain system is integrated as a backup device with a backup
application that supports multiplexed backups, EMC recommends disabling
multiplexed backups. Multiplexing limits the ability of the Data Domain system to
deduplicate incoming data.
Historically used as a speed matching solution where multiple slower data streams
were multiplexed into a single stream to take advantage of a somewhat faster tape
drives, backups to disk drives obtain no advantage from multiplexing. Whether
deployed as a CIFS share, NFS mount, VTL, or OpenStorage / DD Boost disk pool, Data
Domain systems accommodate writing multiple backup streams in parallel without
multiplexing.
Encryption
Encrypted files are by definition, unique. The encryption software that is part of the
backup application will create unique files, on-the-fly for each backup, defeating the
deduplication capabilities of the deduplication storage system. Data Domain
Encryption software provides encryption of data at rest and is persistent in flight
during replication with Data Domain Replicator software.
Backup and Recovery for Microsoft SQL Server
Using EMC Data Domain Deduplication Storage Systems
Best Practices Planning Guide
Page 15
Backup Application Based Data Deduplication
Disabling deduplication from the backup application software will provide better
performance and allow the Data Domain system to offload this work. Data Domain
systems are optimized to provide the very best ingest performance and deduplication
ratios.
Table 2: Recommended Backup Software Parameter Settings
PARAMETERS AFFECTING BACKUP AND
SETTING
RECOVERY PERFORMANCE
BLOCKSIZE
Default 512 byte or higher based on
performance improvements
Stripes
Consider the use of multiple stripes to
improve backup and restore data
transfer rates
Blocksize
The, “BLOCKSIZE” keyword can be used to alter physical block size used when writing
to backup media. By default the backup process will automatically select a block size
appropriate for the backup device. Supported sizes are 512, 1K, 2K, 4K, 8K, 16K, 32K
and 64K bytes. The default value used for disk backup is 512 bytes.
The default 512-byte size yields excellent performance with Data Domain systems.
Third-party backup applications may substitute their own default value. The fact that
this parameter can be adjusted is included as reference. The use of larger sizes may
improve or degrade performance. Users are encouraged to investigate further to
determine what value may provide optimal results in their environment.
Stripes
While not a keyword within the context of Microsoft SQL Server, the term stripes
correlates to the number of simultaneous backup streams to be created for a given
backup operation. In the case of disk backups with SQL Server, multi-streamed
backups are performed by specifying a number of backup disk targets with the
BACKUP command.
Table 3: Mount Options
MOUNT OPTIONS
When performing native database
backups
When using a third-party backup server
SETTING
UNC path
Dependent on backup application and
server OS type
Backup and Recovery for Microsoft SQL Server
Using EMC Data Domain Deduplication Storage Systems
Best Practices Planning Guide
Page 16
When the Data Domain system is used as a disk backup media for native Microsoft
SQL Server backups, configuration is performed utilizing a CIFS share.
As a general rule, the UNC path to the share should be used instead of a mapped
drive because:
a) Scheduled backups may execute when no user is logged in to the server
b) When Sqlservr.exe is executed as a service, it has no relation to a login
session
Table 4: Miscellaneous Options
MISCELLANEOUS OPTIONS
CONFIGURATION
Comingling native and third-party backup application
database backups to the same Data Domain system
Yes
Replication
Yes
Comingling native and third-party backups to a Data Domain system should have only
a negligible impact on deduplication ratios because of the variable segment
processing and Stream Informed Segment Layout (SISL) architected into Data Domain
systems.
Since Data Domain Replicator software only sends unique, compressed data
segments to the remote system it is ideal for network-efficient disaster recovery.
Table 5: Infrastructure Configuration
INFRASTRUCTURE
CONFIGURATION
Server Disk Subsystem
Database and log files should be placed
on disk storage with performance
attributes facilitating required
transaction and backup performance
metrics
IP Network
Dedicated backup network that meets or
exceeds bandwidth requirements for the
desired data transfer rate
EMC Data Domain System
Sized to meet or exceed ingest rate and
backup retention capacity requirements
Backup and Recovery for Microsoft SQL Server
Using EMC Data Domain Deduplication Storage Systems
Best Practices Planning Guide
Page 17
Backup Command
The recommended use of SQL stripes is as a speed matching technology. Multiple
backup streams from a given database can be simultaneously written to a target Data
Domain system in an effort to achieve an aggregate data transfer rate that aligns with
business requirements.
Figure 12 illustrates a multi-striped
database backup that uses eight
stripes in an effort to improve backup
data transfer rate performance.
Multiple stripes can be used to better
match data transfer rate capabilities
between source and destination
media.
Figure 12: Multi-striped Database Backup – Eight Stripes
Data Transfer Rates
Multiple business objectives are considered when determining required backup and
recovery data transfer rates. Decision criteria include backup window duration, log
growth, and recovery time.
By definition, slow backups are
those that fail to meet or exceed
business objectives. Understanding
factors that can affect performance
is critical to removing them from the
environment.
Figure 13: Database Backup to a Null Device
A reasonable place to start any backup performance investigation is to understand
the theoretical maximum speed at which SQL Server can process a given database
backup. Performing a database backup to a null disk device provides an estimate of
that maximum achievable speed in a given environment. Figure 13 depicts a
database backup to a null device.
The results of the query indicate that the theoretical maximum rate at which the SQL
Server backup function can extract data from this database using a single stripe is
approximately 80 MB/sec. Regardless of the data transfer rate at which the backup
media can accept data, backing up this database as it currently stands will be limited
to 80 MB/sec when using a single stripe.
Backup and Recovery for Microsoft SQL Server
Using EMC Data Domain Deduplication Storage Systems
Best Practices Planning Guide
Page 18
Figure 14 depicts a database
backup to multiple null disk
devices.
Figure 14: Multiple Null Disk Devices
Figure 15 depicts nominal
database backup performance
improvement with a moderately
tuned eight-stripe SQL database
backup with an aggregate data
transfer rate of approximately 172
MB/sec, indicating that the
network-attached backup devices
are not limiting throughput.
Figure 15: Nominal Database Backup Performance
Integration
EMC NetWorker and third-party backup applications used to protect Microsoft SQL
Server can also take advantage of Data Domain systems employed as backup media.
Data Domain systems are easily configured as varied backup media types and
protocols including VTL, CIFS share, NFS mount, or Data Domain Boost (DD Boost) for
backup applications such as EMC NetWorker.
Additionally, DD Boost enables managed replication capabilities known as, “clone
controlled replication” with EMC NetWorker.
In this scenario, backup images are replicated from one Data Domain system to
another under the direct control of NetWorker or other supported backup
applications. DD Boost monitoring, reporting, and cataloging of replicated backup
images and savesets can be used to architect a comprehensive disaster recovery
plan.
Backup and Recovery for Microsoft SQL Server
Using EMC Data Domain Deduplication Storage Systems
Best Practices Planning Guide
Page 19
Solution Planning
Capacity and performance planning play a critical role in both successful deployment
and ongoing production usage of a Data Domain system. Detailed capacity analysis
should be performed by a knowledgeable EMC Velocity partner or an EMC technical
consultant. The analysis considers database sizes, growth rates, change rates, and
retention periods as input criteria. Performance analysis considers data points such
as the required aggregate data transfer rate for backups, connection topology
requirements to support the data transfer rate, and the Data Domain system required
to meet or exceed the required data transfer rate.
Beyond capacity and performance planning are additional considerations for Data
Domain system replication.
Additional Considerations
Replication Scope
Replicating all database backups is certainly possible. However, many users will want
to implement replication at a more granular level. Production database backups are
usually excellent replication candidates, whereas development and test database
backups are less critical. An analysis of network bandwidth and destination disk
space requirements should be performed by a knowledgeable EMC Velocity partner or
an EMC technical consultant.
Replication Topology
Backups are typically replicated to serve as a second backup copy for recovery in the
event of a disaster. When backups from a primary site are being replicated to a
secondary site, planning is relatively straightforward. Users with multiple primary
sites may decide to implement a bidirectional replication solution where database
backups from either site are replicated to the alternate site. Proper planning should
render an outline detailing which database backups are being replicated to each
location.
Tape Consolidation
Some users replicate backup images to a central location for disaster recovery
purposes while also using the solution as a vehicle that enables centralized tape
creation. The third-party backup application used to create tape-based backup copies
will dictate any additional considerations or restrictions that this solution involves. A
knowledgeable EMC Velocity partner or an EMC technical consultant will be able to
assist with this planning task.
Backup and Recovery for Microsoft SQL Server
Using EMC Data Domain Deduplication Storage Systems
Best Practices Planning Guide
Page 20
Backup Types
The goal of backups is to satisfy recovery time and point objectives. Outlining a
strategy of full, differential, and transaction log backups is beyond the scope of this
paper. That stated, there are a few key points worth noting:
•
Performing full backups frequently with Data Domain deduplication storage does
not create a storage usage penalty, as redundant database segments do not
consume additional disk space. While this may appear to enable the ability to
perform full backups more frequently, the load full backups place on the SQL
server and connection topology to the Data Domain system should be taken into
consideration.
•
When split-mirror or snapshot backups are performed and controlled by a thirdparty backup application, the Data Domain system is easily integrated as a
backup storage device. The features provided by these backup techniques (lowimpact backups, instant recovery, and so on) do not preclude the use of Data
Domain technology.
IP Network Considerations
When Data Domain systems are deployed as a CIFS backup share, EMC recommends
interconnecting SQL servers and Data Domain systems using a dedicated backup
area network. When deployment is in conjunction with a backup application as a CIFS
share, NFS mount, or OpenStorage / DD Boost disk pool, EMC similarly recommends
interconnecting backup application media servers and Data Domain systems using a
dedicated backup area network.
Whenever possible, the network used for backup and recovery communications
should be segregated from other production networks. This best practice
recommendation seeks to assure that network bandwidth is available for backup and
restore jobs to meet or exceed business objectives.
Network bandwidth requirements may dictate the need for a topology that supports
data transfers in excess of 125 MB/s. All Data Domain systems support the use of
multiple GbE network interfaces, and the use of 10 GbE network interfaces.
A knowledgeable Data Domain system engineer will be able to assist with planning
the deployment based on user requirements and available resources.
Data Domain and Third-Party Backup Applications
When Data Domain systems are integrated with EMC NetWorker and third-party
backup applications, it is important to note that Microsoft SQL Server backup
parameters are handled the same as when compared to a native SQL Server backup
implementation. The COMPRESSION, and BLOCKSIZE keywords, as well as any
striping, are still valid parameters. Some of these settings may or may not be
unavailable when using a third-party backup application.
Backup and Recovery for Microsoft SQL Server
Using EMC Data Domain Deduplication Storage Systems
Best Practices Planning Guide
Page 21
Figure 16 depicts the
NetWorker
Management Console
interface for
configuring MS SQL
Backups.
Figure 16: NetWorker Management Console
Users of third-party backup applications seeking to exploit the full complement of
available Microsoft SQL Server backup options should contact their software provider
in the event additional information is required.
Conclusion
A Data Domain system makes an excellent target for Microsoft SQL Server backups
because it integrates easily and seamlessly into existing SQL Server environments.
Data Domain systems allow the SQL Server administrative team to retain a greater
number of full backup images online, thereby optimizing recovery options while
occupying minimal footprint in the data center, utilizing native backup tools that are
familiar to SQL Server administrators.
The addition of a Data Domain system into the environment greatly reduces
dependence on legacy tape and provides faster “time-to-DR” with network-efficient
replication.
When Data Domain Boost integration with EMC NetWorker is leveraged, performance
can be greatly improved and the managed replication includes the remote backup
image in the saveset database for easy recovery.
It is for all of these reasons that more people choose to build their backup solutions
using EMC products and technology.
Backup and Recovery for Microsoft SQL Server
Using EMC Data Domain Deduplication Storage Systems
Best Practices Planning Guide
Page 22
Appendix A: Index Fragmentation
Index fragmentation affects I/O performance of queries whose data pages do not
reside in the Microsoft SQL Server data cache. A variety of techniques are commonly
used to reduce index fragmentation, including but not limited to “DBCC
INDEXDEFRAG”, “DBCC DBREINDEX”, and “CREATE INDEX WITH DROP EXISTING”.
While these techniques are effective in reducing index fragmentation, they can also
have a negative impact on deduplication. Database administrative teams that
routinely defragment all indexes at some predetermined frequency may notice
reduced data deduplication rates on their Data Domain systems. The end result is
reduced storage efficiency.
Index defragmentation has the effect of reorganizing the pages within a database
such that Data Domain deduplication sees the backup data stream as new, unique
data. In addition to the inefficient use of backup device storage space, this can also
impact the ability to replicate database backups using Data Domain replication.
A greater quantity of unique data blocks equates to replicating a greater quantity of
data over what may be a bandwidth limited WAN.
Database administrative teams may find themselves in a situation where index
fragmentation impacts query performance, and frequent index defragmentation
impacts backup storage device performance in terms of deduplication and replication
rates.
Addressing the Challenge
EMC recommends addressing these challenges with a balanced approach. For
instance, instead of defragmenting all indexes based on a schedule, consider
defragmentation based on thresholds. Additionally EMC recommends the use of
index keys that are less prone to fragmentation in the first place.
Is index fragmentation the only issue impacting transaction performance?
I/O subsystem performance, memory usage, and CPU utilization can all have a
negative impact on query performance. These issues should be diagnosed and
resolved versus the use of frequent automatic index defragmentation to improve
performance.
File fragmentation can also impact performance. Many small databases sharing the
same logical disk volume combined with the use of the “autogrowth” property can
cause logically sequential database files to allocate non-sequential physical storage
on disk. Ideally, administrators should set the size of database files at deployment to
accommodate potential future growth.
While it may be impossible to anticipate the size of a given database three years into
the future, doing so helps to reduce the possibility that file fragmentation will impact
query performance. If automatically growing database files is a requirement, consider
growing in large chunks versus small chunks. It may be impractical to locate each
database on a unique logical volume, but consider doing so for databases that are
Backup and Recovery for Microsoft SQL Server
Using EMC Data Domain Deduplication Storage Systems
Best Practices Planning Guide
Page 23
expected to grow considerably over time. Finally, disk file fragmentation can be
reduced by Windows file system defragmentation utilities such as the Windows “Disk
Defragmenter.”
Do all indexes need to be defragmented or just a subset?
EMC recommends the use of index defragmentation tools based on thresholds and
limits versus automatically defragmenting every index on every table whether it is
required or not. The suggestion is to understand what indexes and their
corresponding fragmentation levels impact performance.
These indexes should be monitored for a specific fragmentation threshold, and action
taken to defragment these indexes only when necessary. Selective index
defragmentation will have less impact on production and will assist in preserving the
ability to efficiently deduplicate database backups.
Figure 17 depicts the DBCC showcontig
command output. It includes extent
scan fragmentation data indicating
that index “C_CustomerI1” does not
require defragmentation at this time.
Figure 17: DBCC “showcontig” Command Output
Structuring indexes and keys so as to minimize fragmentation may or may not be
realistic in all cases, but it should be considered as it potentially reduces the need to
defragment indexes frequently. Index and key inserts that occur at the end of the
table and index are likely to reduce fragmentation. Deletes that occur in contiguous
chunks also assist in reducing fragmentation.
Appendix B: Additional resources
Microsoft Resource Links
Backing Up and Restoring Databases in SQL Server - from SQL Server 2008 Books
Online: http://msdn.microsoft.com/en-us/library/ms187048.aspx
Backing Up and Restoring Databases in SQL Server - from SQL Server 2005 Books
Online: http://msdn.microsoft.com/en-us/library/ms187048(SQL.90).aspx
Optimizing Backup and Restore Performance in SQL Server - SQL Server 2005 Books
Online: http://msdn.microsoft.com/en-us/library/ms190954(SQL.90).aspx
Microsoft SQL Server Community
http://technet.microsoft.com/en-us/sqlserver/bb671048.aspx
Backup and Recovery for Microsoft SQL Server
Using EMC Data Domain Deduplication Storage Systems
Best Practices Planning Guide
Page 24
EMC Data Domain links
EMC Backup and Recovery for Microsoft Applications — Deduplication Enabled by
EMC CLARiiON and Data Domain white paper
http://www.emc.com/collateral/software/white-papers/h7051-backup-recoverymicrosoft-deduplication-clariion-wp.pdf
EMC Data Domain Family products and deduplication technology
http://www.emc.com/products/family/data-domain-family.htm
Technical Notes - Using EMC® NetWorker® Module for SQL Server® with Data Domain
Boost® for Improved Backup and Recovery Performance
http://powerlink.emc.com/km/live1/en_US/Offering_Technical/Technical_Documentation/300-013159.pdf?mtcs=ZXZlbnRUeXBlPUttQ2xpY2tDb250ZW50RXZlbnQsZG9jdW1lbnRJZD0wOTAxNDA2NjgwN
WZkOTMwLGRvY3VtZW50VHlwZT1wZGYsbmF2ZU5vZGU9MGIwMTQwNjY4MDQyNzY2OF9Hcmlk
®
EMC Data Domain Boost Software
http://www.emc.com/products/detail/software/data-domain-boost.htm
EMC Data Domain SISL Scalability Architecture — A Detailed Review white paper
http://www.emc.com/collateral/hardware/white-papers/h7221-data-domain-sislsclg-arch-wp.pdf
EMC Data Domain Replicator Software — A Detailed Review white paper
http://www.emc.com/collateral/software/white-papers/h7082-data-domainreplicator-wp.pdf.pdf
EMC Data Invulnerability Architecture: Ensuring Data Integrity and Storage System
Recoverability white paper
http://www.emc.com/collateral/software/white-papers/h7219-data-domain-datainvul-arch-wp.pdf
®
EMC Networker Software
http://www.emc.com/backup-and-recovery/networker/networker.htm
EMC NetWorker Online Community
https://community.emc.com/community/connect/networkeronline
IDC Study – Worldwide Purpose Built Backup Appliances:
http://www.emc.com/collateral/analyst-reports/idc-worldwide-purpose-built-backup-appliance-20112015.pdf
Backup and Recovery for Microsoft SQL Server
Using EMC Data Domain Deduplication Storage Systems
Best Practices Planning Guide
Page 25