Disk-based backup and deduplication have been godsends for many organizations looking for a fast, effective way to protect and store their growing amounts of data. However, Oracle DBAs still sometimes feel that these two technologies have come up short in ways that have not been adequately addressed. SEPATON’s new DBeXstream technology changes this by giving Oracle DBAs access to these two technologies with the corresponding increases in throughput and deduplication ratios that they were originally led to believe they would see.
Almost no one disputes that the amount of data that enterprises manage is on the upswing. The only question is, “How much?” To answer that question, one analyst firm recently found that in two thirds of the 185 organizations it surveyed, the total volume of data is growing at an average annual rate of approximately 28%.
This is why technologies like deduplication and disk-based backup are warmly received by so many organizations: together they expedite backups and minimize backup data stores. However, not all types of data have benefited equally from these technologies.
Oracle databases have particularly struggled. Because their data growth rates equal or exceed those of other data types, Oracle DBAs have likewise looked to leverage disk-based backup and deduplication to solve their data protection challenges. Unfortunately, the results of these solutions have been mixed to date.
Oracle DBAs are accustomed to concurrently streaming multiple Oracle backup channels to tape and want to carry that same methodology over to disk and then deduplicate it. However, due to the nuances of how deduplication works, Oracle backups have become more complicated to deduplicate and more costly to store on disk than originally expected.
Oracle DBAs for the most part expect to replace tape with disk as the backup target while keeping their current processes of multi-streaming and multiplexing Oracle backup jobs in place. The challenge that emerges is that deduplication algorithms cannot effectively filter and find duplicate data in multiplexed, multi-streamed Oracle backup jobs unless the algorithms are modified to handle these specialized types of backup streams.
Amplifying the challenge, many backup software products use Oracle RMAN under the covers to perform these backups. So in addition to the metadata that the backup software itself inserts into the backup streams, Oracle RMAN also introduces its own metadata (such as time stamps and sequence numbers) into these backup streams.
So what starts out as a “simple” Oracle backup stream turns into a complex multi-streamed, multiplexed backup job that is intermixed with metadata from backup software and Oracle RMAN. This makes it more difficult for deduplication algorithms to achieve high deduplication ratios of Oracle data which, in turn, results in higher disk storage costs.
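The effect described above can be illustrated with a toy model. The sketch below is not SEPATON's algorithm, nor the real RMAN or backup-software stream format: the block size, channel count, and `night:sequence|` header layout are all assumptions made purely for illustration. It compares how many fixed-size blocks a simple hash-based deduplicator recognizes when unchanged data is written as one plain stream versus as a multiplexed stream with per-piece metadata.

```python
import hashlib
import random

BLOCK = 512     # fixed chunk size of the toy hash-based deduplicator (assumed)
CHANNELS = 4    # concurrent backup channels (assumed)
PIECES = 32     # pieces written per channel (assumed)

def make_channels():
    """Build unchanging 'database' content, split across channels."""
    rng = random.Random(0)
    return [[rng.getrandbits(BLOCK * 8).to_bytes(BLOCK, "big") for _ in range(PIECES)]
            for _ in range(CHANNELS)]

def plain_stream(channels):
    """Write every channel back to back with no added metadata."""
    return b"".join(piece for channel in channels for piece in channel)

def multiplexed_stream(channels, night):
    """Interleave pieces in an order that varies per night, each piece
    prefixed by a hypothetical header (night id and sequence number)."""
    rng = random.Random(night)
    queues = [list(channel) for channel in channels]
    out = bytearray()
    seq = 0
    while any(queues):
        idx = rng.choice([i for i, q in enumerate(queues) if q])
        out += f"{night}:{seq:06d}|".encode() + queues[idx].pop(0)
        seq += 1
    return bytes(out)

def duplicate_fraction(previous, current):
    """Fraction of fixed-size blocks in `current` already seen in `previous`."""
    seen = {hashlib.sha256(previous[i:i + BLOCK]).digest()
            for i in range(0, len(previous), BLOCK)}
    blocks = [current[i:i + BLOCK] for i in range(0, len(current), BLOCK)]
    hits = sum(hashlib.sha256(block).digest() in seen for block in blocks)
    return hits / len(blocks)

channels = make_channels()
plain = duplicate_fraction(plain_stream(channels), plain_stream(channels))
muxed = duplicate_fraction(multiplexed_stream(channels, 1),
                           multiplexed_stream(channels, 2))
print(f"plain stream duplicates found: {plain:.0%}")
print(f"multiplexed stream duplicates found: {muxed:.0%}")
```

Even though the underlying data is identical on both nights, the inserted headers shift block boundaries and the interleaving order changes, so the fixed-block deduplicator in this toy recognizes almost none of the multiplexed stream.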
These challenges are why some Oracle DBAs are disillusioned about using the combination of disk-based backup and deduplication in their environment. Costs may not go down, backup times may increase and they have to tweak their approach to Oracle backups without any clear sense of whether or not it will actually benefit them or their organization.
To mitigate these downsides of implementing disk and deduplication in their environment, Oracle DBAs ideally want a solution to deliver the following when backing up their Oracle databases:
Maximum Throughput with Optimal Storage Capacity
It is in response to this that SEPATON added DBeXstream technology to its S2100-ES2’s DeltaStor deduplication algorithm. DBeXstream accounts for the specific nuances of deduplicating multi-streamed, multiplexed Oracle backup jobs that also have the metadata of the backup software and Oracle RMAN sprinkled in.
To deduplicate the data in these backup jobs, DBeXstream fingerprints the incoming backup streams to identify regions of data that are similar to prior backups. SEPATON uses a unique zoomable fingerprinting technology that is part of its DeltaStor DBeXstream deduplication algorithm to generate fingerprints that provide both coarse and fine identification of similar regions.
The fingerprints are generally unaffected by either the repeating metadata introduced by Oracle or the randomization of data caused by multiplexing and multi-streaming. This is due to the fingerprints being sized for sub-8K regions.
Regions with matching coarse fingerprints are compared, and those that also have matching fine fingerprints are deduplicated using DeltaStor’s byte-level differencing. Duplicate data in the prior backup is then replaced with a pointer to the common data in the latest backup, and its space is reclaimed in the background by the continuously running Transparent Space Reclamation feature.
This approach gives SEPATON’s DeltaStor deduplication algorithm a decided advantage over other deduplication techniques when deduplicating Oracle data. SEPATON does not break backup jobs into blocks or chunks of a specific size when deduplicating, as inline hash-based deduplication algorithms commonly do. SEPATON only looks to deduplicate data between the pointers that its DBeXstream technology has inserted into the backup stream.
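To make the final differencing step concrete, here is a minimal sketch of byte-level differencing in which a prior backup region is rewritten as pointers into the latest region plus a few literal bytes. It uses Python's standard `difflib.SequenceMatcher` as a stand-in; DeltaStor's actual differencing and fingerprinting are not described in enough detail here to reproduce, and the function names and the `copy`/`literal` operation format are hypothetical.

```python
from difflib import SequenceMatcher
import random

def encode_against(reference: bytes, target: bytes):
    """Describe `target` as copies from `reference` plus literal bytes."""
    ops = []
    matcher = SequenceMatcher(None, reference, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Common data: store only a pointer (offset, length) into reference.
            ops.append(("copy", i1, i2 - i1))
        elif j2 > j1:
            # Bytes that exist only in target must be stored literally.
            ops.append(("literal", target[j1:j2]))
        # A pure "delete" needs nothing: target has no bytes there.
    return ops

def decode(reference: bytes, ops) -> bytes:
    """Rebuild the target region from the reference and the stored operations."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, start, length = op
            out += reference[start:start + length]
        else:
            out += op[1]
    return bytes(out)

def stored_bytes(ops) -> int:
    """Literal bytes that still have to be kept for the target region."""
    return sum(len(op[1]) for op in ops if op[0] == "literal")

# Illustrative data: two versions of a 1 KB region with small, different edits.
payload = random.Random(0).getrandbits(1024 * 8).to_bytes(1024, "big")
prior = payload[:500] + b"old-row" + payload[500:]
latest = payload[:300] + b"CHANGED" + payload[300:]

# The prior region is expressed relative to the latest one, so the latest
# backup stays whole and the older copy shrinks to pointers plus a small delta.
ops = encode_against(latest, prior)
print(len(prior), "bytes reduced to", stored_bytes(ops), "literal bytes")
```

Keeping the latest backup intact and re-expressing the older one mirrors the direction described above, where duplicate data in the prior backup is replaced by pointers to the latest backup.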
Further, since SEPATON’s deduplication processes are concurrent with ingest and are not in the data path, there is no degradation of ingest performance as occurs with inline hashing approaches. Ingest is then a deterministic TB/hr per node regardless of data type, data change rate or backup policy.
Using this architecture, an enterprise may independently add either more nodes (CPUs and cache for performance) or controllers (storage capacity) as required. This gives organizations the flexibility to optimize performance or increase capacity as needed without implementing an entirely new solution.
This architecture also enables enterprises to start small and scale over time. A good entry point for the SEPATON S2100-ES2 is an enterprise with approximately 200 TB of data to back up, or about 10 TB of backup data per night assuming a 5% change rate.
SEPATON as a solution becomes even more compelling to Oracle DBAs who use Symantec NetBackup. SEPATON’s adoption of Symantec’s API, called Open Storage Technology (OST), allows Oracle DBAs who use the S2100-ES2 to seamlessly leverage Optimized Synthetics and Auto Image Replication (A.I.R.) technologies in their database environments.
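The sizing arithmetic behind that entry point can be worked through in a few lines. The nightly figure follows directly from the numbers above; the per-node ingest rate used in the second function is a placeholder assumption, not a published SEPATON specification.

```python
def nightly_backup_tb(protected_tb: float, daily_change_rate: float) -> float:
    """Data to back up each night, given total protected data and change rate."""
    return protected_tb * daily_change_rate

def ingest_hours(nightly_tb: float, nodes: int, tb_per_hour_per_node: float) -> float:
    """Backup window if ingest scales linearly with node count (assumed)."""
    return nightly_tb / (nodes * tb_per_hour_per_node)

nightly = nightly_backup_tb(200, 0.05)   # 200 TB protected, 5% change rate
print(f"nightly backup volume: {nightly:.1f} TB")

# Hypothetical rate of 2.5 TB/hr per node, chosen only to show the calculation.
print(f"window with 2 nodes: {ingest_hours(nightly, 2, 2.5):.1f} hours")
```

Because ingest is described as deterministic per node, adding nodes shortens the window proportionally in this simple model, while adding storage capacity leaves the window unchanged.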
Using Symantec’s Optimized Synthetics feature, SEPATON will, when notified by NetBackup, create a new full backup on the S2100-ES2. This backup takes place without touching any clients or the network. Rather the full backup occurs entirely on the S2100-ES2 and consists of pointers to the most recent full backup and incrementals that have occurred since that last full backup.
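The idea of a full backup that consists only of pointers can be sketched as a merge of catalogs. This is a conceptual illustration, not the OST API or NetBackup's catalog format: the `Pointer` and `Catalog` types, the image names, and the file names are all hypothetical.

```python
from typing import Dict, List, Tuple

Pointer = Tuple[str, int]      # (backup image id, extent index within that image)
Catalog = Dict[str, Pointer]   # file name -> where its newest copy already lives

def synthesize_full(last_full: Catalog, incrementals: List[Catalog]) -> Catalog:
    """Build a new full backup purely from pointers to data already on disk.

    Incrementals must be ordered oldest to newest so later copies win.
    No backup data is read from clients or copied; only pointers are merged.
    """
    synthetic = dict(last_full)
    for incremental in incrementals:
        synthetic.update(incremental)
    return synthetic

last_full = {
    "users.dbf": ("full-mon", 0),
    "orders.dbf": ("full-mon", 1),
}
incrementals = [
    {"orders.dbf": ("inc-tue", 0)},
    {"orders.dbf": ("inc-wed", 0), "audit.dbf": ("inc-wed", 1)},
]

new_full = synthesize_full(last_full, incrementals)
for name, (image, extent) in sorted(new_full.items()):
    print(f"{name}: extent {extent} of {image}")
```

This sketch ignores deleted files and partial-file changes, which a real implementation would have to track; it only shows why no client or network traffic is needed to assemble the new full.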
Using OST, NetBackup communicates with the S2100-ES2 and tells it what data to replicate and where. Once the replication is complete, NetBackup then makes updates to its catalog at both the primary and secondary sites.
OST A.I.R. takes this one step further as it facilitates multi-domain replication. This means organizations can create separate NetBackup domains for their Oracle backups and manage and replicate those backup images independently of their other corporate backup data.
This unique combination of features gives Oracle DBAs the flexibility to:
- Accelerate Oracle backups through DBeXstream
- Minimize or eliminate the need to do full backups using Optimized Synthetics
- Separately manage and control the replication of these Oracle backup job images through Symantec OST A.I.R.
- Centrally catalog all replicated Oracle images using NetBackup
Oracle DBAs want better, faster backups like anyone else. However, they have often worked at a disadvantage, unable to leverage the full benefits that disk-based backup and deduplication have to offer.
SEPATON DBeXstream levels the playing field as it enables Oracle DBAs, using any leading enterprise backup application, to keep the backup processes and procedures they already have in place and still get the increases in throughput and deduplication ratios that they expect. Further, using Symantec NetBackup’s Optimized Synthetics and A.I.R. technologies, Oracle DBAs can now realize the added benefits of being able to non-disruptively create full backups at any time and control the movement of their data offsite while still keeping the management of their data under centralized control.