
Today, information growth has reached alarming proportions. According to an IDC estimate, the volume of data in organizations grows by about 50 percent a year. Combined with the corresponding storage costs, this makes protecting and managing that data a formidable challenge for any enterprise.
Failing to manage information effectively can result in data loss and impaired decision-making. It can also damage relationships with customers and business partners, leading to lost revenue and, worse, loss of reputation. Clearly, present-day organizations cannot rely on the data protection models of yesteryear, such as tape-based storage, which were designed for decentralized environments populated primarily by physical servers. Virtualization and large volumes of data demand a new approach to protecting and managing information.
Next-generation tools such as disk-based backup are revolutionizing data protection, and de-duplication is enabling a new era of information management. With the ability to de-duplicate data everywhere and manage it centrally, organizations can improve data protection operations and lower costs. De-duplication also helps them adopt a more systematic approach to managing information growth.
Why do we need to de-duplicate data?
Simply stated, de-duplication is the process of eliminating redundant data. It works at the sub-file level, storing each unique piece of data only once. In environments where storage needs continue to grow and cost management remains a key issue, de-duplication offers welcome relief.
De-duplication enables companies to reduce storage costs. Depending on where and how it is used, it also offers other benefits, such as bandwidth savings, faster backups, backup consolidation, and easier disaster recovery.
To be fully effective, de-duplication should be present in every segment of the information architecture, where it has the potential to reduce storage consumption by up to 80 percent. De-duplication across physical and virtual backups can also enable rapid recovery of applications in the event of a disaster.
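To illustrate what storing "only unique data at the sub-file level" means, here is a minimal sketch of a de-duplicating store. It assumes fixed-size 4 KiB segments identified by SHA-256 digests; the class and method names (`ChunkStore`, `put`, `get`) are hypothetical and not taken from any particular product. Each file is kept as an ordered list of digests (a "recipe"), and each distinct segment is stored once no matter how many files reference it.

```python
import hashlib
import random

CHUNK_SIZE = 4096  # assumed fixed segment size in bytes


def chunk(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split data into fixed-size segments; the last one may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class ChunkStore:
    """Toy de-duplicating store: each unique segment is kept once, keyed by its SHA-256 digest."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}     # digest -> segment bytes
        self.files: dict[str, list[str]] = {}  # file name -> ordered digests (the "recipe")

    def put(self, name: str, data: bytes) -> None:
        recipe = []
        for segment in chunk(data):
            digest = hashlib.sha256(segment).hexdigest()
            self.chunks.setdefault(digest, segment)  # keep the segment only if it is new
            recipe.append(digest)
        self.files[name] = recipe

    def get(self, name: str) -> bytes:
        """Rebuild a file from its recipe."""
        return b"".join(self.chunks[d] for d in self.files[name])

    def stored_bytes(self) -> int:
        """Physical bytes held by the store after de-duplication."""
        return sum(len(c) for c in self.chunks.values())


if __name__ == "__main__":
    store = ChunkStore()
    monday = random.Random(42).randbytes(16384)  # 4 segments of sample data
    tuesday = monday + b"new records"            # same content plus an appended tail
    store.put("monday.img", monday)
    store.put("tuesday.img", tuesday)
    logical = len(monday) + len(tuesday)
    print(logical, store.stored_bytes())  # 32779 16395
```

Fixed-size segments handle appended data well, but an insertion near the start of a file shifts every later segment boundary, so many real systems use variable-size (content-defined) chunking instead; this sketch keeps the simpler scheme for readability.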
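A percentage reduction like this is often quoted alongside a de-duplication ratio. Assuming the ratio $R$ is defined as logical data protected divided by physical data stored (a common convention, used here only for illustration), the storage reduction follows directly, so an 80 percent reduction corresponds to a 5:1 ratio:

```latex
\text{reduction} = 1 - \frac{\text{stored}}{\text{logical}} = 1 - \frac{1}{R},
\qquad R = 5 \;\Rightarrow\; 1 - \tfrac{1}{5} = 0.80 = 80\%
```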
De-duplication can be performed in two places: at the source or at the target. Performing de-duplication as close to the information source as possible delivers the most value.
De-duplication at the source
With source-side (often referred to as client-side) de-duplication, data is de-duplicated before it is transmitted across the network and stored. Eliminating redundant data before it is sent across the network ensures efficient use of bandwidth, storage, and virtual machine resources across the entire infrastructure.
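The source-side flow can be sketched as follows, under stated assumptions: the client splits its data into fixed 4 KiB segments, hashes each with SHA-256, asks the backup target which digests it does not already hold, and transmits only those segments. `BackupServer`, `missing`, and `client_backup` are hypothetical names for illustration; the network is simulated in-process, and the small digest query is not counted as traffic.

```python
import hashlib
import random

CHUNK_SIZE = 4096  # assumed fixed segment size in bytes


def segments(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    return [data[i:i + size] for i in range(0, len(data), size)]


def digest(segment: bytes) -> str:
    return hashlib.sha256(segment).hexdigest()


class BackupServer:
    """Hypothetical backup target holding segments indexed by digest."""

    def __init__(self) -> None:
        self.segments: dict[str, bytes] = {}

    def missing(self, digests: list[str]) -> set[str]:
        """Answer the client's query: which of these digests are not stored yet?"""
        return {d for d in digests if d not in self.segments}

    def receive(self, payload: dict[str, bytes]) -> None:
        self.segments.update(payload)


def client_backup(data: bytes, server: BackupServer) -> int:
    """Source-side de-duplication: hash locally, send only segments the server lacks.

    Returns the number of segment bytes sent over the simulated network.
    """
    local = {digest(s): s for s in segments(data)}  # also drops repeats within this backup
    to_send = server.missing(list(local))
    payload = {d: local[d] for d in to_send}
    server.receive(payload)
    return sum(len(s) for s in payload.values())


if __name__ == "__main__":
    server = BackupServer()
    day1 = random.Random(1).randbytes(40960)          # 10 segments
    day2 = day1[:8192] + b"X" * 4096 + day1[12288:]   # third segment changed
    print(client_backup(day1, server))  # 40960
    print(client_backup(day2, server))  # 4096
```

On the second backup only the one changed segment crosses the network, which is the bandwidth saving the paragraph above describes.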
Most client-side de-duplication solutions work the same way in virtual and physical environments. As a result, less data is stored regardless of whether the source is a virtual machine or a physical machine. This not only reduces storage costs in the data center but also makes it easier to move data to a disaster recovery site using replication.
De-duplication at the target
De-duplication can also occur at the target, such as a media server or a storage appliance. With media server de-duplication, backup data moves from a client (the protected system) to the backup software’s server (the media server). The media server performs the de-duplication and sends only the unique data segments to backend storage. This saves backend storage and reduces the infrastructure needed to store backup data.
Like de-duplication at the media server, de-duplication by an appliance is considered target-side de-duplication. With a disk-based de-duplication appliance, backup data moves across the network from a client to a backup server and then to the appliance. The appliance performs the de-duplication and writes only the unique data to its own storage, reducing overall backup storage.
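For contrast with the client-side case, here is a minimal sketch of media server de-duplication. It assumes fixed 4 KiB segments and SHA-256 digests, and `MediaServer`, `ingest`, and `restore` are hypothetical names. The client sends its complete backup stream; the media server segments and hashes it, and writes only previously unseen segments to a dictionary standing in for backend storage.

```python
import hashlib
import random

SEGMENT_SIZE = 4096  # assumed fixed segment size in bytes


class MediaServer:
    """Hypothetical media server performing target-side de-duplication."""

    def __init__(self) -> None:
        self.backend: dict[str, bytes] = {}      # stands in for backend disk storage
        self.catalog: dict[str, list[str]] = {}  # backup name -> ordered segment digests
        self.bytes_received = 0                  # full-size traffic arriving from clients

    def ingest(self, name: str, stream: bytes) -> int:
        """Receive a complete backup stream, de-duplicate it, return bytes written to backend."""
        self.bytes_received += len(stream)
        written = 0
        recipe = []
        for i in range(0, len(stream), SEGMENT_SIZE):
            segment = stream[i:i + SEGMENT_SIZE]
            d = hashlib.sha256(segment).hexdigest()
            if d not in self.backend:  # only unique segments reach backend storage
                self.backend[d] = segment
                written += len(segment)
            recipe.append(d)
        self.catalog[name] = recipe
        return written

    def restore(self, name: str) -> bytes:
        return b"".join(self.backend[d] for d in self.catalog[name])


if __name__ == "__main__":
    server = MediaServer()
    full = random.Random(7).randbytes(40960)
    print(server.ingest("full-1", full))  # 40960
    print(server.ingest("full-2", full))  # 0
    print(server.bytes_received)          # 81920
```

Backend storage grows only by unique segments, but the network still carries every full backup; the trade-off is that clients need no changes, since all the work happens on the server.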
While most backup software products see these appliances as native disk, some vendors have begun to offer solutions with tighter integration between the software and the storage appliance.
Clearly, de-duplication is a cost-effective information management tool that organizations can use virtually anywhere in their enterprise to address pressing IT challenges. From remote offices, to virtual machines, to data center workloads, de-duplication can play a role in controlling storage costs, increasing reliability, and simplifying operations.
Any data-intensive enterprise should build a de-duplication strategy into its backup and recovery plan, given the storage and cost savings the technology can deliver.
Client-side de-duplication can shorten backup times for physical and virtual machines and reduce bandwidth requirements. Target-side de-duplication offers similar storage benefits and may not require updates to existing backup clients. Finally, some solutions on the market combine source and target de-duplication to achieve even greater storage savings and ROI.
Find the approach that works best for you. Once you understand the requirements of your data center, you can decide where to deploy de-duplication.
The author is Vice-President, Information Management Group, Symantec India.