In computing, data deduplication is a specialized data compression technique for eliminating coarse-grained redundant data, typically to improve storage utilization. In the deduplication process, duplicate data is deleted, leaving only one copy of the data to be stored, along with references to that unique copy. Deduplication reduces the required storage capacity because only the unique data is stored.
Depending on the type of deduplication, either whole redundant files are eliminated or redundant portions of files and other similar data are removed. As a simple example of file-based deduplication, a typical email system might contain 100 instances of the same one-megabyte (MB) file attachment. If the email platform is backed up or archived without deduplication, all 100 instances are saved, requiring 100 MB of storage space. With data deduplication, only one instance of the attachment is actually stored; each subsequent instance is simply a reference back to the one saved copy. In this example, the deduplication ratio is roughly 100 to 1.
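The email example can be sketched in a few lines of Python. This is only an illustrative toy, not how any particular product works: the `DedupStore` class, its method names, and the `mailbox-N/attachment.bin` names are hypothetical, and it assumes that identical content can be recognized by comparing SHA-256 digests.

```python
import hashlib


class DedupStore:
    """Toy file-level deduplicating store (hypothetical, for illustration)."""

    def __init__(self):
        self.blobs = {}       # digest -> bytes: the single stored copy
        self.references = {}  # logical name -> digest of its content

    def put(self, name, data):
        # Identify the content by its SHA-256 digest (an assumption of this sketch).
        digest = hashlib.sha256(data).hexdigest()
        # Store the bytes only if this content has not been seen before.
        self.blobs.setdefault(digest, data)
        # Every logical instance keeps just a reference to the stored copy.
        self.references[name] = digest
        return digest

    def get(self, name):
        # Follow the reference back to the one saved copy.
        return self.blobs[self.references[name]]

    def logical_size(self):
        # Bytes that would be stored without deduplication.
        return sum(len(self.blobs[d]) for d in self.references.values())

    def stored_size(self):
        # Bytes actually stored (unique content only, references not counted).
        return sum(len(b) for b in self.blobs.values())


store = DedupStore()
attachment = b"x" * 1_000_000  # one megabyte, taken here as 10**6 bytes

# 100 mailboxes each hold the same attachment.
for i in range(100):
    store.put(f"mailbox-{i}/attachment.bin", attachment)

ratio = store.logical_size() / store.stored_size()
print(f"unique copies: {len(store.blobs)}, ratio: {ratio:.0f} to 1")
```

The sketch counts only the attachment bytes. A real system also spends some space on the references themselves, which is one reason the ratio in the example is described as roughly, rather than exactly, 100 to 1.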
Deduplication can also provide significant energy, space, cooling, and cost savings by reducing the amount of data stored. It contributes to data center transformation by lowering the carbon footprint through savings in storage space, and it reduces the recurring cost of the staff needed for management and administration. It also reduces hardware recycling and the budget for data management, backup, and retrieval by lowering fixed and recurring costs.