Duplicate Elimination service processing

HCP performs duplicate elimination by first sorting the objects, parts, and chunks in the repository according to their MD5 hash values. After sorting, the service checks for objects, parts, and chunks with the same hash value. If the service finds any, it compares their content. If the content is identical, the service merges the object, part, or chunk data, while still maintaining the number of copies of the data specified in the service plan for the namespace that contains each object, part, or chunk.

The metadata for each merged object, part, or chunk points to the merged object, part, or chunk data. The Duplicate Elimination service never deletes any of the metadata for duplicate objects, parts, or chunks.
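The sort-compare-merge sequence described above can be sketched as follows. This is an illustrative sketch only; the item structure, function name, and return value are assumptions for the example, not the actual HCP implementation.

```python
import hashlib
from collections import defaultdict

def eliminate_duplicates(items):
    """Group stored items by MD5 hash, verify content, and merge duplicates.

    Each item is a dict with 'name' and 'content' (bytes). Returns a map
    from each item name to the name of the item whose data it now points
    to; note that metadata for every item is retained.
    """
    # Step 1: sort/group items by the MD5 hash of their content.
    by_hash = defaultdict(list)
    for item in items:
        digest = hashlib.md5(item["content"]).hexdigest()
        by_hash[digest].append(item)

    data_pointer = {}
    for group in by_hash.values():
        canonical = group[0]
        data_pointer[canonical["name"]] = canonical["name"]
        for candidate in group[1:]:
            # Step 2: a hash match alone is not proof of identical data,
            # so compare the content itself before merging.
            if candidate["content"] == canonical["content"]:
                # Step 3: merge - the candidate's metadata now points at
                # the canonical copy's data.
                data_pointer[candidate["name"]] = canonical["name"]
            else:
                data_pointer[candidate["name"]] = candidate["name"]
    return data_pointer

items = [
    {"name": "obj-a", "content": b"same payload"},
    {"name": "obj-b", "content": b"same payload"},
    {"name": "obj-c", "content": b"different payload"},
]
print(eliminate_duplicates(items))
# obj-b's metadata now points at obj-a's data; obj-c keeps its own.
```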

The next figure shows duplicate elimination for two objects with the same content where the DPL is two.

These considerations apply:

The Duplicate Elimination service does not merge objects, parts, and chunks smaller than 7 KB.

The Duplicate Elimination service does not merge chunk data with the data for objects and parts that are not erasure coded.

If the Duplicate Elimination service merges the data for a whole object that is subject to erasure coding and then merges the data for the applicable chunks after the object is erasure coded, only the merge of the whole object data is included in the duplicate elimination statistics.

The Duplicate Elimination service does not merge data that is stored on extended storage.

For objects, parts, and chunks stored on primary running storage, the Duplicate Elimination service generally merges objects, parts, and chunks from different namespaces only if the namespaces have the same ingest tier DPL.

For objects, parts, and chunks stored on primary spindown storage, the Duplicate Elimination service generally merges objects, parts, and chunks from different namespaces only if the namespaces have the same primary spindown storage tier DPL.

For the purpose of duplicate elimination, HCP considers an object, part, or chunk stored on extended storage to have a DPL that is one less than the ingest tier DPL that’s specified in the service plan for the namespace that contains the object, part, or chunk. So, for example, the Duplicate Elimination service will merge objects, parts, and chunks stored on primary running storage in a namespace that has an ingest tier DPL of 1 with objects stored on extended storage in a namespace that has an ingest tier DPL of 2.

The Duplicate Elimination service may bypass merging certain objects until it reprocesses them. This can happen with:

Objects stored with CIFS or NFS that are still open due to lazy close

Objects stored with CIFS or NFS that do not immediately have MD5 hash values
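The DPL-matching rule for items on extended storage can be sketched as below. The function and parameter names are hypothetical and chosen for the example; the check follows the rule stated above that an item on extended storage counts as having a DPL one less than its namespace's ingest tier DPL.

```python
def effective_dpl(ingest_tier_dpl, on_extended_storage):
    """For duplicate elimination, an item on extended storage is treated
    as having a DPL one less than its namespace's ingest tier DPL."""
    return ingest_tier_dpl - 1 if on_extended_storage else ingest_tier_dpl

def dpl_compatible(dpl_a, ext_a, dpl_b, ext_b):
    # Items from different namespaces are candidates for merging only
    # when their effective DPLs match.
    return effective_dpl(dpl_a, ext_a) == effective_dpl(dpl_b, ext_b)

# Example from the text: primary running storage in a namespace with an
# ingest tier DPL of 1 matches extended storage in a namespace with an
# ingest tier DPL of 2 (effective DPL 2 - 1 = 1).
print(dpl_compatible(1, False, 2, True))
```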

© 2015, 2020 Hitachi Vantara LLC. All rights reserved.