Duplicate elimination service processing

HCP performs duplicate elimination by first sorting objects according to their MD5 hash values. After sorting all the objects in the repository, the service checks for objects with the same hash value. If the service finds any, it compares the content of those objects. If the content is identical, the service merges the object data while still maintaining the number of copies of that data required by the service plan for the namespace that contains each object.

The metadata for each merged object points to the merged object data. The duplicate elimination service never deletes any of the metadata for duplicate objects.
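The hash-then-compare-then-merge flow described above can be sketched as follows. This is a minimal in-memory illustration of the general technique, not HCP's internal implementation; the `Obj` class, the `data_ref` attribute, and the `eliminate_duplicates` function are all hypothetical names invented for this sketch. It also applies the seven-KB minimum size mentioned in the considerations below.

```python
import hashlib
from collections import defaultdict

class Obj:
    """Hypothetical model of a stored object (not an HCP data structure)."""
    def __init__(self, name, data):
        self.name = name
        self.data = data                              # object content
        self.md5 = hashlib.md5(data).hexdigest()      # content hash
        self.data_ref = self                          # metadata points to its own data

def eliminate_duplicates(objects, min_size=7 * 1024):
    """Merge the data of identical objects; metadata is never deleted."""
    by_hash = defaultdict(list)
    for obj in objects:
        if len(obj.data) < min_size:                  # objects smaller than 7 KB are skipped
            continue
        by_hash[obj.md5].append(obj)

    for candidates in by_hash.values():
        keeper = candidates[0]
        for other in candidates[1:]:
            # A matching hash is not proof of identity, so compare content too.
            if other.data == keeper.data:
                other.data_ref = keeper               # metadata now points to the merged data
                other.data = keeper.data              # a single stored copy is shared
```

For example, given two identical 8-KB objects and one small object, the sketch would leave the small object untouched and point the second large object's metadata at the first object's data.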

These considerations apply:

The duplicate elimination service does not merge objects smaller than seven KB.

The duplicate elimination service does not merge data that’s stored on extended storage.

For objects that are stored on primary running storage, the duplicate elimination service generally merges objects from different namespaces only if the namespaces have the same ingest tier DPL.

For objects that are stored on primary spindown storage, the duplicate elimination service generally merges objects from different namespaces only if the namespaces have the same primary spindown storage tier DPL.

For the purpose of duplicate elimination, HCP considers an object stored on extended storage to have a DPL that’s one less than the ingest tier DPL that’s specified in the service plan for the namespace that contains the object. So, for example, the duplicate elimination service will merge objects that are stored on primary running storage in a namespace that has an ingest tier DPL of 1 with objects that are stored on extended storage in a namespace that has an ingest tier DPL of 2.

For information on ingest tier DPL, see Ingest tier data protection level.
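The effective-DPL rule above can be expressed as a short sketch. The function names and parameters here are hypothetical, invented only to illustrate the comparison the service makes; they are not part of any HCP API. The rule is: an object on extended storage is treated as having a DPL one less than its namespace's ingest tier DPL, and objects are merge candidates only when their effective DPLs match.

```python
def effective_dpl(ingest_tier_dpl, on_extended_storage):
    """Effective DPL used for duplicate-elimination eligibility.

    An object on extended storage is considered to have a DPL one less
    than the ingest tier DPL of the namespace that contains it.
    """
    return ingest_tier_dpl - 1 if on_extended_storage else ingest_tier_dpl

def can_merge(dpl_a, ext_a, dpl_b, ext_b):
    # Objects from two namespaces are merge candidates only when
    # their effective DPLs are equal.
    return effective_dpl(dpl_a, ext_a) == effective_dpl(dpl_b, ext_b)
```

Using the example from the text: an object on primary running storage in a namespace with ingest tier DPL 1 (effective DPL 1) can be merged with an object on extended storage in a namespace with ingest tier DPL 2 (also effective DPL 1).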

The duplicate elimination service may bypass merging certain objects until it reprocesses the objects. This can happen with:

Objects stored with CIFS or NFS that are still open due to lazy close

Objects stored with CIFS or NFS that do not immediately have MD5 hash values

For information on lazy close, see Using a Namespace or Using the Default Namespace. For more information on cryptographic hash values, see Content verification service.

Trademark and Legal Disclaimer

© 2016 Hitachi Data Systems Corporation. All rights reserved.