Storage usage optimization

HCP uses a number of features to reclaim and balance storage capacity.

  • Compression/Encryption service

    The Compression/Encryption service makes more efficient use of HCP storage by compressing object data, thereby freeing space for storing more objects.

  • Duplicate Elimination service

    A repository can contain multiple objects that have identical data but different metadata. When the Duplicate Elimination service finds such objects, it merges their data to free storage space occupied by all but one of the objects.
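    Conceptually, the service behaves like the sketch below, which groups objects by a digest of their data and keeps a single shared copy per group. This is an illustration only, not HCP's implementation; the object fields and helper are hypothetical.

      import hashlib
      from collections import defaultdict

      def eliminate_duplicates(objects):
          """Group objects by a digest of their data, keep one shared
          data copy per group, and report the bytes freed. Sketch only;
          the object fields are invented for illustration."""
          groups = defaultdict(list)
          for obj in objects:
              groups[hashlib.sha256(obj["data"]).hexdigest()].append(obj)
          freed = 0
          for members in groups.values():
              keeper = members[0]
              for dup in members[1:]:
                  freed += len(dup["data"])
                  dup["data"] = keeper["data"]  # share one copy; metadata stays distinct
          return freed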

  • Disposition service

    The Disposition service automatically deletes objects with expired retention periods. To be eligible for disposition, an object must have a retention setting that’s either a date in the past or a retention class with automatic deletion enabled and a calculated expiration date in the past.
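    The eligibility rule can be expressed as a simple check, shown in the sketch below. The field names are hypothetical and all timestamps are assumed to be timezone-aware; this is not HCP's API.

      from datetime import datetime, timezone

      def eligible_for_disposition(obj, now=None):
          """True if the object's retention has expired (sketch only).
          obj["retention"] is either a datetime (a fixed retention
          setting) or a dict describing a retention class."""
          now = now or datetime.now(timezone.utc)
          retention = obj["retention"]
          if isinstance(retention, datetime):  # retention setting is a date
              return retention < now
          # Retention class: automatic deletion must be enabled and the
          # calculated expiration date must be in the past.
          return retention["auto_delete"] and retention["expires_at"] < now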

  • Version pruning

    An HCP namespace can be configured to allow storage of multiple versions of objects. Version pruning is the automatic deletion of previous versions of an object that are older than a specified amount of time.
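    As a sketch (with hypothetical version records, not HCP's data model), pruning keeps the current version and discards previous versions older than a cutoff:

      from datetime import datetime, timedelta, timezone

      def prune_versions(versions, keep_for: timedelta, now=None):
          """Keep the current (newest) version plus any previous
          versions newer than the cutoff; drop the rest. Sketch only."""
          now = now or datetime.now(timezone.utc)
          current = max(versions, key=lambda v: v["created"])
          cutoff = now - keep_for
          return [v for v in versions
                  if v is current or v["created"] >= cutoff]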

  • Garbage Collection service

    The Garbage Collection service reclaims storage space both by completing logical delete operations and by deleting objects left behind by incomplete transactions.

  • Capacity Balancing service

    The Capacity Balancing service ensures that the percentage of space used is roughly equivalent across all the storage nodes in the system. Balancing storage usage across the nodes helps HCP balance the processing load.
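    One way to picture the goal is to compute, for each node, how far its usage sits from the system-wide average; the sketch below does exactly that. It is purely illustrative and is not HCP's balancing algorithm.

      def rebalance_targets(nodes):
          """For each node, return how many bytes above (+) or below (-)
          the system-wide average usage fraction it sits. `nodes` maps
          node name -> (used_bytes, capacity_bytes). Sketch only."""
          total_used = sum(used for used, _ in nodes.values())
          total_cap = sum(cap for _, cap in nodes.values())
          average = total_used / total_cap
          return {name: used - average * cap
                  for name, (used, cap) in nodes.items()}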

  • Service plans

    A service plan specifies the types of storage on which copies of an object must be stored and the number of object copies that must be kept on each type of storage.

    By default, throughout the lifecycle of an object, HCP stores that object only on primary running storage, which is storage that’s managed by the nodes in the HCP system and consists of continuously spinning disks. However, you can configure HCP to use other types of storage for tiering purposes.

    Every service plan defines primary running storage as the initial storage tier, called the ingest tier. The default storage tiering strategy specifies only that tier.

    Primary running storage is designed to provide both high data availability and high performance for object data storage and retrieval operations. To optimize data storage price/performance for the objects in a namespace, you can configure the service plan for that namespace to define a storage tiering strategy that specifies multiple storage tiers.
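    For illustration, a tiering strategy can be pictured as an ordered list of tiers, each naming a storage type, a copy count, and a transition time. The representation below is hypothetical; HCP's actual configuration is done through its management interfaces.

      # Hypothetical in-memory model of a service plan (not HCP's syntax).
      service_plan = {
          "name": "cost-optimized",
          "tiers": [
              # Ingest tier: always primary running storage.
              {"storage": "primary-running", "copies": 2, "after_days": 0},
              # Move data to spindown storage 30 days after ingest.
              {"storage": "primary-spindown", "copies": 1, "after_days": 30},
              # Move to an S3 compatible extended tier after a year.
              {"storage": "s3-compatible", "copies": 1, "after_days": 365},
          ],
      }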

  • Storage Tiering service

    HCP uses the Storage Tiering service to maintain the correct number of copies of each object in a namespace on the storage tiers defined by the storage tiering strategy for that namespace. When the number of copies of an object on a tier falls below the number specified for that tier in the applicable service plan, the service automatically creates new copies of the object on that tier; when the count rises above the specified number, the service automatically deletes the excess copies from that tier.
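    The reconciliation described above can be sketched as follows; the data shapes are hypothetical, not HCP's internals.

      def reconcile_tier(objects, tier, required_copies):
          """Plan actions that bring each object's copy count on `tier`
          in line with the service plan: create copies when below the
          required count, delete extras when above. `objects` maps
          object id -> {tier name: copy count}. Sketch only."""
          actions = []
          for obj_id, copies in objects.items():
              have = copies.get(tier, 0)
              if have < required_copies:
                  actions.append(("create", obj_id, tier, required_copies - have))
              elif have > required_copies:
                  actions.append(("delete", obj_id, tier, have - required_copies))
          return actions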

  • Primary spindown storage

    On a SAIN system, HCP can be configured to use primary spindown storage for tiering purposes. Primary spindown storage is primary storage that consists of disks that can be spun down when they are not being accessed. You can configure the service plan for any given namespace to define primary spindown storage as a storage tier for the objects in that namespace. Using primary spindown storage for object data that’s accessed infrequently saves energy, thereby reducing the cost of storage.

    HCP moves object data between primary running storage, primary spindown storage, and other types of storage that are used for tiering purposes according to rules that are specified in storage tiering strategies defined by service plans.

  • S Series storage

    HCP can be configured to use S Series storage, which is storage on external HCP S Series Nodes that are separate from the HCP system. S Series Nodes are used for tiering purposes, and the HCP system communicates with them through the S3 compatible API and management API.
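    HCP performs this communication itself; purely to illustrate what talking to an S3 compatible endpoint looks like, the sketch below points a standard S3 client at a hypothetical hostname with placeholder bucket and credentials.

      import boto3

      # Endpoint, bucket, and credentials are placeholders.
      s3 = boto3.client(
          "s3",
          endpoint_url="https://s-node.example.com",
          aws_access_key_id="ACCESS_KEY",
          aws_secret_access_key="SECRET_KEY",
      )
      s3.put_object(Bucket="tier-bucket", Key="object-001", Body=b"data")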

  • Extended storage

    HCP can be configured to use extended storage, which is storage that’s managed by devices outside of the HCP system, for tiering purposes. HCP supports the following types of extended storage:

    • NFS

      Volumes that are stored on extended storage devices and are accessed using NFS mount points

    • Amazon S3

      Cloud storage that is accessed using an Amazon Web Services user account

    • Google Cloud

      Cloud storage that is accessed using a Google Cloud Platform user account

    • Microsoft Azure

      Cloud storage that is accessed using a Microsoft Azure user account

    • S3 compatible

      Any physical storage device or cloud storage service accessed using a protocol that is compatible with the Amazon S3 API

    • ThinkOn cloud

      S3 compatible cloud storage that is accessed using a ThinkOn cloud user account

    Moving object data from primary storage to extended storage frees up HCP system storage space so that you can ingest additional objects.

    Note: While all of the data for an object can be moved off primary running storage and stored only on extended storage, at least one copy of the system metadata, custom metadata, and ACL for that object must always remain on primary running storage.

    In addition, you can optimize data storage price/performance for the objects in a namespace by configuring the service plan for that namespace to define a storage tiering strategy that defines storage tiers for multiple types of extended storage.

    HCP moves object data between primary running storage, primary spindown storage (if it is used), and one or more types of extended storage according to rules specified in the storage tiering strategies defined by service plans.

  • Erasure-coded protection

    Erasure-coded protection is a method of geo-protection in which the data for each object in a replicated namespace is subject to erasure coding. With erasure coding, the data is encoded and broken into multiple chunks that are then stored across multiple HCP systems. All but one of the chunks contain object data; the remaining chunk contains parity for the object data.

    With erasure-coded protection, each system stores one data or parity chunk for any given erasure-coded object. The size of each chunk for an object is the size of the object data divided by the number of data chunks for the object. This means that the total storage used for an object in a replicated namespace is at most the size of a chunk times the total number of data and parity chunks for the object. (Storage usage can be less due to compression and duplicate elimination.)

    For whole-object protection (the other method of geo-protection) to provide the same level of data protection as erasure-coded protection provides, at least two systems must each store all the data for each object in a replicated namespace. With two systems, the total storage used for each object is at most two times the size of the object data, which is greater than the total storage used when the same object is erasure coded. This is true regardless of the number of systems across which the chunks for the erasure-coded object are distributed.

    Additionally, with erasure-coded protection, the storage footprint on any individual system that stores chunks for objects is smaller than the storage footprint resulting from storing complete object data on that system.
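    A worked example may help, assuming a hypothetical topology with five data chunks and one parity chunk per object and ignoring compression and duplicate elimination:

      object_size = 1_000  # MB, hypothetical object

      # Erasure-coded protection: 5 data chunks + 1 parity chunk.
      data_chunks, parity_chunks = 5, 1
      chunk_size = object_size / data_chunks                 # 200 MB per chunk
      ec_total = chunk_size * (data_chunks + parity_chunks)  # 1,200 MB in total
      ec_per_system = chunk_size                             # 200 MB per system

      # Whole-object protection: two systems each store a full copy.
      whole_total = 2 * object_size                          # 2,000 MB in total
      whole_per_system = object_size                         # 1,000 MB per system

      assert ec_total < whole_total and ec_per_system < whole_per_system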

  • Metadata-only objects

    With multiple HCP systems participating in a replication topology, you may not need to store object data in every system. A metadata-only object is one from which HCP has removed the data, leaving the system metadata, custom metadata, and ACL for the object in place. HCP removes the data from an object only if at least one copy of that data exists elsewhere in the topology.

    Metadata-only objects enable some systems in a replication topology to have a smaller storage footprint than other systems, even when the same namespaces are replicated to all systems in the topology.

    HCP makes objects metadata-only according to the rules specified in service plans. If the rules change, HCP can restore data to the objects to meet the new requirements.
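    As a final sketch (hypothetical fields, not HCP's internals), making an object metadata-only amounts to removing its local data after confirming another copy exists in the topology:

      def make_metadata_only(obj, data_copies_elsewhere):
          """Remove the local object data, keeping the system metadata,
          custom metadata, and ACL, but only if at least one data copy
          exists elsewhere in the replication topology. Sketch only."""
          if data_copies_elsewhere < 1:
              raise RuntimeError("refusing: no other copy of the data exists")
          obj["data"] = None  # data removed; metadata and ACL remain
          return obj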