Choosing a protection type
The two protection types, whole-object protection and erasure-coded protection, each have their own benefits and limitations. The following sections describe factors you should consider when choosing the protection type for a namespace.
Storage efficiency
With whole-object protection, all the data for an object in a replicated namespace is stored on each system that has that namespace, except on systems where the object is on a metadata-only storage tier. This means that the total storage used for the object data is at most the size of the data times the number of systems in the replication topology. (The storage used can be less due to compression/encryption, duplicate elimination, and metadata-only storage tiers.)
For example, in a three-system replication topology in which the object is not metadata-only on any system, the storage used for a 1 MB object is, at most, 3 MB. Having the object be metadata-only on one or two of the systems increases storage efficiency but decreases the level of data protection.
With erasure-coded protection, each system in the erasure coding topology stores one data or parity chunk for any given erasure-coded object. The size of each chunk for an object is the size of the object data divided by the number of data chunks for the object. This means that the total storage used for an object in a replicated namespace is at most the size of a chunk times the total number of data and parity chunks for the object.
For example, each object in an erasure coding topology with three systems has two data chunks and one parity chunk, so the size of each chunk of a 1 MB object is 0.5 MB. As a result, the total storage used for the object is, at most, 3 × 0.5 MB, which is 1.5 MB.
The greater the number of systems in an erasure coding topology, the greater the storage efficiency. The maximum number of systems that can be in an erasure coding topology is six. With six systems, each object has five data chunks and one parity chunk, so each chunk of a 1 MB object is 0.2 MB, and the total storage used for the object is, at most, 6 × 0.2 MB, which is 1.2 MB.
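The storage arithmetic in this section can be sketched as follows. This is an illustrative calculation, not part of HCP. The function names are hypothetical, and the rule that a topology of N systems uses N - 1 data chunks and one parity chunk is an assumption inferred from the three-system and six-system figures above. The lower limit of three systems is likewise assumed from the smallest example in the text.

```python
def whole_object_storage_mb(object_mb, systems, metadata_only_systems=0):
    """Upper bound on storage used with whole-object protection.

    Every system that is not metadata-only for the object stores a full copy.
    """
    if not 0 <= metadata_only_systems <= systems:
        raise ValueError("metadata_only_systems must be between 0 and systems")
    return object_mb * (systems - metadata_only_systems)


def erasure_coded_storage_mb(object_mb, systems):
    """Upper bound on storage used with erasure-coded protection.

    Assumption: systems - 1 data chunks plus one parity chunk, one chunk
    per system.
    """
    if not 3 <= systems <= 6:
        # Six is the documented maximum; three is an assumed minimum.
        raise ValueError("systems must be between 3 and 6")
    data_chunks = systems - 1
    chunk_mb = object_mb / data_chunks  # chunk size = data size / data chunks
    return chunk_mb * systems           # one chunk stored on every system


if __name__ == "__main__":
    for n in (3, 4, 5, 6):
        print(n,
              whole_object_storage_mb(1, n),
              round(erasure_coded_storage_mb(1, n), 3))
```

For a 1 MB object, this gives at most 3 MB with whole-object protection and 1.5 MB with erasure-coded protection in a three-system topology, and 6 MB and 1.2 MB, respectively, with six systems.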
Data availability
Whole-object protection can provide better protection from system unavailability than erasure-coded protection provides. With whole-object protection, clients can access objects in a replicated namespace if at least one of the systems where the object data is stored is available. (HCP does not store data for objects on metadata-only storage tiers.)
With erasure-coded protection, all but one of the systems in the erasure coding topology must be available for clients to access objects in a namespace that's replicated on those systems.
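The two availability rules above can be expressed as simple set checks. This is a minimal sketch for reasoning about a topology, not an HCP API. The function names and the system names A, B, and C are hypothetical.

```python
def whole_object_accessible(available, data_holders):
    """True if at least one available system stores the full object data.

    data_holders excludes systems where the object is metadata-only.
    """
    return not set(available).isdisjoint(data_holders)


def erasure_coded_accessible(available, topology):
    """True if no more than one system in the erasure coding topology is down."""
    unavailable = set(topology) - set(available)
    return len(unavailable) <= 1


if __name__ == "__main__":
    topology = {"A", "B", "C"}
    data_holders = {"A", "B"}   # the object is metadata-only on C
    available = {"A"}           # B and C are unavailable
    print(whole_object_accessible(available, data_holders))
    print(erasure_coded_accessible(available, topology))
```

In this scenario, two of three systems are unavailable. The object remains accessible with whole-object protection because system A holds the data, but it is not accessible with erasure-coded protection.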
Disaster recovery
Both whole-object protection and erasure-coded protection provide support for disaster recovery in case of catastrophic system failure. Disaster recovery entails reprotecting objects in replicated namespaces to their prefailure protection level by recovering them to a replacement system.
With whole-object protection, if a single system fails, the object data can be recovered to the replacement system from any remaining system where the replicated objects are not metadata-only. If two or more systems fail concurrently, whole-object protection can still support disaster recovery, provided that at least one remaining system stores the complete object data. However, as the level of protection increases, the number of systems that must store the complete object data also increases, thereby decreasing storage efficiency.
With erasure-coded protection, if a single system fails, object data can be reconstructed on the replacement system from the chunks on all the remaining systems in the erasure coding topology. If two or more systems fail concurrently, object data cannot be reconstructed and may be permanently lost.
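The tradeoff between failure tolerance and storage cost can be illustrated with a short calculation. This sketch assumes that surviving a given number of concurrent system failures with whole-object protection requires one more full copy than the number of failures, which follows from the need for at least one surviving full copy. The function names are hypothetical.

```python
# Per the text, erasure-coded protection can recover from one failed system.
EC_TOLERATED_FAILURES = 1


def full_copies_needed(concurrent_failures):
    """Minimum number of systems that must hold the complete object data."""
    if concurrent_failures < 0:
        raise ValueError("concurrent_failures must not be negative")
    return concurrent_failures + 1


def whole_object_storage_for_tolerance_mb(object_mb, concurrent_failures):
    """Upper bound on storage needed to survive the given number of failures."""
    return object_mb * full_copies_needed(concurrent_failures)


def erasure_coding_can_recover(concurrent_failures):
    """True if erasure-coded data can be reconstructed after the failures."""
    return 0 <= concurrent_failures <= EC_TOLERATED_FAILURES


if __name__ == "__main__":
    for failures in (1, 2):
        print(failures,
              whole_object_storage_for_tolerance_mb(1, failures),
              erasure_coding_can_recover(failures))
```

Under these assumptions, tolerating two concurrent failures for a 1 MB object takes up to 3 MB with whole-object protection, while erasure-coded protection cannot recover from that scenario at any storage cost.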
Read performance
With whole-object protection, a client read request for a replicated object results in a single read operation when the request is issued against a system that has the object data. If the object is metadata-only on the target system, that system must retrieve the object data from another system in the replication topology, which may increase the response time.
With erasure-coded protection, a client read request for a single erasure-coded object causes the system that's servicing the request to reconstruct the object from multiple chunks. Only one of those chunks is stored on that system. Therefore, a client read request for an erasure-coded object results in multiple read operations. These operations can be concurrent or sequential, depending on the replication topology on which the erasure coding topology is based. If sequential reads are required, the response time may be longer than if all the reads are concurrent.
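The difference between concurrent and sequential chunk reads can be approximated with a simple model. This is a rough sketch only: it ignores the time needed to reconstruct the object and any network contention, the function name is hypothetical, and the per-chunk read times are made-up values rather than measurements.

```python
def chunk_read_time_ms(chunk_read_ms, concurrent):
    """Approximate time to read all required chunks.

    chunk_read_ms: read time for each required chunk, in milliseconds.
    concurrent: True if all chunk reads can run at the same time.
    """
    if not chunk_read_ms:
        raise ValueError("at least one chunk read is required")
    if concurrent:
        # All reads overlap, so the slowest read determines the total.
        return max(chunk_read_ms)
    # Reads run one after another, so the times add up.
    return sum(chunk_read_ms)


if __name__ == "__main__":
    reads = [5, 40, 55]  # hypothetical: one local chunk, two remote chunks
    print(chunk_read_time_ms(reads, concurrent=True))
    print(chunk_read_time_ms(reads, concurrent=False))
```

With these hypothetical values, the concurrent case is bounded by the slowest chunk read (55 ms), while the sequential case takes the sum of all reads (100 ms).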
Rehydration with whole-object protection and restore periods with erasure-coded protection can decrease read response times.
These considerations apply to choosing a protection type for a replicated namespace when read performance is a factor:
- If objects are likely to be read frequently throughout their lifespans and read performance is more important than storage efficiency, consider using whole-object protection for the namespace.
- If objects are unlikely to be read and storage efficiency is important, consider using erasure-coded protection for the namespace.
- If objects are likely to be read only for a limited time shortly after they are stored and read performance is important, consider using erasure-coded protection with an erasure coding delay that is at least as long as the period during which the objects are expected to be read.
- If objects are likely to be read frequently but only for limited periods throughout their lifespans and read performance is important, consider using erasure-coded protection with a restore period. With a restore period, after an object is read, a full copy of the object data is placed on the ingest tier for the containing namespace and is kept there for a configured amount of time.
The ingest tier for a namespace is the storage tier where objects in the namespace are stored when the object data is first written to HCP. The ingest tier is determined by the service plan currently associated with the namespace.
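The considerations listed above can be summarized as a lookup table. This is a hypothetical helper for illustration, not an HCP feature. The enumeration names are invented here, and combinations that the list does not cover return None rather than a guess.

```python
from enum import Enum


class ReadPattern(Enum):
    FREQUENT_THROUGHOUT_LIFESPAN = "read frequently throughout lifespan"
    UNLIKELY = "unlikely to be read"
    SHORT_TERM_ONLY = "read only for a limited time in the short term"
    FREQUENT_IN_LIMITED_PERIODS = "read frequently, but only for limited periods"


class Priority(Enum):
    READ_PERFORMANCE = "read performance"
    STORAGE_EFFICIENCY = "storage efficiency"


# One entry per consideration in the list above.
_SUGGESTIONS = {
    (ReadPattern.FREQUENT_THROUGHOUT_LIFESPAN, Priority.READ_PERFORMANCE):
        "whole-object protection",
    (ReadPattern.UNLIKELY, Priority.STORAGE_EFFICIENCY):
        "erasure-coded protection",
    (ReadPattern.SHORT_TERM_ONLY, Priority.READ_PERFORMANCE):
        "erasure-coded protection with an erasure coding delay",
    (ReadPattern.FREQUENT_IN_LIMITED_PERIODS, Priority.READ_PERFORMANCE):
        "erasure-coded protection with a restore period",
}


def suggest_protection(pattern, priority):
    """Return the suggestion from the list above, or None if not covered."""
    return _SUGGESTIONS.get((pattern, priority))


if __name__ == "__main__":
    print(suggest_protection(ReadPattern.UNLIKELY, Priority.STORAGE_EFFICIENCY))
```

Returning None for uncovered combinations keeps the helper from implying guidance that the list does not give.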
Network bandwidth usage
For namespaces that use whole-object protection, replication entails sending the complete data for an object to each system in the replication topology, except to systems where the object should be metadata-only. Similarly, for namespaces that use erasure-coded protection with full-copy distribution, the complete object data is sent to each system in the erasure coding topology, except to systems where the object should be metadata-only.
For namespaces that use erasure-coded protection with chunk distribution, only object chunks are sent throughout the erasure coding topology. As a result, replication with this type of protection typically requires less network bandwidth than is required for whole-object protection and for erasure-coded protection with full-copy distribution.
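The replication traffic for the distribution methods described above can be compared with a rough estimate. This sketch rests on several assumptions that are not stated in the text: the system that ingests the object already holds the data and sends nothing to itself, with chunk distribution it keeps one chunk and sends one chunk to each other system, and a topology of N systems uses N - 1 data chunks and one parity chunk. The function names are hypothetical.

```python
def full_copy_replication_mb(object_mb, systems, metadata_only_systems=0):
    """Data sent when the complete object data goes to each other system.

    Systems where the object should be metadata-only receive no object data.
    """
    receivers = systems - 1 - metadata_only_systems
    if receivers < 0:
        raise ValueError("too many metadata-only systems for this topology")
    return object_mb * receivers


def chunk_replication_mb(object_mb, systems):
    """Data sent when only chunks are distributed (assumptions in lead-in)."""
    if not 3 <= systems <= 6:
        # Six is the documented maximum; three is an assumed minimum.
        raise ValueError("systems must be between 3 and 6")
    data_chunks = systems - 1
    chunk_mb = object_mb / data_chunks
    return chunk_mb * (systems - 1)  # one chunk to each other system


if __name__ == "__main__":
    for n in (3, 6):
        print(n, full_copy_replication_mb(1, n), chunk_replication_mb(1, n))
```

Under these assumptions, distributing a 1 MB object in a three-system topology sends about 2 MB as full copies but about 1 MB as chunks.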
On the other hand, with whole-object protection, a client read request against a system that has the object data results in a single data transfer operation. With erasure-coded protection, a client read request results in at least one data transfer operation for each required chunk that's not on the target system in addition to the data transfer operation between the system servicing the request and the client. Therefore, read requests with erasure-coded protection typically use more network bandwidth than is used for read requests with whole-object protection.
Additionally, the chunks for erasure-coded objects are transferred between systems on the replication network. Because servicing read requests takes precedence over replication processing, replication may slow down while any replicated namespaces are experiencing a high read rate.
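The read-request comparison above can be put into numbers with a small estimate. This is a sketch under stated assumptions: the number of chunks that must be fetched from other systems is supplied by the caller, each chunk is transferred exactly once (the text says at least once, so the result is a lower bound), and the function names are hypothetical.

```python
def whole_object_read_mb(object_mb):
    """Data transferred when the target system has the object data.

    A single transfer from the system to the client.
    """
    return object_mb


def erasure_coded_read_mb(object_mb, data_chunks, remote_chunks_needed):
    """Lower bound on data transferred for an erasure-coded read.

    Counts one transfer per required remote chunk, plus the transfer of the
    reconstructed object to the client.
    """
    if data_chunks < 1 or remote_chunks_needed < 0:
        raise ValueError("invalid chunk counts")
    chunk_mb = object_mb / data_chunks
    return remote_chunks_needed * chunk_mb + object_mb


if __name__ == "__main__":
    # Three-system topology: two data chunks, one of which is assumed local.
    print(whole_object_read_mb(1))
    print(erasure_coded_read_mb(1, data_chunks=2, remote_chunks_needed=1))
```

For a 1 MB object in a three-system topology where one required chunk is remote, the erasure-coded read moves at least 1.5 MB in total, compared with 1 MB for the whole-object read.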
Rehydration with whole-object protection and restore periods with erasure-coded protection can affect network bandwidth usage for read operations.
PUT copy operations
For a PUT copy operation where the source object is erasure coded, the new object must be reconstructed from the chunks for the source object. For a large object, this reconstruction can take a long time, increasing the potential for the PUT copy operation to time out.
If large objects in a replicated namespace are likely to be sources for PUT copy operations, whole-object protection may be a better choice for the namespace than erasure-coded protection.