Details
-
New Feature
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
None
-
None
Description
Erasure coding (EC) is a hadoop-3 feature which can drastically reduce storage requirements, at the expense of locality. At my company we have a few hbase clusters which are extremely data dense and take mostly write traffic, fewer reads (cold data). We'd like to reduce the cost of these clusters, and EC is a great way to do that since it can reduce replication related storage costs by 50%.
It's possible to enable EC policies on sub directories of HDFS. One can manually set this with hdfs ec -setPolicy -path /hbase/data/default/usertable -policy xxxx. This can work without any hbase support.
One problem with that is a lack of visibility by operators into which tables might have EC enabled. I think this is where HBase can help. Here's my proposal:
- Add a new TableDescriptor and ColumnDescriptor field ERASURE_CODING_POLICY
- In ModifyTableProcedure preflightChecks, if ERASURE_CODING_POLICY is set, verify that the requested policy is available and enabled via DistributedFileSystem.
getErasureCodingPolicies(). - During ModifyTableProcedure, add a new state for MODIFY_TABLE_SYNC_ERASURE_CODING_POLICY.
- When adding or changing a policy, use DistributedFileSystem.
setErasureCodingPolicy to sync it for the data and archive dir of that table (or column in table) - When removing the property or setting it to empty, use DistributedFileSystem.
unsetErasureCodingPolicy to remove it from the data and archive dir.
- When adding or changing a policy, use DistributedFileSystem.
Since this new API is in hadoop-3 only, we'll need to add a reflection wrapper class for managing the calls and verifying that the API is available. We'll similarly do that API check in preflightChecks.
Attachments
Issue Links
- is related to
-
HBASE-28275 [Flaky test] Fix 'test_list_decommissioned_regionservers' in TestAdminShell2.java
- Resolved
- links to