[HDFS-16875] Erasure Coding: data access proxy to allow old clients to read EC data - ASF JIRA

Details

Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: ec, erasure-coding
Labels: None

Description

Erasure Coding (EC) is supported only in Hadoop 3, while many production deployments still depend on Hadoop 2. Upgrading the whole data stack to Hadoop 3 can require a large migration effort and carries reliability risks, given the incompatibilities between the two major releases and the possibility of undiscovered issues in the newer one. We therefore need a solution that adopts Erasure Coding for cost efficiency with minimal migration effort and risk, while still allowing old HDFS clients (Hadoop 2.x) to access EC data transparently.

Internally, we have developed an EC access proxy that translates EC data for old clients. We also extended the NameNode RPC so that the NameNode can recognize whether an HDFS client supports EC and redirect old clients to the proxy. With the proxy, we have set up separate Erasure Coding clusters storing hundreds of PB of data, while leaving the other production clusters and all upper-layer applications untouched.
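To make the redirect idea concrete, here is a minimal sketch of the routing decision. It is not the actual implementation. It assumes the client advertises its EC capability through a flag in the RPC header and that only reads of EC files are redirected. All names (`EcRoutingSketch`, `ClientCapability`, `ReadTarget`, `chooseReadTarget`) are hypothetical and are not part of any Hadoop API.

```java
// Illustrative sketch only: none of these types exist in Hadoop.
public class EcRoutingSketch {

    /** Assumed capability flag that a client could advertise in its RPC header. */
    enum ClientCapability { EC_AWARE, LEGACY }

    /** Where the NameNode would direct the client to read block data from. */
    enum ReadTarget { DATANODES, EC_PROXY }

    /**
     * Picks the read target for a request.
     * Assumption: only legacy clients reading erasure-coded files are sent
     * to the proxy; every other combination reads from DataNodes directly.
     */
    static ReadTarget chooseReadTarget(ClientCapability capability,
                                       boolean fileIsErasureCoded) {
        if (fileIsErasureCoded && capability == ClientCapability.LEGACY) {
            return ReadTarget.EC_PROXY;
        }
        return ReadTarget.DATANODES;
    }

    public static void main(String[] args) {
        // A Hadoop 2.x client reading an EC file.
        System.out.println(chooseReadTarget(ClientCapability.LEGACY, true));
        // A Hadoop 3 client reading the same EC file.
        System.out.println(chooseReadTarget(ClientCapability.EC_AWARE, true));
        // A Hadoop 2.x client reading a replicated file.
        System.out.println(chooseReadTarget(ClientCapability.LEGACY, false));
    }
}
```

In this sketch the decision is made in one place on the NameNode side, so old clients and upper-layer applications need no changes. The real design may differ, for example in how the capability is encoded or in what the proxy address looks like to the client.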

Because some of the changes touch fundamental HDFS components (e.g., the client-NN RPC header), we do not aim to merge them into trunk. We will use this ticket to share the design and implementation details (including the code) and to collect feedback. We may later open source the implementation in a separate GitHub repo.

Attachments

Erasure Coding Access Proxy.pdf, 03/Jan/23 23:38, 130 kB, Jing Zhao

People

Assignee: Jing Zhao
Reporter: Jing Zhao
Votes: 0
Watchers: 9

Dates

Created: 19/Dec/22 22:57
Updated: 03/Jan/23 23:39