Erasure Coding (EC) is supported only in Hadoop 3, while many production deployments still depend on Hadoop 2. Upgrading the whole data stack to Hadoop 3 can require a large migration effort and carries reliability risks, given the incompatibilities between the two major releases and the issues that may still be undiscovered in the newer one. We therefore need a solution that adopts Erasure Coding for cost efficiency with the least migration effort and risk, while still allowing old HDFS clients (Hadoop 2.x) to access EC data transparently.
Internally we have developed an EC access proxy which translates EC data for old clients. We also extended the NameNode RPC so that it can distinguish HDFS clients with EC support from those without, and redirect the old clients to the proxy. With the proxy in place, we set up separate Erasure Coding clusters storing hundreds of PB of data, while leaving the other production clusters and all upper-layer applications untouched.
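The redirect decision described above can be illustrated with a minimal sketch. It is not the actual implementation: the class, the enums, and the idea of carrying a capability flag in the RPC header are all hypothetical names chosen for illustration. It assumes the NameNode already knows whether the requested file is erasure coded and which capabilities the client advertised.

```java
import java.util.EnumSet;
import java.util.Set;

// Illustrative sketch only; none of these names come from Hadoop or from the actual patch.
class EcRoutingSketch {

    /** Hypothetical capability flags a client might advertise in the client-NN RPC header. */
    enum ClientCapability { ERASURE_CODING }

    /** Hypothetical read destinations the NameNode could hand back to a client. */
    enum ReadTarget { DATANODES, EC_PROXY }

    /**
     * Decide where a client should read from.
     * Replicated files are always served directly by DataNodes.
     * EC files are served directly only to clients that advertise EC support;
     * all other clients (assumed to be Hadoop 2.x) are sent to the proxy.
     */
    static ReadTarget chooseReadTarget(Set<ClientCapability> clientCaps, boolean fileIsErasureCoded) {
        if (!fileIsErasureCoded) {
            return ReadTarget.DATANODES;
        }
        return clientCaps.contains(ClientCapability.ERASURE_CODING)
                ? ReadTarget.DATANODES
                : ReadTarget.EC_PROXY;
    }

    public static void main(String[] args) {
        // An old client advertises no capabilities at all.
        Set<ClientCapability> oldClient = EnumSet.noneOf(ClientCapability.class);
        Set<ClientCapability> newClient = EnumSet.of(ClientCapability.ERASURE_CODING);

        System.out.println("old client, EC file: " + chooseReadTarget(oldClient, true));
        System.out.println("new client, EC file: " + chooseReadTarget(newClient, true));
        System.out.println("old client, replicated file: " + chooseReadTarget(oldClient, false));
    }
}
```

Treating a missing flag as "no EC support" means unmodified Hadoop 2.x clients need no change to be routed to the proxy, which matches the goal of leaving existing applications untouched.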
Because some of the changes touch fundamental components of HDFS (e.g., the client-NN RPC header), we do not aim to merge them into trunk. We will use this ticket to share the design and implementation details (including the code) and to collect feedback. We may open source the implementation in a separate GitHub repo later.