Currently, Hadoop 2.x clients can't read or write striped files from HDFS. This affects compatibility with 3.x clusters in two ways:
- The obvious impact is that 2.x clients can't make use of the new erasure coding feature in Hadoop 3.
- For some use cases, clients built against Hadoop 3 won't be able to use erasure coding either, because any striped file they write can't be read by clients built against Hadoop 2.
This ticket proposes backporting the client-side components of HDFS-7285 to branch-2 for improved compatibility between 2.x clients and 3.x clusters. I believe this can be done without also backporting the changes made to the NameNodes and the DataNodes. While many lines of code would need to be backported, most of it is new code that can be copy/pasted from trunk, which simplifies the process. The existing code in DFSClient, DFSInputStream, DFSOutputStream, etc. that would need to be modified is still significant, but much smaller.