Description
HDFS caching offers performance benefits. However, the NameNode currently does not give cached replicas higher priority when it orders block locations, so a client may pick an uncached replica even when a cached one exists. As a result, HDFS caching is only reliably useful when cache replication = 3, that is, when all replicas are cached in memory and the client cannot land on an uncached one.
HDFS-6846 proposed letting the NameNode give higher priority to cached replicas. Changing NameNode logic is always tricky, so that proposal didn't get much traction. Here I propose a different approach: let the client (DFSInputStream) prefer cached replicas.
A LocatedBlock object already contains the cached replica locations, so the client has the information it needs. I think we can change DFSInputStream#getBestNodeDNAddrPair() to pick a cached replica first and fall back to the existing choice otherwise.
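
To make the idea concrete, here is a rough sketch of the selection rule. It is a simplified, self-contained model and not the actual DFSInputStream code: DataNodes are represented as plain strings, and the class name CachedReplicaChooser, the method chooseReplica, and its parameters are all hypothetical. The assumption is that the client sees the NameNode-ordered location list, the set of cached locations, and its own set of dead nodes.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for the replica choice a client could make. */
class CachedReplicaChooser {

    /**
     * Returns the first live replica that is also cached. If no live cached
     * replica exists, returns the first live replica in the given order.
     * Returns null when every location is marked dead.
     */
    static String chooseReplica(List<String> locations,
                                Set<String> cachedLocations,
                                Set<String> deadNodes) {
        String fallback = null;
        for (String node : locations) {
            if (deadNodes.contains(node)) {
                continue; // skip nodes the client already failed on
            }
            if (cachedLocations.contains(node)) {
                return node; // a live cached replica wins immediately
            }
            if (fallback == null) {
                fallback = node; // remember the first live uncached replica
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // Assumed example data: three replicas, only dn3 is cached.
        List<String> locations = Arrays.asList("dn1", "dn2", "dn3");
        Set<String> cached = new HashSet<>(Arrays.asList("dn3"));
        Set<String> dead = Collections.emptySet();
        System.out.println(chooseReplica(locations, cached, dead));
    }
}
```

With this rule, cache replication = 1 is enough for a read to hit memory whenever the cached DataNode is alive, and the behavior is unchanged when no replica is cached.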