
HDFS-14283: DFSInputStream to prefer cached replica

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.6.0
    • Fix Version/s: 3.3.1, 3.4.0
    • Component/s: hdfs-client
    • Labels: None
    • HDFS Caching

    Description

      HDFS Caching offers performance benefits. However, the NameNode currently does not give cached replicas higher priority, so HDFS caching is only useful when cache replication = 3 (that is, when every replica is cached in memory), because otherwise a client may randomly pick an uncached replica.

      HDFS-6846 proposed letting the NameNode give higher priority to cached replicas. Changing NameNode logic is always tricky, so that proposal didn't get much traction. Here I propose a different approach: let the client (DFSInputStream) prefer cached replicas.

      A LocatedBlock object already contains the cached replica locations, so a client has the information it needs. I think we can change DFSInputStream#getBestNodeDNAddrPair() for this purpose.
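
      To illustrate the idea, here is a minimal, self-contained sketch of client-side selection that tries cached replicas first. It is not the attached patch and does not use the real Hadoop classes: BlockLocations, orderPreferringCached, and chooseDataNode are hypothetical stand-ins for LocatedBlock and the selection logic, and datanodes are represented as plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CachedReplicaSketch {

  /** Hypothetical stand-in for the location info a LocatedBlock carries. */
  public static final class BlockLocations {
    final List<String> locations;        // all replica datanodes, in the order the NameNode returned them
    final List<String> cachedLocations;  // datanodes that hold this block in their cache

    public BlockLocations(List<String> locations, List<String> cachedLocations) {
      this.locations = locations;
      this.cachedLocations = cachedLocations;
    }
  }

  /**
   * Returns the usable datanodes with cached replicas first. Within the cached
   * and uncached groups the original NameNode ordering is preserved, and nodes
   * in the ignored set (for example, previously failed nodes) are skipped.
   */
  public static List<String> orderPreferringCached(BlockLocations block, Set<String> ignored) {
    Set<String> cached = new HashSet<>(block.cachedLocations);
    List<String> ordered = new ArrayList<>();
    for (String dn : block.locations) {
      if (cached.contains(dn) && !ignored.contains(dn)) {
        ordered.add(dn);
      }
    }
    for (String dn : block.locations) {
      if (!cached.contains(dn) && !ignored.contains(dn)) {
        ordered.add(dn);
      }
    }
    return ordered;
  }

  /** Picks the first candidate, or null when every replica is ignored. */
  public static String chooseDataNode(BlockLocations block, Set<String> ignored) {
    List<String> ordered = orderPreferringCached(block, ignored);
    return ordered.isEmpty() ? null : ordered.get(0);
  }

  public static void main(String[] args) {
    BlockLocations block = new BlockLocations(
        Arrays.asList("dn1", "dn2", "dn3"),
        Collections.singletonList("dn2"));
    System.out.println(orderPreferringCached(block, Collections.<String>emptySet()));
    System.out.println(chooseDataNode(block, Collections.singleton("dn2")));
  }
}
```

      With one cached replica out of three, the cached node is tried first, and the client falls back to the uncached replicas in their original order if the cached node is in the ignored set. A real change would likely also want a client-side switch to turn this preference on or off; that is an assumption, not something stated in the description above.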

      Attachments

        1. HDFS-14283.001.patch
          2 kB
          Lisheng Sun
        2. HDFS-14283.002.patch
          6 kB
          Lisheng Sun
        3. HDFS-14283.003.patch
          11 kB
          Lisheng Sun
        4. HDFS-14283.004.patch
          17 kB
          Lisheng Sun
        5. HDFS-14283.005.patch
          7 kB
          Lisheng Sun
        6. HDFS-14283.006.patch
          11 kB
          Lisheng Sun
        7. HDFS-14283.007.patch
          11 kB
          Lisheng Sun
        8. HDFS-14283.008.patch
          12 kB
          Lisheng Sun
        9. HDFS-14283.009.patch
          11 kB
          Lisheng Sun

          People

            Assignee: Lisheng Sun (leosun08)
            Reporter: Wei-Chiu Chuang (weichiu)
            Votes: 0
            Watchers: 11