Hadoop HDFS / HDFS-9090

Writing hot data on few nodes may cause performance issues


    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.3.0
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      (I am not sure whether this should be reported as a BUG; feel free to modify this.)

      The current block placement policy makes a best effort to place the first replica on the local node whenever possible.

      Consider the following scenario:
      1. There are 500 DataNodes spread across plenty of racks.
      2. Raw user action logs (just an example) are written by clients on only 10 nodes, each of which also has a DataNode deployed locally.
      3. Then, before any balancing, all of these logs will have at least one replica on those 10 nodes. With a replication factor of 3, this implies that one third of the reads on these logs will be served by those 10 nodes, and performance suffers.

      I propose to solve this scenario by introducing a configuration entry that lets the client disable write locality at an arbitrary level.
      Then we can either (A) add local nodes to excludedNodes, or (B) tell the NameNode the locality we prefer.
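A minimal sketch of what option (A) looks like from the client side, assuming a Hadoop release that includes CreateFlag.NO_LOCAL_WRITE (a flag added after 2.3.0 that advises the NameNode not to place a replica on the local DataNode); the path, buffer size, and block size below are hypothetical illustration values:

```java
import java.util.EnumSet;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.CreateFlag;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class NoLocalWriteExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // CREATE is required to create the file; NO_LOCAL_WRITE (where
    // available) advises the NameNode not to place the first replica
    // on the DataNode co-located with this client.
    EnumSet<CreateFlag> flags =
        EnumSet.of(CreateFlag.CREATE, CreateFlag.NO_LOCAL_WRITE);

    Path path = new Path("/logs/user_action.log"); // hypothetical path
    try (FSDataOutputStream out = fs.create(
        path,
        FsPermission.getFileDefault(),
        flags,
        4096,        // buffer size
        (short) 3,   // replication factor
        128L << 20,  // block size: 128 MB
        null)) {     // no progress callback
      out.writeBytes("example record\n");
    }
  }
}
```

This covers only the per-create case; the configuration entry proposed here would make the same behavior a client-wide default, without touching every writer.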

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                He Tianyi
                Reporter:
                He Tianyi
              • Votes:
                1
                Watchers:
                8

                Dates

                • Created:
                  Updated: