  1. Hadoop HDFS
  2. HDFS-14603 Über-JIRA: HDFS RBF stabilization phase II
  3. HDFS-15417

RBF: Get the datanode report from cache for federation WebHDFS operations

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.4.0
    • Component/s: federation, rbf, webhdfs
    • Labels:
      None

      Description

      Why
      For the WebHDFS CREATE, OPEN, APPEND and GETFILECHECKSUM operations, the router or namenode must find the datanodes that hold the block, then redirect the request to one of them.

      However, the chooseDatanode step is much slower in the router than in the namenode, which directly slows down the WebHDFS operations above.

      With namenode WebHDFS it normally takes tens of milliseconds, while with the router it always takes more than 2 seconds.

      How
      Cache the datanode report in the router RPC server and actively refresh it at a configured interval. In the router, only get the datanode report when necessary.

      Getting the datanode report is a very expensive operation, and it is where all the time is spent.

      The report is only needed when we want to exclude some datanodes or find a random datanode for CREATE.
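
      The approach above can be illustrated with a minimal sketch. It is not the code from the patch: the class name CachedDatanodeReport, the method needsDatanodeReport, and the thread name are all hypothetical, and it uses only the JDK rather than the router's own classes. It shows a report that is loaded once, refreshed in the background at a fixed interval, and consulted only for the cases named above.

```java
import java.util.Collection;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/**
 * Hypothetical sketch: holds the result of an expensive call (for example,
 * fetching the datanode report) and refreshes it on a background thread.
 */
class CachedDatanodeReport<T> implements AutoCloseable {
    private final Supplier<T> loader;
    private final AtomicReference<T> cached = new AtomicReference<>();
    private final ScheduledExecutorService refresher;

    CachedDatanodeReport(Supplier<T> loader, long refreshInterval, TimeUnit unit) {
        this.loader = loader;
        // Pay the full cost once up front so the first request is served from the cache.
        this.cached.set(loader.get());
        this.refresher = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "dn-report-refresher");
            t.setDaemon(true); // do not keep the JVM alive just for refreshes
            return t;
        });
        // Actively refresh at the configured interval, independent of requests.
        this.refresher.scheduleWithFixedDelay(
            this::refresh, refreshInterval, refreshInterval, unit);
    }

    /** Reloads the report; on failure the previous (stale) value is kept. */
    void refresh() {
        try {
            cached.set(loader.get());
        } catch (RuntimeException e) {
            // Assumption: serving a slightly stale report is better than failing.
        }
    }

    /** Returns the cached report without calling the loader. */
    T get() {
        return cached.get();
    }

    /**
     * Assumed decision rule taken from the description: the report is needed
     * only for CREATE (pick a random datanode) or when some datanodes must
     * be excluded.
     */
    static boolean needsDatanodeReport(String op, Collection<String> excludes) {
        return "CREATE".equals(op) || !excludes.isEmpty();
    }

    @Override
    public void close() {
        refresher.shutdownNow();
    }
}
```

      A caller would wrap the expensive lookup in the supplier, call get() on the request path, and check needsDatanodeReport first so that operations such as OPEN with no excluded datanodes skip the report entirely.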

              People

              • Assignee:
                NickyYe Ye Ni
                Reporter:
                NickyYe Ye Ni