Hadoop HDFS / HDFS-14603 Über-JIRA: HDFS RBF stabilization phase II / HDFS-15417

RBF: Get the datanode report from cache for federation WebHDFS operations


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.4.0
    • Component/s: federation, rbf, webhdfs
    • Labels: None

    Description

      Why
      For the WebHDFS CREATE, OPEN, APPEND and GETFILECHECKSUM operations, the router or namenode has to find the datanodes where the block is located and then redirect the request to one of them.

      However, this chooseDatanode step is much slower in the router than in the namenode, which directly slows down the WebHDFS operations above.

      For namenode WebHDFS it normally takes tens of milliseconds, whereas the router always takes more than 2 seconds.

      How
      Cache the datanode report in the router RPC server and actively refresh it at a configured interval. In the router, only fetch the datanode report when it is actually needed.

      Fetching the datanode report is a very expensive operation, and it accounts for essentially all of the time spent in the router's chooseDatanode step.
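      The "cache and actively refresh" idea can be illustrated with a minimal Java sketch (Java because Hadoop is written in it). This is not the code from the fix: `CachedReport`, `refresh` and `refreshIntervalMs` are hypothetical names, and the `loader` is an assumed stand-in for whatever call actually fetches the datanode report from the namenodes.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/**
 * Hypothetical sketch: holds the result of an expensive call (standing in
 * for the datanode report) and refreshes it in the background at a fixed
 * interval, so request paths only read the cached copy.
 */
final class CachedReport<T> implements AutoCloseable {
  private final Supplier<T> loader; // assumed: the expensive report fetch
  private final AtomicReference<T> cached = new AtomicReference<>();
  private final ScheduledExecutorService refresher;

  CachedReport(Supplier<T> loader, long refreshIntervalMs) {
    this.loader = loader;
    // Load once up front so readers never observe an empty cache.
    this.cached.set(loader.get());
    this.refresher = Executors.newSingleThreadScheduledExecutor(r -> {
      Thread t = new Thread(r, "report-cache-refresher");
      t.setDaemon(true); // do not keep the JVM alive just for refreshing
      return t;
    });
    // Actively refresh at the configured interval, independent of requests.
    refresher.scheduleWithFixedDelay(this::refresh, refreshIntervalMs,
        refreshIntervalMs, TimeUnit.MILLISECONDS);
  }

  /** Fetches a fresh report; on failure keeps serving the previous one. */
  void refresh() {
    try {
      cached.set(loader.get());
    } catch (RuntimeException e) {
      // Swallowing matters: a scheduled task that throws stops all later runs.
    }
  }

  /** Cheap read used on the request path; never waits on the fetch. */
  T get() {
    return cached.get();
  }

  @Override
  public void close() {
    refresher.shutdownNow();
  }
}
```

      In this sketch, request handlers call `get()` and never block on the report fetch. The `catch` in `refresh()` is deliberate: `ScheduledExecutorService.scheduleWithFixedDelay` suppresses all subsequent executions once a run throws, so an unguarded failure would silently stop the refresh.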

      The datanode report is only needed when some datanodes must be excluded or when a random datanode must be chosen for CREATE.
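      The rule of fetching the report only when exclusions must be resolved or when CREATE needs a random target can be sketched as follows. The names `RedirectOp`, `RedirectTargetChooser` and `choose` are hypothetical. The sketch also makes simplifying assumptions that are not from the issue: datanodes are modeled as `host:port` strings, excluded datanodes are given as host names that must be resolved against the report, and non-CREATE operations simply take the first non-excluded block location.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.function.Supplier;

/** WebHDFS operations that are redirected to a datanode. */
enum RedirectOp { CREATE, OPEN, APPEND, GETFILECHECKSUM }

/**
 * Hypothetical sketch: chooses the datanode to redirect to and calls the
 * expensive datanode report only when the choice actually depends on it.
 * Datanodes are modeled as "host:port" strings (a simplification).
 */
final class RedirectTargetChooser {
  private final Supplier<List<String>> datanodeReport; // expensive, or a cached copy
  private final Random random;

  RedirectTargetChooser(Supplier<List<String>> datanodeReport, Random random) {
    this.datanodeReport = datanodeReport;
    this.random = random;
  }

  /** Returns the chosen "host:port", or null if no candidate is left. */
  String choose(RedirectOp op, List<String> blockLocations, Set<String> excludedHosts) {
    // The report is needed only to pick a random target for CREATE
    // or to resolve excluded hosts to datanode entries.
    boolean needReport = op == RedirectOp.CREATE || !excludedHosts.isEmpty();
    List<String> report = needReport ? datanodeReport.get() : Collections.<String>emptyList();

    Set<String> excluded = new HashSet<>();
    for (String dn : report) {
      if (excludedHosts.contains(hostOf(dn))) {
        excluded.add(dn);
      }
    }

    // CREATE has no block yet, so any datanode in the report is a candidate;
    // the other operations choose among the datanodes holding the block.
    List<String> candidates =
        new ArrayList<>(op == RedirectOp.CREATE ? report : blockLocations);
    candidates.removeAll(excluded);
    if (candidates.isEmpty()) {
      return null;
    }
    return op == RedirectOp.CREATE
        ? candidates.get(random.nextInt(candidates.size()))
        : candidates.get(0); // simplification: first non-excluded location
  }

  private static String hostOf(String hostPort) {
    int colon = hostPort.lastIndexOf(':');
    return colon < 0 ? hostPort : hostPort.substring(0, colon);
  }
}
```

      With this structure, a plain OPEN, APPEND or GETFILECHECKSUM with no exclusions never calls the report supplier. The supplier itself could be backed by a periodically refreshed cache, so that even CREATE avoids a synchronous report fetch on the request path.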

People

    • Assignee: Ye Ni (NickyYe)
    • Reporter: Ye Ni (NickyYe)
    • Votes: 0
    • Watchers: 7
