  1. Hadoop HDFS
  2. HDFS-9412

getBlocks occupies FSLock and takes too long to complete

    Details

    • Hadoop Flags:
      Reviewed
    • Release Note:
      Skip blocks with size below dfs.balancer.getBlocks.min-block-size (default 10MB) when a balancer asks for a list of blocks.
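
The filtering rule in the release note can be illustrated with a minimal sketch. This is not the code from the attached patches: the class `MinBlockSizeFilter`, the method `selectBlocks`, and the use of plain block sizes in place of real block objects are all hypothetical, chosen only to show the effect of the threshold.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the code from the HDFS-9412 patches.
final class MinBlockSizeFilter {
    // Default of dfs.balancer.getBlocks.min-block-size per the release note: 10 MB.
    static final long DEFAULT_MIN_BLOCK_SIZE = 10L * 1024 * 1024;

    // Returns only the sizes that are at least minBlockSize, preserving order.
    // Smaller blocks are skipped, so they never reach the caller.
    static List<Long> selectBlocks(List<Long> blockSizes, long minBlockSize) {
        List<Long> selected = new ArrayList<>();
        for (long size : blockSizes) {
            if (size >= minBlockSize) {
                selected.add(size);
            }
        }
        return selected;
    }
}
```

With the default threshold, a 1 KB block would be skipped, while a block of exactly 10 MB or larger would be returned.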

      Description

      getBlocks in NameNodeRpcServer acquires a read lock and may then take a long time to complete (probably several seconds if there are too many blocks).
      During this period, other threads attempting to acquire the write lock have to wait.
      In an extreme case, the RPC handlers are occupied by one reader thread calling getBlocks and by all the other threads waiting for the write lock, so the RPC server appears to hang. Unfortunately, this tends to happen in heavily loaded clusters, since read operations come and go quickly (they do not need to wait), leaving write operations waiting.

      It looks like we can optimize this the way the DN block report was optimized in the past: split the operation into smaller sub-operations and let other threads do their work between sub-operations. The whole result is still returned at once, though (one difference from the DN block report).
      I am not sure whether this will work. Any better ideas?
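
The splitting idea above could be sketched roughly as follows. This is only an illustration under stated assumptions, not NameNode code: `ChunkedBlockReader`, `getBlocks(int chunkSize)`, and `addBlock` are hypothetical names, a list of `Long` stands in for the block list, and a fair `ReentrantReadWriteLock` stands in for the namesystem lock. Because the lock is released between chunks, writers may modify the list in between, so the combined result is not guaranteed to be a consistent snapshot.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of reading in chunks; not the actual NameNode implementation.
final class ChunkedBlockReader {
    // Fair mode (an assumption here) so queued writers get a turn between chunks.
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
    private final List<Long> blocks;

    ChunkedBlockReader(List<Long> initialBlocks) {
        this.blocks = new ArrayList<>(initialBlocks);
    }

    // Collects all blocks, holding the read lock for at most chunkSize items at a time.
    List<Long> getBlocks(int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<Long> result = new ArrayList<>();
        int next = 0;
        boolean more = true;
        while (more) {
            lock.readLock().lock();
            try {
                // Re-check the size on every pass: writers may have changed the list.
                int end = Math.min(next + chunkSize, blocks.size());
                for (; next < end; next++) {
                    result.add(blocks.get(next));
                }
                more = next < blocks.size();
            } finally {
                // Releasing here is what lets waiting writers proceed between chunks.
                lock.readLock().unlock();
            }
        }
        return result; // the whole result is still returned at once
    }

    // A writer that would otherwise wait for the full scan to finish.
    void addBlock(long blockId) {
        lock.writeLock().lock();
        try {
            blocks.add(blockId);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

The chunk size trades lock hold time against overhead: smaller chunks let writers in sooner but reacquire the lock more often.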

        Attachments

        1. HDFS-9412.0000.patch
          2 kB
          He Tianyi
        2. HDFS-9412.0001.patch
          2 kB
          He Tianyi
        3. HDFS-9412.0002.patch
          4 kB
          He Tianyi
        4. HDFS-9412-branch-2.7.00.patch
          4 kB
          Konstantin Shvachko

              People

              • Assignee:
                He Tianyi
                Reporter:
                He Tianyi
              • Votes:
                0
                Watchers:
                17