  1. HBase
  2. HBASE-16393

Improve computeHDFSBlocksDistribution


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Implemented

    Description

      Because our cluster is big, I can see that the balancer is slow from time to time. The balancer is also called on master startup, so startup is slow as well.
      First, I think we can compute the HDFSBlocksDistribution of different regions in parallel.
      Second, I think we can improve how a single region's HDFSBlocksDistribution is computed.
      To compute a storefile's HDFSBlocksDistribution, we first call FileSystem#getFileStatus(path) and then FileSystem#getFileBlockLocations(status, start, length), so there are two NameNode RPC calls for every storefile. Instead, we can use FileSystem#listLocatedStatus to get a LocatedFileStatus that carries the information we need, which reduces the NameNode RPC calls to one. This speeds up computeHDFSBlocksDistribution and also sends fewer RPC calls to the NameNode.
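
      The two ideas above can be illustrated with a small JDK-only model. This is a sketch under stated assumptions, not the attached patch and not HBase or Hadoop code: FakeNameNode, Block, and the compute* methods are hypothetical names, paths are plain strings, and each method on FakeNameNode only mirrors the name of the FileSystem call it stands in for while counting one simulated RPC. The sketch also assumes that a distribution is a per-host sum of block lengths.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch only: a JDK-only model of the idea, not HBase or Hadoop code. */
final class BlockDistributionSketch {

    /** One block of a storefile: its length and the hosts that hold a replica. */
    static final class Block {
        final long length;
        final List<String> hosts;
        Block(long length, List<String> hosts) { this.length = length; this.hosts = hosts; }
    }

    /** Hypothetical in-memory stand-in for the NameNode that counts simulated RPCs. */
    static final class FakeNameNode {
        final AtomicInteger rpcCount = new AtomicInteger();
        private final Map<String, List<Block>> blocksByPath; // storefile path -> blocks

        FakeNameNode(Map<String, List<Block>> blocksByPath) { this.blocksByPath = blocksByPath; }

        /** Models FileSystem#getFileStatus(path): one RPC, no block locations. */
        String getFileStatus(String path) {
            rpcCount.incrementAndGet();
            return path;
        }

        /** Models FileSystem#getFileBlockLocations(status, start, length): a second RPC. */
        List<Block> getFileBlockLocations(String status) {
            rpcCount.incrementAndGet();
            return blocksByPath.get(status);
        }

        /** Models FileSystem#listLocatedStatus: status and locations in one RPC. */
        List<Block> listLocatedStatus(String path) {
            rpcCount.incrementAndGet();
            return blocksByPath.get(path);
        }
    }

    /** Adds each block's length to the weight of every host that stores it. */
    static void addBlocks(Map<String, Long> weights, List<Block> blocks) {
        for (Block b : blocks) {
            for (String host : b.hosts) {
                weights.merge(host, b.length, Long::sum);
            }
        }
    }

    /** Approach described as current in the issue: two RPCs per storefile. */
    static Map<String, Long> computeTwoCalls(FakeNameNode nn, List<String> storefiles) {
        Map<String, Long> weights = new TreeMap<>();
        for (String path : storefiles) {
            String status = nn.getFileStatus(path);
            addBlocks(weights, nn.getFileBlockLocations(status));
        }
        return weights;
    }

    /** Proposed approach: one RPC per storefile. */
    static Map<String, Long> computeOneCall(FakeNameNode nn, List<String> storefiles) {
        Map<String, Long> weights = new TreeMap<>();
        for (String path : storefiles) {
            addBlocks(weights, nn.listLocatedStatus(path));
        }
        return weights;
    }

    /** First idea in the issue: compute each region's distribution in parallel. */
    static Map<String, Map<String, Long>> computeAllRegions(
            FakeNameNode nn, Map<String, List<String>> storefilesByRegion, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Map<String, Future<Map<String, Long>>> pending = new TreeMap<>();
            for (Map.Entry<String, List<String>> e : storefilesByRegion.entrySet()) {
                pending.put(e.getKey(), pool.submit(() -> computeOneCall(nn, e.getValue())));
            }
            Map<String, Map<String, Long>> result = new TreeMap<>();
            for (Map.Entry<String, Future<Map<String, Long>>> e : pending.entrySet()) {
                result.put(e.getKey(), e.getValue().get()); // wait for this region's task
            }
            return result;
        } catch (InterruptedException | ExecutionException ex) {
            throw new IllegalStateException("region computation failed", ex);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        FakeNameNode nn = new FakeNameNode(Map.of(
            "r1/f1", List.of(new Block(128, List.of("host-a", "host-b"))),
            "r1/f2", List.of(new Block(64, List.of("host-a"))),
            "r2/f3", List.of(new Block(32, List.of("host-b")))));
        Map<String, List<String>> regions = Map.of(
            "r1", List.of("r1/f1", "r1/f2"), "r2", List.of("r2/f3"));
        System.out.println(computeAllRegions(nn, regions, 2) + " rpcs=" + nn.rpcCount.get());
    }
}
```

      In this model both approaches return the same per-host weights, and only the simulated RPC count differs. Real code would work with Hadoop's Path, FileStatus, LocatedFileStatus, and BlockLocation types rather than strings, and the size of the thread pool is a tuning choice that the issue does not specify.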

      Attachments

        1. HBASE-16393.patch
          5 kB
          Lijin Bin


          People

            Assignee: binlijin Lijin Bin
            Reporter: binlijin Lijin Bin
            Votes: 0
            Watchers: 13
