HBase / HBASE-3721

Speedup LoadIncrementalHFiles


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.92.0
    • Component/s: util
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      From Adam Phelps:
      From the logs it looks like <1% of the hfiles we're loading have to be split. Looking at the code for LoadIncrementalHFiles (hbase v0.90.1), I'm actually thinking our problem is that this code loads the hfiles sequentially. Our largest table has over 2500 regions and the data being loaded is fairly well distributed across them, so there end up being around 2500 HFiles for each load period. At 1-2 seconds per HFile, that means the loading process is very time consuming.

      Currently server.bulkLoadHFile() is a blocking call, so the HFiles are loaded one at a time.
      We can use an ExecutorService to issue these calls in parallel and make better use of multi-core machines.

      A new configuration parameter, "hbase.loadincremental.threads.max", sets the maximum number of threads used for parallel bulk load.
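
      The approach described above can be sketched roughly as follows. This is not the code from the attached patches. It is a minimal, standalone illustration that assumes each per-HFile load is an independent blocking call. The names ParallelBulkLoadSketch, HFileLoader, and loadAll are hypothetical, and the thread count of 4 in main is an arbitrary stand-in for the value of "hbase.loadincremental.threads.max" (the issue does not state a default).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelBulkLoadSketch {

    /** Hypothetical stand-in for one blocking per-HFile load call. */
    public interface HFileLoader {
        void load(String hfilePath) throws Exception;
    }

    /**
     * Loads every HFile using at most maxThreads concurrent blocking calls.
     * Returns the number of files loaded. If any load fails, the failure
     * surfaces as an ExecutionException from Future.get().
     */
    public static int loadAll(List<String> hfilePaths, int maxThreads, HFileLoader loader)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, maxThreads));
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String path : hfilePaths) {
                Callable<String> task = () -> {
                    loader.load(path); // the blocking call runs on a pool thread
                    return path;
                };
                pending.add(pool.submit(task));
            }
            int loaded = 0;
            for (Future<String> f : pending) {
                f.get(); // wait for completion; rethrows a task failure
                loaded++;
            }
            return loaded;
        } finally {
            pool.shutdown(); // stop accepting work and let idle threads exit
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> files = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            files.add("hfile-" + i); // made-up file names for the demo
        }
        int maxThreads = 4; // assumed value, standing in for the configured maximum
        long start = System.nanoTime();
        // Simulate a slow blocking load with a short sleep.
        int loaded = loadAll(files, maxThreads, path -> Thread.sleep(50));
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("loaded " + loaded + " files in " + elapsedMs + " ms");
    }
}
```

      For scale: at the figures quoted in the description, about 2500 HFiles at 1-2 seconds each take roughly 42 to 83 minutes when loaded one after another. With N worker threads the wall-clock time would shrink by up to a factor of N, assuming the region servers can absorb the concurrent requests.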

      Attachments

        1. LoadIncrementalHFiles.java
          14 kB
          Ted Yu
        2. 3721-v6.patch
          11 kB
          Ted Yu
        3. 3721-v4.txt
          16 kB
          Ted Yu
        4. 3721-v3.txt
          15 kB
          Ted Yu
        5. 3721-v2.txt
          15 kB
          Ted Yu
        6. 3721.txt
          8 kB
          Ted Yu

          People

            Assignee: Ted Yu (yuzhihong@gmail.com)
            Reporter: Ted Yu (yuzhihong@gmail.com)
            Votes: 0
            Watchers: 5
