HBase / HBASE-15808

Reduce potential bulk load intermediate space usage and waste

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.2.0
    • Fix Version/s: 1.3.0, 1.2.2, 0.98.20, 2.0.0
    • Component/s: None
    • Labels:
      None

      Description

      If the bulk load input files do not match the existing region boundaries, the files will be split.
      In the unfortunate cases where the files need to be split multiple times,
      the process can consume unnecessary space and can even cause the file system to run out of space.
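      To illustrate why a single input file can need several rounds of splitting, here is a minimal Java sketch under stated assumptions: region boundaries are modeled as a sorted set of region start keys (plain strings standing in for row keys), and each round is assumed to cut a file at only one boundary and requeue the remaining part. The class and method names (SplitRoundsSketch, roundsNeeded) are hypothetical and not part of the HBase API.

```java
import java.util.Arrays;
import java.util.TreeSet;

public class SplitRoundsSketch {
    // Counts how many split rounds a file covering [firstKey, lastKey] needs,
    // ASSUMING each round cuts the file at the first region boundary after its
    // current start key and requeues the remainder (hypothetical model).
    static int roundsNeeded(TreeSet<String> boundaries, String firstKey, String lastKey) {
        int rounds = 0;
        String start = firstKey;
        while (true) {
            // Smallest region start key strictly greater than the current start.
            String boundary = boundaries.higher(start);
            if (boundary == null || boundary.compareTo(lastKey) > 0) {
                return rounds; // the remaining piece fits in a single region
            }
            rounds++;
            start = boundary; // [start, boundary) is done; [boundary, lastKey] is requeued
        }
    }

    public static void main(String[] args) {
        // Hypothetical table with regions starting at "", "b", "d", and "f".
        TreeSet<String> boundaries = new TreeSet<>(Arrays.asList("b", "d", "f"));
        System.out.println(roundsNeeded(boundaries, "a", "g")); // prints 3
    }
}
```

      Under this model, a file that crosses N region boundaries goes through N rounds, and each round can leave behind another tmp copy if nothing is cleaned up.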

      Here is an over-simplified example.

      Original size of input files:
      consumed space: size --> 300GB
      After a round of splits:
      consumed space: size + tmpspace1 --> 300GB + 300GB
      After another round of splits:
      consumed space: size + tmpspace1 + tmpspace2 --> 300GB + 300GB + 300GB

      ..
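      The arithmetic in the example can be modeled in a few lines of Java. This is a hedged sketch that keeps the example's simplification (every round rewrites the full input size) and additionally assumes that, with cleanup, the previous round's tmp space is deleted only after the current round has finished writing. The class and method names are hypothetical.

```java
public class BulkLoadSpaceModel {
    // Total space after `rounds` rounds when no intermediate tmp space is deleted:
    // the input plus one full-size copy per round (over-simplified model).
    static long spaceWithoutCleanup(long inputGb, int rounds) {
        return inputGb * (rounds + 1);
    }

    // Space left after `rounds` rounds when every tmp space except the latest
    // is deleted: the input plus the latest copy.
    static long spaceAfterWithCleanup(long inputGb, int rounds) {
        return rounds == 0 ? inputGb : inputGb * 2;
    }

    // Peak space during round `round` (1-based) with cleanup: the input, the
    // previous round's tmp space (still being read), and the tmp space being written.
    static long peakWithCleanup(long inputGb, int round) {
        return round == 1 ? inputGb * 2 : inputGb * 3;
    }

    public static void main(String[] args) {
        long inputGb = 300;
        for (int round = 1; round <= 4; round++) {
            System.out.printf("round %d: no cleanup %d GB, cleanup %d GB (peak %d GB)%n",
                    round, spaceWithoutCleanup(inputGb, round),
                    spaceAfterWithCleanup(inputGb, round), peakWithCleanup(inputGb, round));
        }
    }
}
```

      Under these assumptions, consumed space without cleanup grows by one full copy per round, while with cleanup it settles at the input plus the latest tmp space, and the peak during any round stays at three copies.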

      Currently we don't do any cleanup during the process. At a minimum, all the intermediate tmpspace (everything except the last one) can be deleted as the process goes.
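      A minimal sketch of that cleanup policy, using local directories through java.nio.file instead of HDFS: splitRound is a hypothetical stand-in for one splitting round (it simply cuts every file in half into a fresh tmpN directory), and splitWithCleanup deletes each intermediate directory once the next round has consumed it, while never touching the original input. This is not the attached patch; the directory layout and all names are assumptions for illustration.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;
import java.util.stream.*;

public class IntermediateCleanupSketch {
    // HYPOTHETICAL stand-in for one split round: writes each file in `src` as two
    // halves into a new directory workRoot/tmp<round> and returns that directory.
    static Path splitRound(Path src, Path workRoot, int round) throws IOException {
        Path dst = Files.createDirectory(workRoot.resolve("tmp" + round));
        try (Stream<Path> files = Files.list(src)) {
            for (Path f : files.filter(Files::isRegularFile).collect(Collectors.toList())) {
                byte[] data = Files.readAllBytes(f);
                int mid = data.length / 2;
                String name = f.getFileName().toString();
                Files.write(dst.resolve(name + ".top"), Arrays.copyOfRange(data, 0, mid));
                Files.write(dst.resolve(name + ".bottom"), Arrays.copyOfRange(data, mid, data.length));
            }
        }
        return dst;
    }

    // Deletes a directory tree, children before parents.
    static void deleteRecursively(Path dir) throws IOException {
        try (Stream<Path> walk = Files.walk(dir)) {
            for (Path p : walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList())) {
                Files.delete(p);
            }
        }
    }

    // Runs `rounds` split rounds. After each round, the previous round's output is
    // deleted because it has been fully consumed; the original input is kept.
    static Path splitWithCleanup(Path input, Path workRoot, int rounds) throws IOException {
        Path current = input;
        for (int r = 1; r <= rounds; r++) {
            Path next = splitRound(current, workRoot, r);
            if (!current.equals(input)) {
                deleteRecursively(current); // intermediate tmp space, no longer needed
            }
            current = next;
        }
        return current;
    }

    public static void main(String[] args) throws IOException {
        Path input = Files.createTempDirectory("bulkload-input");
        Path work = Files.createTempDirectory("bulkload-work");
        Files.write(input.resolve("hfile"), new byte[1000]);
        Path last = splitWithCleanup(input, work, 3);
        try (Stream<Path> dirs = Files.list(work)) {
            // prints [tmp3]: tmp1 and tmp2 were removed along the way
            System.out.println(dirs.map(p -> p.getFileName().toString()).collect(Collectors.toList()));
        }
        System.out.println(last.getFileName()); // prints tmp3
    }
}
```

      Deleting the previous directory only after the next round finishes keeps a consumed copy around until it is no longer needed, which is why the peak in the earlier space model is three copies rather than two.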

        Attachments

        1. HBASE-15808-v3.patch
          6 kB
          Jerry He
        2. HBASE-15808-v2.patch
          2 kB
          Jerry He
        3. HBASE-15808.patch
          1 kB
          Jerry He

            People

            • Assignee:
              Jerry He
              Reporter:
              Jerry He
            • Votes:
              0
              Watchers:
              3

              Dates

              • Created:
                Updated:
                Resolved: