HBASE-3404: Compaction Ordering for Bulk Import Files


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • 0.90.0, 0.90.1, 0.92.0

    Description

      We ran into an issue today while using HFileOutputFormat to perform an incremental load on an already-large cluster. Because bulk-loaded files don't have a sequence ID, they are placed at the front of the StoreFile list. This resulted in the following StoreFile ordering:

      2GB (bulk) => 25GB => 2GB => ...
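
      The ordering above can be reproduced with a minimal sketch, assuming store files are sorted by ascending sequence ID and a bulk-loaded file is modeled with a sequence ID of -1. The `FileInfo` class, the `order` helper, and the sequence IDs 100 and 200 are hypothetical names and values chosen for illustration; this is not HBase's actual code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of a store file; not HBase's actual StoreFile class.
class FileInfo {
    final String label;
    final long sequenceId; // -1 models "no sequence ID" (assumption)

    FileInfo(String label, long sequenceId) {
        this.label = label;
        this.sequenceId = sequenceId;
    }
}

class BulkLoadOrderingSketch {
    // Sort ascending by sequence ID, so the lowest ID comes first.
    static List<String> order(List<FileInfo> files) {
        List<FileInfo> sorted = new ArrayList<>(files);
        sorted.sort(Comparator.comparingLong(f -> f.sequenceId));
        List<String> labels = new ArrayList<>();
        for (FileInfo f : sorted) {
            labels.add(f.label);
        }
        return labels;
    }

    public static void main(String[] args) {
        List<FileInfo> files = new ArrayList<>();
        files.add(new FileInfo("25GB", 100));
        files.add(new FileInfo("2GB", 200));
        // Added last, but sorts first because it has no sequence ID.
        files.add(new FileInfo("2GB (bulk)", -1));
        System.out.println(String.join(" => ", order(files)));
    }
}
```

      Under these assumptions the newest file ends up in the position normally held by the oldest one, which is what makes the later compaction selection misjudge it.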

      This triggered a 30+GB major compaction for every single region. Ideally, bulk import files would be ordered in the compaction list by their time of insertion, so that the resulting compaction is much smaller and major compactions are triggered by StoreFile age instead.
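
      To illustrate why the position of the bulk file matters, here is a rough sketch of a size-ratio selection rule. The rule itself (skip a leading file when it is larger than a ratio times the combined size of all later files), the ratio of 1.2, and the names `select` and `total` are assumptions made for this example; they are not taken from this issue and should not be read as HBase's actual selection code. Only three files are modeled, so the totals are smaller than the 30+GB reported above.

```java
import java.util.Arrays;

class CompactionSelectionSketch {
    // Assumed ratio for illustration only.
    static final double RATIO = 1.2;

    // Sizes are in GB, in store file list order (front of the list first).
    // Leading files that are larger than RATIO times the combined size of
    // all later files are skipped; everything after them is compacted.
    static long[] select(long[] sizesGb) {
        int start = 0;
        while (start < sizesGb.length) {
            long later = 0;
            for (int i = start + 1; i < sizesGb.length; i++) {
                later += sizesGb[i];
            }
            if (sizesGb[start] > RATIO * later) {
                start++; // too large relative to the rest, leave it alone
            } else {
                break;
            }
        }
        return Arrays.copyOfRange(sizesGb, start, sizesGb.length);
    }

    static long total(long[] sizesGb) {
        return Arrays.stream(sizesGb).sum();
    }

    public static void main(String[] args) {
        long[] bulkAtFront = {2, 25, 2};  // ordering described in the report
        long[] bulkAtInsert = {25, 2, 2}; // bulk file ordered by insertion time
        System.out.println("bulk at front:   " + total(select(bulkAtFront)) + "GB");
        System.out.println("bulk by insert:  " + total(select(bulkAtInsert)) + "GB");
    }
}
```

      In this model, a small file at the front never exceeds the threshold, so every file is selected and the compaction is effectively a major one. With the bulk file placed by insertion time, the 25GB file is skipped and only the two small files are compacted.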


          People

            Assignee: Nicolas Spiegelberg (nspiegelberg)
            Reporter: Nicolas Spiegelberg (nspiegelberg)
            Votes: 0
            Watchers: 6
