HBase / HBASE-22749

Distributed MOB compactions


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0-alpha-1, 2.5.0
    • Component/s: mob
    • Labels: None
    • Release Note:
      MOB compaction is now handled in-line with per-region compaction on
      RegionServers:

      - Regions with MOB data store per-HFile metadata recording which MOB
        HFiles they reference.
      - Major compactions requested by an admin also rewrite MOB files;
        periodic major compactions initiated by a RegionServer do not.
      - A chore in the master periodically initiates a major compaction that
        rewrites MOB values, to ensure that this rewrite actually happens. The
        period is controlled by 'hbase.mob.compaction.chore.period'; the
        default is weekly.
      - 'hbase.mob.major.compaction.region.batch.size' controls how many
        regions the chore requests major compaction on in parallel. By default
        the chore is as parallel as possible.
      - A periodic chore in the master scans the HFiles backing each region to
        collect the set of referenced MOB HFiles, and archives the MOB HFiles
        that are no longer referenced. The period is controlled by
        'hbase.master.mob.cleaner.period'.
      - Optionally, RegionServers compacting MOB files can limit write
        amplification by not rewriting values stored in MOB HFiles larger than
        a size threshold. Opt in by setting 'hbase.mob.compaction.type' to
        'optimized'; the threshold is controlled by
        'hbase.mob.compactions.max.file.size' (default 1 GiB).
      - Existing MOB deployments should be able to upgrade smoothly via a
        rolling upgrade. Cleanup of old MOB files is delayed until per-region
        compaction has compacted every region at least once, so that the
        metadata about which MOB HFiles are in use can be gathered.
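      The request batching and the optional 'optimized' rewrite decision above can be sketched in Java as follows. This is a minimal, hypothetical illustration, not HBase code: the class MobCompactionPolicySketch, its methods, treating a non-positive batch size as "all regions at once", and still rewriting values from a MOB file exactly at the threshold are assumptions. Only the property names and the 1 GiB default come from the release note.

```java
import java.util.ArrayList;
import java.util.List;

public class MobCompactionPolicySketch {

  // Hypothetical constant mirroring the 1 GiB default of
  // 'hbase.mob.compactions.max.file.size' stated in the release note.
  public static final long DEFAULT_MAX_MOB_FILE_SIZE = 1024L * 1024L * 1024L;

  // Groups regions into batches that a master chore would request major
  // compaction on together. A non-positive batch size is treated as
  // "all regions at once" (assumption modeled on "as parallel as possible").
  public static <T> List<List<T>> batches(List<T> regions, int batchSize) {
    List<List<T>> result = new ArrayList<>();
    if (regions.isEmpty()) {
      return result;
    }
    int size = batchSize <= 0 ? regions.size() : batchSize;
    for (int start = 0; start < regions.size(); start += size) {
      int end = Math.min(start + size, regions.size());
      result.add(new ArrayList<>(regions.subList(start, end)));
    }
    return result;
  }

  // Decides whether a compaction copies a MOB value into a new MOB file.
  // mobRewriteRequested: true for admin-requested or chore-initiated major
  // compactions, false for periodic RegionServer-initiated ones.
  // optimized: true when 'hbase.mob.compaction.type' is 'optimized'.
  public static boolean shouldRewrite(boolean mobRewriteRequested, boolean optimized,
      long mobFileSizeBytes, long maxFileSizeBytes) {
    if (!mobRewriteRequested) {
      return false;
    }
    if (!optimized) {
      return true;
    }
    // In optimized mode, values in MOB files over the threshold stay in place.
    return mobFileSizeBytes <= maxFileSizeBytes;
  }

  public static void main(String[] args) {
    List<String> regions = List.of("r1", "r2", "r3", "r4", "r5");
    System.out.println(batches(regions, 2)); // [[r1, r2], [r3, r4], [r5]]
    System.out.println(batches(regions, 0)); // [[r1, r2, r3, r4, r5]]
    long twoGiB = 2L * DEFAULT_MAX_MOB_FILE_SIZE;
    System.out.println(shouldRewrite(true, true, twoGiB, DEFAULT_MAX_MOB_FILE_SIZE)); // false
    System.out.println(shouldRewrite(true, false, twoGiB, DEFAULT_MAX_MOB_FILE_SIZE)); // true
  }
}
```

      With five regions and a batch size of 2, the sketch issues three rounds of requests; in optimized mode a value living in a 2 GiB MOB file is left where it is, while the default mode rewrites it.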

      This improvement prevents the data loss described in HBASE-22075.
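      The reference tracking and cleanup described in the release note can be sketched as a set difference: collect every MOB file named in store-file metadata, then treat whatever else sits in the MOB directory as archivable. This is a hypothetical sketch, not the HBase cleaner chore: MobCleanerSketch, StoreFile and mobFilesToArchive are illustrative names, and modeling a pre-upgrade store file as one with null metadata (which blocks all cleanup, mirroring the rolling-upgrade delay) is an assumption.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class MobCleanerSketch {

  // One store file (HFile) of a region with MOB data. A null reference set
  // models a file written before the upgrade that has no MOB reference
  // metadata yet (hypothetical representation).
  public static final class StoreFile {
    final String name;
    final Set<String> referencedMobFiles;

    public StoreFile(String name, Set<String> referencedMobFiles) {
      this.name = name;
      this.referencedMobFiles = referencedMobFiles;
    }
  }

  // Returns the MOB files that no store file references, i.e. the candidates
  // for archiving. Returns an empty set if any store file lacks metadata.
  public static Set<String> mobFilesToArchive(List<StoreFile> storeFiles,
      Set<String> existingMobFiles) {
    Set<String> referenced = new TreeSet<>();
    for (StoreFile sf : storeFiles) {
      if (sf.referencedMobFiles == null) {
        // Rolling-upgrade guard: some region has not been compacted since the
        // upgrade, so the referenced set is incomplete. Archive nothing.
        return new TreeSet<>();
      }
      referenced.addAll(sf.referencedMobFiles);
    }
    Set<String> unreferenced = new TreeSet<>(existingMobFiles);
    unreferenced.removeAll(referenced);
    return unreferenced;
  }

  public static void main(String[] args) {
    Set<String> mobDir = Set.of("mob-1", "mob-2", "mob-3");
    List<StoreFile> upgraded = List.of(
        new StoreFile("hfile-a", Set.of("mob-1", "mob-2")),
        new StoreFile("hfile-b", Set.of("mob-2")));
    System.out.println(mobFilesToArchive(upgraded, mobDir)); // [mob-3]

    List<StoreFile> mixed = List.of(
        new StoreFile("hfile-a", Set.of("mob-1")),
        new StoreFile("hfile-old", null));
    System.out.println(mobFilesToArchive(mixed, mobDir)); // []
  }
}
```

      Returning an empty set whenever metadata is missing errs on the side of keeping files: an unreferenced MOB file that survives an extra cleaner run only costs space, whereas archiving a referenced one would lose data.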

    Description

      The original MOB 1.0 (Moderate Object Storage) implementation has several drawbacks that can limit adoption of the MOB feature:

      1. MOB compactions are executed as a chore in the HBase Master, which limits scalability because all I/O goes through a single server.
      2. The YARN/MapReduce framework is required to run MOB compactions in a scalable way, which does not work for a standalone HBase cluster.
      3. Having two separate compactors, one for MOB files and one for regular store files, and the interactions between them can result in data loss (see HBASE-22075).

      The aim of MOB 2.0 is an implementation that is 100% compatible with MOB 1.0, free of the drawbacks above, and usable as a drop-in replacement in existing MOB deployments. Its design goals are:

      1. Make MOB compactions scalable without relying on the YARN/MapReduce framework.
      2. Provide a unified compactor for both MOB and regular store files.
      3. Make MOB more robust, especially with respect to data loss.
      4. Simplify and reduce the overall MOB code.
      5. Provide an implementation that is 100% compatible with MOB 1.0.
      6. Require no data migration between MOB 1.0 and MOB 2.0, only a software upgrade.

      Attachments

        1. HBASE-22749_nightly_unit_test_analyzer.pdf
          607 kB
          Sean Busbey
        2. HBASE-22749_nightly_Unit_Test_Results.csv
          3.71 MB
          Sean Busbey
        3. HBASE-22749-branch-2.2-v4.patch
          301 kB
          Vladimir Rodionov
        4. HBASE-22749-master-v1.patch
          301 kB
          Vladimir Rodionov
        5. HBASE-22749-master-v2.patch
          319 kB
          Vladimir Rodionov
        6. HBASE-22749-master-v3.patch
          386 kB
          Vladimir Rodionov
        7. HBASE-22749-master-v4.patch
          386 kB
          Vladimir Rodionov
        8. HBase-MOB-2.0-v3.0.pdf
          228 kB
          Vladimir Rodionov


            People

              Assignee: Vladimir Rodionov (vrodionov)
              Reporter: Vladimir Rodionov (vrodionov)
              Votes: 0
              Watchers: 23
