  1. HBase
  2. HBASE-26826

Backport StoreFileTracker (HBASE-26067, HBASE-26584, and others) to branch-2.5


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.5.0
    • Component/s: Operability, regionserver
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note:
      Introduces the StoreFileTracker interface to HBase. This server-side interface abstracts how a Store (column family) knows which files it should include. Previously, HBase relied on listing the directory a Store used for storage to determine the files that make up that Store.

      *** StoreFileTracker is EXPERIMENTAL in 2.5. Use at your own risk. ***

      With this feature, there are two StoreFileTracker implementations. The first (and default) implementation lists the Store directory. The second is a new implementation that records the files belonging to a Store in a metadata file kept within that Store. Whenever the list of files that make up a Store changes, this metadata file is updated.

      This feature is notable because it better enables HBase to run on storage systems that do not provide typical POSIX filesystem semantics, most importantly those that do not implement an atomic file rename operation. Such storage systems often implement a rename as a copy followed by a delete, which amplifies I/O costs by 2x.

      At scale, this feature should reduce I/O costs by 2x on storage systems that do not provide atomic renames, most notably for HBase compactions and memstore flushes. See the corresponding section, "Store File Tracking", in the HBase book for more information on how to use this feature.

      The file-based StoreFileTracker, FileBasedStoreFileTracker, is currently incompatible with the Medium Objects (MOB) feature. Do not enable them together.
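
      The contrast between the two tracking approaches can be sketched in plain Java. This is a conceptual illustration only: the Tracker interface, both classes, and the ".filelist-sketch" file name are hypothetical and are not the HBase StoreFileTracker API or its on-disk format. The sketch assumes a local filesystem and shows only the core idea, that a file-based tracker decides membership from a recorded list rather than from what happens to be in the directory, so committing a file does not require renaming it.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Conceptual sketch only; none of these types are HBase classes. */
public class StoreFileTrackerSketch {

  /** Hypothetical stand-in for the idea behind a store file tracker. */
  interface Tracker {
    List<String> load() throws IOException;        // files that make up the store
    void add(String fileName) throws IOException;  // commit a newly written file
  }

  /** Directory-listing approach: whatever is in the directory is in the store. */
  static final class DirectoryListingTracker implements Tracker {
    private final Path storeDir;

    DirectoryListingTracker(Path storeDir) {
      this.storeDir = storeDir;
    }

    @Override
    public List<String> load() throws IOException {
      try (Stream<Path> files = Files.list(storeDir)) {
        return files.map(p -> p.getFileName().toString())
            .filter(name -> !name.startsWith(".")) // skip dot-files such as the sketch manifest
            .sorted()
            .collect(Collectors.toList());
      }
    }

    @Override
    public void add(String fileName) {
      // Nothing to record: presence in the directory is the commit.
    }
  }

  /** File-based approach: a metadata file records which files belong to the store. */
  static final class FileBasedTracker implements Tracker {
    private final Path manifest;

    FileBasedTracker(Path storeDir) {
      this.manifest = storeDir.resolve(".filelist-sketch"); // hypothetical name
    }

    @Override
    public List<String> load() throws IOException {
      if (!Files.exists(manifest)) {
        return new ArrayList<>();
      }
      return new ArrayList<>(Files.readAllLines(manifest, StandardCharsets.UTF_8));
    }

    @Override
    public void add(String fileName) throws IOException {
      List<String> files = load();
      files.add(fileName);
      // Rewrite the whole list; the data file itself is never renamed.
      Files.write(manifest, files, StandardCharsets.UTF_8);
    }
  }

  public static void main(String[] args) throws IOException {
    Path storeDir = Files.createTempDirectory("store-sketch");
    Files.createFile(storeDir.resolve("hfile-a")); // written and committed below
    Files.createFile(storeDir.resolve("hfile-b")); // written but never committed

    Tracker listing = new DirectoryListingTracker(storeDir);
    Tracker fileBased = new FileBasedTracker(storeDir);
    fileBased.add("hfile-a");

    System.out.println("listing:    " + listing.load());
    System.out.println("file-based: " + fileBased.load());
  }
}
```

      In this sketch the directory-listing tracker reports both files, while the file-based tracker reports only the file that was explicitly committed. A directory-listing design therefore typically needs a separate temporary location plus a rename to keep half-written files out of the store, and that rename is the step the recorded list makes unnecessary.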

    Description

      In a discussion on dev@ the idea was floated that StoreFileTracker could be backported into branch-2.5 to be released as part of 2.5.0 as an experimental feature. This issue considers the backport.

      There are sixteen subtasks on HBASE-26067 and several other tangential commits. Cherry-pick the list in sequence, fixing up as necessary. These appear to be the core commits:

      • commit 6aaef8978 HBASE-26064 Introduce a StoreFileTracker to abstract the store file tracking logic
      • commit 43b40e937 HBASE-25988 Store the store file list by a file (#3578)
      • commit 6e053765e HBASE-26079 Use StoreFileTracker when splitting and merging (#3617)
      • commit 090b2fecf HBASE-26224 Introduce a MigrationStoreFileTracker to support migrating from different store file tracker implementations (#3656)
      • commit 0ee168933 HBASE-26246 Persist the StoreFileTracker configurations to TableDescriptor when creating table (#3666)
      • commit 2052e80e5 HBASE-26248 Should find a suitable way to let users specify the store file tracker implementation (#3665)
      • commit 5ff0f98a5 HBASE-26264 Add more checks to prevent misconfiguration on store file tracker (#3681)
      • commit fc4f6d10e HBASE-26280 Use store file tracker when snapshoting (#3685)
      • commit 06db852aa HBASE-26326 CreateTableProcedure fails when FileBasedStoreFileTracker… (#3721)
      • commit e4e7cf80b HBASE-26386 Refactor StoreFileTracker implementations to expose the set method (#3774)
      • commit 08d117197 HBASE-26328 Clone snapshot doesn't load reference files into FILE SFT impl (#3749)
      • commit 8bec26ea9 HBASE-26263 [Rolling Upgrading] Persist the StoreFileTracker configurations to TableDescriptor for existing tables (#3700)
      • commit a288365f9 HBASE-26271 Cleanup the broken store files under data directory (#3786)
      • commit d00b5faad HBASE-26454 CreateTableProcedure still relies on temp dir and renames… (#3845)
      • commit 771e552cf HBASE-26286: Add support for specifying store file tracker when restoring or cloning snapshot
      • commit f16b7b1bf HBASE-26265 Update ref guide to mention the new store file tracker im… (#3942)

      And from HBASE-26584 and beyond:

      • commit 755b3b4cb HBASE-26585 Add SFT configuration to META table descriptor when creating META (#3998)
      • commit 39c42c7dc HBASE-26639 The implementation of TestMergesSplitsAddToTracker is problematic (#4010)
      • commit 6e1f5b7fe HBASE-26586 Should not rely on the global config when setting SFT implementation for a table while upgrading (#4006)
      • commit f1dd865c3 HBASE-26654 ModifyTableDescriptorProcedure shoud load TableDescriptor while executing (#4034)
      • commit 8fbc9a260 HBASE-26674 Should modify filesCompacting under storeWriteLock (#4040)
      • commit 5aa0fd265 HBASE-26675 Data race on Compactor.writer (#4035)
      • commit 3021c5851 HBASE-26700 The way we bypass broken track file is not enough in StoreFileListFile (#4055)
      • commit a8b68c9b8 HBASE-26690 Modify FSTableDescriptors to not rely on renaming when writing TableDescriptor (#4054)
      • commit dffeb8e63 HBASE-26587 Introduce a new Admin API to change SFT implementation (#4030) (#4080)
      • commit b265fe55b HBASE-26673 Implement a shell command for change SFT implementation (#4113)
      • commit 4cdb380cc HBASE-26640 Reimplement master local region initialization to better work with SFT (#4111)
      • commit 77bb153a2 HBASE-26707: Reduce number of renames during bulkload (#4066) (#4122)
      • commit a4b192e33 HBASE-26611 Changing SFT implementation on disabled table is dangerous (#4082)
      • commit d3629bbf1 HBASE-26837 Set SFT config when creating TableDescriptor in TestCloneSnapshotProcedure (#4226)

            People

              Assignee: Andrew Kyle Purtell (apurtell)
              Reporter: Andrew Kyle Purtell (apurtell)