HBase / HBASE-26067

Change the way on how we track store file list


Details

    • Type: Umbrella
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.6.0, 3.0.0-alpha-3
    • Component/s: HFile
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note:
      Introduces the StoreFileTracker interface to HBase. This is a server-side interface which abstracts how a Store (column family) knows what files should be included in that Store. Previously, HBase relied on listing the directory a Store used for storage to determine the files which make up that Store.

      After this feature, there are two implementations of StoreFileTracker. The first (and default) implementation lists the Store directory, as before. The second is a new implementation which records the files that belong to a Store in a metadata file kept within that Store. Whenever the list of files that make up a Store changes, this metadata file is updated.

      This feature is notable in that it better enables HBase to function on storage systems which do not provide typical POSIX filesystem semantics, most importantly those which do not implement an atomic file rename operation. Such storage systems often implement a rename as a copy followed by a delete, which doubles the I/O cost of every rename.

      At scale, this feature should reduce I/O costs by roughly 2x on storage systems that do not provide atomic renames, most importantly for HBase compactions and memstore flushes, which otherwise write new files to a temporary location and then rename them into the Store directory. See the corresponding section, "Store File Tracking", in the HBase book for more information on how to use this feature.
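The 2x figure in the release note can be made concrete with a back-of-the-envelope model of committing one new store file. This is an illustrative sketch, not HBase code: it assumes a non-atomic rename is emulated by copying every byte to the destination (and then deleting the source), counts only bytes written, and treats the tracker's metadata update as a small fixed-size write. The class and method names and the 256 MiB / 4 KiB sizes are hypothetical.

```java
// Illustrative cost model (not HBase code): bytes written to commit one new store file.
class RenameCostModel {
    // Old path: write into a temporary location, then rename into the Store directory.
    // Assumption: without an atomic rename, the rename is a full copy plus a delete.
    static long commitViaTempDirAndRename(long fileBytes, boolean atomicRename) {
        long written = fileBytes;          // initial write to the temporary location
        if (!atomicRename) {
            written += fileBytes;          // emulated rename rewrites every byte at the destination
        }
        return written;
    }

    // Tracker path: write straight into the Store directory, then record the file name.
    // Assumption: the metadata update costs a small, fixed number of bytes.
    static long commitViaTracker(long fileBytes, long metadataBytes) {
        return fileBytes + metadataBytes;
    }

    public static void main(String[] args) {
        long fileBytes = 256L * 1024 * 1024;   // arbitrary 256 MiB flush output
        long metadataBytes = 4L * 1024;        // arbitrary 4 KiB metadata update
        System.out.println(commitViaTempDirAndRename(fileBytes, false)); // 536870912
        System.out.println(commitViaTracker(fileBytes, metadataBytes));  // 268439552
    }
}
```

The ratio between the two approaches 2 as the store file grows relative to the metadata write, which is why the saving is described as applying "at scale".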

    Description

      Opened a separate JIRA to track this work, since it cannot be fully included in HBASE-24749.

      I think this could land before HBASE-24749, since, if this works, we could have different implementations for tracking the store file list.
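The idea of pluggable implementations for tracking the store file list can be illustrated with a small, self-contained Java sketch. The names `StoreFileListTracker`, `DirectoryListingTracker` and `ManifestFileTracker`, and the manifest file name `.filelist`, are hypothetical; this is not HBase's actual StoreFileTracker API. It shows the key behavioral difference: with directory listing, a file belongs to the store as soon as it appears in the directory, while with a manifest, a file only becomes part of the store once its name is recorded.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical abstraction: how a store decides which files belong to it.
interface StoreFileListTracker {
    List<String> load() throws IOException;                    // committed file names
    void add(Collection<String> newFiles) throws IOException;  // e.g. after a flush
}

// Directory-listing approach: every regular file in the directory is part of the store.
final class DirectoryListingTracker implements StoreFileListTracker {
    private final Path storeDir;

    DirectoryListingTracker(Path storeDir) { this.storeDir = storeDir; }

    @Override
    public List<String> load() throws IOException {
        try (Stream<Path> entries = Files.list(storeDir)) {
            return entries.filter(Files::isRegularFile)
                    .map(p -> p.getFileName().toString())
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    @Override
    public void add(Collection<String> newFiles) {
        // Nothing to record: a file is visible as soon as it is in the directory,
        // so new files must be written elsewhere and moved in when complete.
    }
}

// Manifest approach: a metadata file lists the committed store files.
final class ManifestFileTracker implements StoreFileListTracker {
    private final Path manifest;

    ManifestFileTracker(Path storeDir) { this.manifest = storeDir.resolve(".filelist"); }

    @Override
    public List<String> load() throws IOException {
        if (!Files.exists(manifest)) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Files.readAllLines(manifest, StandardCharsets.UTF_8));
    }

    @Override
    public void add(Collection<String> newFiles) throws IOException {
        TreeSet<String> all = new TreeSet<>(load());
        all.addAll(newFiles);
        // Simplification: a plain overwrite is not crash-safe; a production tracker
        // would need an update scheme that does not depend on an atomic rename.
        Files.write(manifest, all, StandardCharsets.UTF_8);
    }
}

class TrackerDemo {
    public static void main(String[] args) throws IOException {
        Path storeDir = Files.createTempDirectory("store");
        // A flush has written a new file directly into the store directory.
        Files.write(storeDir.resolve("hfile-1"), new byte[] {1, 2, 3});

        StoreFileListTracker listing = new DirectoryListingTracker(storeDir);
        StoreFileListTracker manifest = new ManifestFileTracker(storeDir);
        System.out.println(listing.load());   // [hfile-1]  visible immediately
        System.out.println(manifest.load());  // []         not yet committed

        manifest.add(Collections.singletonList("hfile-1"));
        System.out.println(manifest.load());  // [hfile-1]
    }
}
```

Because the manifest, not the directory contents, defines the store, a new file can be written directly to its final location and committed by a small metadata update instead of a rename.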

        Sub-Tasks

          1. Introduce a StoreFileTracker to abstract the store file tracking logic (Sub-task, Resolved, Duo Zhang)
          2. Store the store file list by a file (Sub-task, Resolved, Duo Zhang)
          3. Use StoreFileTracker when splitting and merging (Sub-task, Resolved, Wellington Chevreuil)
          4. Introduce a MigrationStoreFileTracker to support migrating from different store file tracker implementations (Sub-task, Resolved, Duo Zhang)
          5. Persist the StoreFileTracker configurations to TableDescriptor when creating table (Sub-task, Resolved, Wellington Chevreuil)
          6. Should find a suitable way to let users specify the store file tracker implementation (Sub-task, Resolved, Duo Zhang)
          7. [Rolling Upgrading] Persist the StoreFileTracker configurations to TableDescriptor for existing tables (Sub-task, Resolved, Zhuoyue Huang)
          8. Add more checks to prevent misconfiguration on store file tracker (Sub-task, Resolved, Duo Zhang)
          9. Update ref guide to mention the new store file tracker implementations (Sub-task, Resolved, Wellington Chevreuil)
          10. Cleanup the broken store files under data directory (Sub-task, Resolved, Szabolcs Bukros)
          11. Use store file tracker when snapshotting (Sub-task, Resolved, Duo Zhang)
          12. Add support for specifying store file tracker when restoring or cloning snapshot (Sub-task, Resolved, Szabolcs Bukros)
          13. CreateTableProcedure fails when FileBasedStoreFileTracker is set in global config (Sub-task, Resolved, Wellington Chevreuil)
          14. Clone snapshot doesn't load reference files into FILE SFT impl (Sub-task, Resolved, Wellington Chevreuil)
          15. Refactor StoreFileTracker implementations to expose the set method (Sub-task, Resolved, Duo Zhang)
          16. CreateTableProcedure still relies on temp dir and renames when creating table FS layout (Sub-task, Resolved, Wellington Chevreuil)

            People

              Assignee: Duo Zhang (zhangduo)
              Reporter: Duo Zhang (zhangduo)
              Votes: 1
              Watchers: 14
