  1. HBase
  2. HBASE-14439

New/Improved Filesystem Abstractions

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Critical
    • Resolution: Implemented

    Description

      Ticket for work in progress on new FileSystem abstractions. Previously, we (Yahoo) submitted a ticket (HBASE-13991) to add support for humongous (1 million+ region) tables via a hierarchical layout. However, open source is moving in a similar but not identical direction, so that patch will not be merged into open source.

      We will now work on a different patch with folks from open source. It will create or add to two layers: a path abstraction layer and a use-oriented abstraction layer. The path abstraction layer is epitomized by classes like FSUtils (and, in the patch, new classes like AFsLayout). The use-oriented abstraction layer is epitomized by existing classes like MasterFileSystem and HRegionFileSystem (and possibly new classes later) that build on the path abstraction layer and focus on 'doing things' (e.g. creating regions) rather than on gritty details like paths.
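
      The two layers described above can be sketched roughly as follows. This is only an illustration, not code from the attached patch: the names RegionLayout, FlatLayout, BucketedLayout and RegionStorageSketch are hypothetical, the prefix-bucket scheme is an assumed stand-in for a hierarchical layout, and java.nio.file is used in place of the Hadoop FileSystem API so the example is self-contained.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Path abstraction layer (hypothetical): the only code that knows where a region lives. */
interface RegionLayout {
    Path regionDir(Path tableDir, String encodedRegionName);
}

/** Flat layout: each region directory sits directly under the table directory. */
final class FlatLayout implements RegionLayout {
    @Override
    public Path regionDir(Path tableDir, String encodedRegionName) {
        return tableDir.resolve(encodedRegionName);
    }
}

/**
 * Assumed hierarchical layout: regions are grouped into bucket directories
 * named after a prefix of the encoded region name. The real humongous
 * layout may bucket differently.
 */
final class BucketedLayout implements RegionLayout {
    private final int prefixLength;

    BucketedLayout(int prefixLength) {
        if (prefixLength < 1) {
            throw new IllegalArgumentException("prefixLength must be >= 1");
        }
        this.prefixLength = prefixLength;
    }

    @Override
    public Path regionDir(Path tableDir, String encodedRegionName) {
        int end = Math.min(prefixLength, encodedRegionName.length());
        String bucket = encodedRegionName.substring(0, end);
        return tableDir.resolve(bucket).resolve(encodedRegionName);
    }
}

/** Use-oriented layer (hypothetical): 'does things' and never builds paths itself. */
final class RegionStorageSketch {
    private final RegionLayout layout;
    private final Path tableDir;

    RegionStorageSketch(RegionLayout layout, Path tableDir) {
        this.layout = layout;
        this.tableDir = tableDir;
    }

    /** Creates the region directory (and any parent buckets) and returns its path. */
    Path createRegion(String encodedRegionName) throws IOException {
        Path dir = layout.regionDir(tableDir, encodedRegionName);
        Files.createDirectories(dir);
        return dir;
    }

    /** Reports whether the region directory exists under the configured layout. */
    boolean regionExists(String encodedRegionName) {
        return Files.isDirectory(layout.regionDir(tableDir, encodedRegionName));
    }
}
```

      In this sketch, swapping new FlatLayout() for new BucketedLayout(2) when constructing RegionStorageSketch changes where regions land on disk without touching any caller of createRegion, which is the kind of isolation the two layers are meant to provide.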

      Abstracting and isolating the paths from the use cases will help Yahoo avoid diverging too much from open source with its internal 'Humongous' hierarchical table layout. It will also help open source move towards the eventual goal of redoing the FS layout as a similar (but different) hierarchical layout. Unlike the humongous patch, that layout focuses on data directory uniformity and stores the hierarchy in the meta table, which enables new optimizations (see HBASE-14090).

      Attached to this ticket is some work we've done at Yahoo so far, which will be put into an open source HBase branch for further collaboration. The patch is a work in progress and is not meant to be complete yet, so please hold off on patch comments and reviews. It also includes some Yahoo-specific 'humongous' layout code that will be removed before submission to open source.

      Attachments

        1. abstraction.patch
          262 kB
          Ben Lau

        Issue Links

          1. update MasterStorage / RegionStorage to have an exists-in-storage check and archive methods (Sub-task, Closed, Umesh Agashe)
          2. Remove directory layout/ filesystem references from CompactionTool (Sub-task, Closed, Umesh Agashe)
          3. comment out broken test-compile references (Sub-task, Closed, Umesh Agashe)
          4. Remove directory layout/ filesystem references from the code in master/procedure directory (Sub-task, Closed, Umesh Agashe)
          5. Remove directory layout/ filesystem references from Master (Sub-task, Closed, Unassigned)
          6. Add ThreadPool in Legacy implementations of MasterStorage/ RegionStorage (Sub-task, Closed, Unassigned)
          7. Remove directory layout / fs references from HBase IO package (Sub-task, Closed, Unassigned)
          8. remove directory layout / fs references from TableSnapshotScanner (Sub-task, Closed, Unassigned)
          9. remove direct layout/fs references from mapreduce utilities (Sub-task, Closed, Unassigned)
          10. remove directory layout / fs references from MOB (Sub-task, Closed, Unassigned)
          11. remove directory layout / fs references from compaction (Sub-task, Closed, Unassigned)
          12. decouple Replication from backing files of WAL (Sub-task, Closed, Unassigned)
          13. remove directory layout / fs references from bulkload code (Sub-task, Closed, Unassigned)
          14. Remove directory layout/ filesystem references from hbck tool (Sub-task, Closed, Unassigned)
          15. ensure WAL code no longer presumes colocation with region storage (Sub-task, Closed, Unassigned)
          16. remove directory layout / fs references from snapshots (Sub-task, Closed, Umesh Agashe)
          17. fold FSUtil classes into fs integration package (Sub-task, Closed, Unassigned)
          18. ensure split operation doesn't directly reference fs / legacy integrations (Sub-task, Closed, Unassigned)
          19. ensure merge operations don't reference filesystem or legacy implementation directly (Sub-task, Closed, Unassigned)
          20. TEST: update HBaseTestingUtility to avoid direct use of filesystem / legacy implementation (Sub-task, Closed, Apekshit Sharma)
          21. TEST: update integration tests to use MasterStorage/RegionStorage (Sub-task, Closed, Unassigned)
          22. TEST: Remove directory layout/ filesystem references from hbck unit tests (Sub-task, Closed, Unassigned)
          23. TEST: Remove directory layout/ filesystem references from unit tests for master/procedure (Sub-task, Closed, Unassigned)
          24. TEST: update ScanPerformanceEvaluation to use MasterStorage / RegionStorage (Sub-task, Closed, Unassigned)
          25. TEST: Remove directory layout/ filesystem references from unit tests for master (Sub-task, Closed, Unassigned)
          26. TEST: update snapshot related tests to rely on MasterStorage / RegionStorage (Sub-task, Closed, Unassigned)
          27. TEST: update mapreduce tests to use masterstorage / regionstorage (Sub-task, Closed, Unassigned)
          28. TEST: update MOB unit tests to use MasterStorage/ RegionStorage APIs (Sub-task, Closed, Unassigned)
          29. TEST: update archiving tests to use masterstorage/regionstorage (Sub-task, Closed, Unassigned)
          30. TEST: update split tests (Sub-task, Closed, Unassigned)
          31. TEST: update merge tests (Sub-task, Closed, Unassigned)
          32. TEST: update compaction tests (Sub-task, Closed, Unassigned)
          33. TEST: update unit tests for io package (Sub-task, Closed, Unassigned)
          34. TEST: update tests for misc regionserver tests (Sub-task, Closed, Unassigned)
          35. TEST: cleanup misc tests to ensure no direct filesystem use (Sub-task, Closed, Unassigned)
          36. Remove directory layout/ filesystem references from Cleaners and a few other modules in master (Sub-task, Closed, Umesh Agashe)
          37. Refactor ExportSnapshot, SnapshotInfo and remove FS references from it (Sub-task, Closed, Unassigned)
          38. Move FileLink and HFileLink classes to fs.legacy package (Sub-task, Closed, Umesh Agashe)

            People

              Assignee: Unassigned
              Reporter: Ben Lau (benlau)
              Votes: 0
              Watchers: 25
