Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Compaction
    • Labels:
      None

      Description

      The level compaction algorithm may help HBase in some use cases, for example read-heavy loads (especially when just one version is used) and a relatively small key space that is updated frequently.

      1. level-compactions-notes.txt
        10 kB
        Sergey Shelukhin
      2. level-compactions-notes.txt
        7 kB
        Sergey Shelukhin
      3. level-compaction.pdf
        74 kB
        Jimmy Xiang

        Issue Links

          Activity

          jmhsieh Jonathan Hsieh added a comment -

          Do you have some links that explain what level compaction means?

          jxiang Jimmy Xiang added a comment -

          Here are some details about the LevelDB level compaction: http://leveldb.googlecode.com/svn/trunk/doc/impl.html

          Cassandra supports level compaction too: http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra and http://www.datastax.com/dev/blog/when-to-use-leveled-compaction
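          For reference, the LevelDB implementation notes linked above describe two compaction triggers: level-0 is compacted once it accumulates a small number of files (the notes give four), and each level L >= 1 is compacted when its total size exceeds 10^L MB (10 MB for level-1, 100 MB for level-2, and so on). The sketch below only restates those thresholds in Java; the class and method names are hypothetical, and nothing here is HBase code or a proposed HBase default.

```java
// Hypothetical restatement of the LevelDB compaction triggers; not HBase code.
class LevelDbThresholds {
    // Level-0 trigger: the LevelDB notes give four files (treated here as "at least four").
    static final int L0_FILE_TRIGGER = 4;
    static final long MB = 1L << 20;

    // Size budget for level >= 1: 10^level MB (10 MB for L1, 100 MB for L2, ...).
    static long maxBytesForLevel(int level) {
        if (level < 1) {
            throw new IllegalArgumentException("level-0 is bounded by file count, not size");
        }
        long bytes = MB;
        for (int i = 0; i < level; i++) {
            bytes *= 10;
        }
        return bytes;
    }

    // Whether a level should be merged into the next one.
    static boolean needsCompaction(int level, int fileCount, long totalBytes) {
        if (level == 0) {
            return fileCount >= L0_FILE_TRIGGER;
        }
        return totalBytes > maxBytesForLevel(level);
    }

    public static void main(String[] args) {
        for (int level = 1; level <= 3; level++) {
            System.out.println("L" + level + " budget: " + maxBytesForLevel(level) / MB + " MB");
        }
    }
}
```

Running `main` prints budgets of 10, 100 and 1000 MB for levels 1 to 3.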
          jxiang Jimmy Xiang added a comment -

          Attached are some notes I made while looking into level compaction performance. They may be very rough.

          enis Enis Soztutar added a comment -

          Linking the relevant HBASE-7055. LevelDB compactions were discussed there to some extent.

          In general, I like the idea, and for some use cases, guaranteeing an upper bound on the number of store files touched on the read side will definitely help.
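          The read-side bound can be made concrete. In a LevelDB-style layout, level-0 files may overlap each other, while the files within each higher level have disjoint key ranges, so a point get checks every level-0 file covering the key plus at most one file per higher level. A minimal sketch, assuming a hypothetical `FileRange` type with inclusive integer key ranges and level-0 files listed newest first; it is not HBase code.

```java
import java.util.ArrayList;
import java.util.List;

class LeveledReadPath {
    // Hypothetical store file descriptor with an inclusive key range [first, last].
    record FileRange(String name, int first, int last) {
        boolean contains(int key) {
            return first <= key && key <= last;
        }
    }

    // Files a point get must consult, level-0 first.
    // levels.get(0) is level-0, whose files may overlap each other;
    // every other level is assumed to hold files with disjoint key ranges.
    static List<String> filesForGet(List<List<FileRange>> levels, int key) {
        List<String> result = new ArrayList<>();
        for (int level = 0; level < levels.size(); level++) {
            for (FileRange f : levels.get(level)) {
                if (f.contains(key)) {
                    result.add(f.name());
                    if (level > 0) {
                        break; // at most one match in a non-overlapping level
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<FileRange>> levels = List.of(
            List.of(new FileRange("l0-a", 1, 50), new FileRange("l0-b", 30, 90)),
            List.of(new FileRange("l1-a", 1, 40), new FileRange("l1-b", 41, 100)),
            List.of(new FileRange("l2-a", 1, 20), new FileRange("l2-b", 21, 60),
                    new FileRange("l2-c", 61, 100)));
        System.out.println(filesForGet(levels, 35));
    }
}
```

For key 35 this prints `[l0-a, l0-b, l1-a, l2-b]`: both overlapping level-0 files, then exactly one file from each higher level, however many files those levels hold.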

          jxiang Jimmy Xiang added a comment -

          Assigning to myself for now, since I am thinking about how to implement it. Please let me know if anyone else is interested in doing it too.

          jxiang Jimmy Xiang added a comment -

          Yes, it relates to HBASE-7055. We need to support a pluggable compaction policy/algorithm (HBASE-7516) so that we can experiment with each one and choose the right one for the workload.

          It will be even great if we can dynamically tune/choose a proper one.

          enis Enis Soztutar added a comment -

          It will be even great if we can dynamically tune/choose a proper one.

          Sergey is building the basic blocks for managing global/per-table/per-CF configuration in HBASE-7236. After that, we can start to think about HBASE-5678. However, even without dynamic configuration, we should be able to tune the parameters by doing a rolling reopen of the regions.

          sershe Sergey Shelukhin added a comment -

          From the discussion today: as I understand it, in LevelDB the key ranges of files may overlap between levels, so a compaction for a range must either do something with the leftover parts of the files or keep the old files around for reads from other ranges.
          E.g. if I have two levels somewhere (in no particular order), lN with files [1,5), [6, 10] and lM with files [1, 4), [4, 7), [8, 10], a compaction for [4, 7] must include both of the lN files, and either produce the leftover parts of them or keep them for reads from other ranges. If it instead expands to the largest overlapping range to avoid using only parts of files, all ranges would eventually merge.
          That would mean the Compactor and other things also need to change significantly (and become pluggable?), as far as I can see.
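          Sergey's example can be checked mechanically. The sketch below uses hypothetical half-open integer ranges, writes lN's [6, 10] as [6, 11), and uses [4, 7) as the compaction range. It shows that both lN files are selected, which parts of them fall outside the range, and that widening the compaction to cover the selected files whole makes it hit every lM file. It is an illustration only, not the Compactor.

```java
import java.util.ArrayList;
import java.util.List;

class RangeCompactionSketch {
    // Hypothetical half-open key range [start, end) over integer keys.
    record Range(int start, int end) {
        boolean overlaps(Range o) {
            return start < o.end && o.start < end;
        }
    }

    // Files in one level whose key range overlaps the compaction range.
    static List<Range> overlapping(List<Range> level, Range target) {
        List<Range> out = new ArrayList<>();
        for (Range r : level) {
            if (r.overlaps(target)) {
                out.add(r);
            }
        }
        return out;
    }

    // Parts of the given files outside the compaction range: these must be
    // rewritten as new files, or the old files kept for reads of other ranges.
    static List<Range> leftovers(List<Range> files, Range target) {
        List<Range> out = new ArrayList<>();
        for (Range r : files) {
            if (r.start() < target.start()) {
                out.add(new Range(r.start(), Math.min(r.end(), target.start())));
            }
            if (r.end() > target.end()) {
                out.add(new Range(Math.max(r.start(), target.end()), r.end()));
            }
        }
        return out;
    }

    // The alternative: widen the compaction to cover the (non-empty) file list whole.
    static Range span(List<Range> files) {
        int start = Integer.MAX_VALUE;
        int end = Integer.MIN_VALUE;
        for (Range r : files) {
            start = Math.min(start, r.start());
            end = Math.max(end, r.end());
        }
        return new Range(start, end);
    }

    public static void main(String[] args) {
        List<Range> lN = List.of(new Range(1, 5), new Range(6, 11)); // [6, 10] as [6, 11)
        List<Range> lM = List.of(new Range(1, 4), new Range(4, 7), new Range(8, 11));
        Range target = new Range(4, 7);
        List<Range> selected = overlapping(lN, target);
        System.out.println("selected:  " + selected);
        System.out.println("leftovers: " + leftovers(selected, target));
        System.out.println("lM files hit by [4, 7): " + overlapping(lM, target).size());
        System.out.println("lM files hit after widening: " + overlapping(lM, span(selected)).size());
    }
}
```

The leftovers are [1, 4) and [7, 11). Widening to [1, 11) instead raises the lM files involved from one to all three, which is the "all ranges would eventually merge" effect.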

          sershe Sergey Shelukhin added a comment -

          Correction: the compaction range above should be [4, 7).

          jxiang Jimmy Xiang added a comment -

          Sergey Shelukhin, you are right. As I mentioned in the RB for HBASE-7516, it's not just a file selection issue; we also need to consider how to generate the new file(s). We do need some refactoring, which can be done in HBASE-7516, so that other compaction algorithms, including their file selection, are pluggable.

          sershe Sergey Shelukhin added a comment -

          I'll take a look at it there this week...

          sershe Sergey Shelukhin added a comment -

          Attaching some design/implementation notes. I haven't yet filled in the notes about migration and about Scan; I will do that tomorrow.

          mbertozzi Matteo Bertozzi added a comment -

          Why do you need a different file structure? You just need a metadata field associated with the file that says "I'm in level X". All the files are already in memory, with their trailers and other metadata.

          One "problem" with the LevelDB algorithm is that if the keys are monotonically increasing (no overlaps between files), you end up with lots and lots of files, and at region server startup the RS can run out of file descriptors (fds).
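          Matteo's suggestion of a per-file level field can be sketched as grouping the already-open files by a value read from their metadata. In this minimal sketch the `LEVEL` key, the `StoreFileInfo` type, and the choice to treat files without the field as level-0 (a conservative assumption, since level-0 files may overlap) are all hypothetical; this is not the HBase file-info API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class LevelFromMetadata {
    // Hypothetical metadata key; not an existing HBase file-info key.
    static final String LEVEL_KEY = "LEVEL";

    // Hypothetical in-memory view of an open store file and its metadata.
    record StoreFileInfo(String name, Map<String, String> metadata) {}

    // Groups files by the level recorded in their metadata, lowest level first.
    // Files without the field (e.g. written before level compaction was enabled)
    // are assumed to belong to level-0.
    static TreeMap<Integer, List<String>> byLevel(List<StoreFileInfo> files) {
        TreeMap<Integer, List<String>> levels = new TreeMap<>();
        for (StoreFileInfo f : files) {
            int level = Integer.parseInt(f.metadata().getOrDefault(LEVEL_KEY, "0"));
            levels.computeIfAbsent(level, k -> new ArrayList<>()).add(f.name());
        }
        return levels;
    }

    public static void main(String[] args) {
        List<StoreFileInfo> files = List.of(
            new StoreFileInfo("a", Map.of(LEVEL_KEY, "1")),
            new StoreFileInfo("b", Map.of()),
            new StoreFileInfo("c", Map.of(LEVEL_KEY, "1")));
        System.out.println(byLevel(files));
    }
}
```

This prints `{0=[b], 1=[a, c]}`. No separate directory layout is needed for the grouping itself; whether the rest of the store can work from such a grouping is the question the following comments take up.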

          jxiang Jimmy Xiang added a comment -

          Agree with Matteo. Not sure what Sergey wants to do in HBASE-7603.

          jxiang Jimmy Xiang added a comment -

          All store files are already open, and their information is already in memory. A different file structure would break lots of things, since the file structure is assumed in many places.

          sershe Sergey Shelukhin added a comment -

          The tl;dr in the section about metadata explains it: strict ordering of files by seqnum is no longer possible, so if the default HBase store picks up the files and sorts them by max seqnum, it will return incorrect results for gets, and potentially have other issues.
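          The failure mode can be illustrated with a hypothetical state: file X (say, a compaction output) holds an old version of key 3 at seqnum 100 but also key 4 at seqnum 200, while file Y holds a newer version of key 3 at seqnum 150. Ordering files by max seqnum puts X first and returns the stale value. The sketch contrasts that rule with picking the highest per-cell seqnum; the types are hypothetical, and whether a given level compaction design produces exactly this state is covered in the attached notes, not here.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Objects;

class SeqNumOrderingSketch {
    // Hypothetical cell version: a value tagged with the seqnum that wrote it.
    record Cell(String value, long seqNum) {}

    // Hypothetical store file: key -> the newest cell for that key within this file.
    record StoreFile(String name, Map<Integer, Cell> cells) {
        long maxSeqNum() {
            return cells.values().stream().mapToLong(Cell::seqNum).max().orElse(0L);
        }
    }

    // Rule A: order files newest-first by max seqnum and return the first hit.
    static String getByFileOrder(List<StoreFile> files, int key) {
        return files.stream()
            .sorted(Comparator.comparingLong(StoreFile::maxSeqNum).reversed())
            .map(f -> f.cells().get(key))
            .filter(Objects::nonNull)
            .map(Cell::value)
            .findFirst()
            .orElse(null);
    }

    // Rule B: look at every file and keep the cell with the highest seqnum.
    static String getByCellSeqNum(List<StoreFile> files, int key) {
        return files.stream()
            .map(f -> f.cells().get(key))
            .filter(Objects::nonNull)
            .max(Comparator.comparingLong(Cell::seqNum))
            .map(Cell::value)
            .orElse(null);
    }

    public static void main(String[] args) {
        StoreFile x = new StoreFile("X", Map.of(3, new Cell("old", 100), 4, new Cell("x4", 200)));
        StoreFile y = new StoreFile("Y", Map.of(3, new Cell("new", 150)));
        List<StoreFile> files = List.of(x, y);
        System.out.println("by file max seqnum: " + getByFileOrder(files, 3));
        System.out.println("by cell seqnum:     " + getByCellSeqNum(files, 3));
    }
}
```

Rule A returns "old" and rule B returns "new": once a file's seqnum range no longer bounds the versions of every key it holds, a per-file max seqnum is not a safe tie-breaker.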

          sershe Sergey Shelukhin added a comment -

          Btw, files never move from level to level, so the level can be specified inside the file metadata.

          sershe Sergey Shelukhin added a comment -

          ...note also the logic changes described (key-before, split point, and other minor ones). Existing code will be invalid for those. It seems easiest to achieve this by encapsulating file management in a class.

          sershe Sergey Shelukhin added a comment -

          Updating the document on scan and migrations. The more I update the document the more I think we may actually benefit from the manifest file too, as it makes certain recovery and migration scenarios simpler by providing a sort of a "transactional" mechanism for updating file sets.
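          One common way to get the "transactional" file-set update a manifest provides is write-then-rename: the new file set is written to a temporary file and then atomically renamed over the live manifest, so a crash leaves either the old set or the new one. This is a minimal local-filesystem sketch using `java.nio.file`; the `MANIFEST` name and the `level<TAB>fileName` line format are hypothetical, and HDFS, where HBase keeps its store files, has its own rename semantics, so this only illustrates the idea.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

class ManifestSketch {
    // Hypothetical manifest format: one "level<TAB>fileName" line per live store file.
    static void writeManifest(Path dir, List<String> lines) throws IOException {
        Path tmp = dir.resolve("MANIFEST.tmp");
        Path live = dir.resolve("MANIFEST");
        Files.write(tmp, lines, StandardCharsets.UTF_8);
        // The rename is the commit point: readers see the old set or the new one, never a mix.
        // Replacing an existing target with ATOMIC_MOVE is platform-specific
        // (on POSIX file systems it is typically done with rename(2)).
        Files.move(tmp, live, StandardCopyOption.ATOMIC_MOVE, StandardCopyOption.REPLACE_EXISTING);
    }

    static List<String> readManifest(Path dir) throws IOException {
        return Files.readAllLines(dir.resolve("MANIFEST"), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("store");
        writeManifest(dir, List.of("0\tf1", "0\tf2"));
        // A compaction replaces f1 and f2 with f3 in level 1 in one manifest update.
        writeManifest(dir, List.of("1\tf3"));
        System.out.println(readManifest(dir));
    }
}
```

Recovery then only needs to trust the live manifest and can ignore (or delete) any leftover `MANIFEST.tmp` and any store files the manifest does not list.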

          enis Enis Soztutar added a comment -

          It seems like most of the Store, not just compaction, should become pluggable, because of the scanner changes as Sergey mentions in the notes.

          liushaohui Liu Shaohui added a comment -

          Any progress on this feature?
          I would like to contribute some time to this issue if needed.

          sershe Sergey Shelukhin added a comment - - edited

          It appears that this has been abandoned. Stripe compactions (which are supposed to be better than level compactions for most HBase scenarios) never became very popular (possibly because they are difficult to configure), although some people use them. There are some subtle difficulties in doing level compactions correctly in HBase (some are discussed in the comments above and in the attached files), but the infrastructure for custom compactions is in place now, so if you want to contribute, feel free.


            People

            • Assignee:
              Unassigned
              Reporter:
              jxiang Jimmy Xiang
            • Votes:
              0
              Watchers:
              20
