Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Later
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Compaction
    • Labels: None

    Description

      The level compaction algorithm may help HBase for some use cases, for example, read-heavy loads (especially when just one version is used) and a relatively small key space that is updated frequently.

      Attachments

        1. level-compaction.pdf
          74 kB
          Jimmy Xiang
        2. level-compactions-notes.txt
          10 kB
          Sergey Shelukhin
        3. level-compactions-notes.txt
          7 kB
          Sergey Shelukhin

        Issue Links

          There are no Sub-Tasks for this issue.

          Activity

            jmhsieh Jonathan Hsieh added a comment -

            Do you have some links that explain what level compaction means?

            jxiang Jimmy Xiang added a comment -

            Here are some details about the leveldb level compaction: http://leveldb.googlecode.com/svn/trunk/doc/impl.html

            Cassandra supports level compaction too:
            http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra
            http://www.datastax.com/dev/blog/when-to-use-leveled-compaction
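For context, the trigger described in those docs can be sketched roughly as follows (a toy illustration with made-up constants and names, not LevelDB's actual code): each level has a target size that grows about 10x per level, and the level most over its target gets compacted into the next one, which is what keeps reads bounded to roughly one file per level.

```python
# Illustrative sketch of LevelDB-style compaction scoring, as described
# in the linked docs. Constants and structure are simplified guesses.

LEVEL0_FILE_LIMIT = 4          # level 0 is scored by file count
BASE_LEVEL_BYTES = 10 * 2**20  # level 1 target (~10 MB); 10x per level after

def compaction_score(level, files):
    """files: list of (name, size_bytes). A score >= 1.0 means 'compact me'."""
    if level == 0:
        return len(files) / LEVEL0_FILE_LIMIT
    total_bytes = sum(size for _, size in files)
    return total_bytes / (BASE_LEVEL_BYTES * 10 ** (level - 1))

def pick_level(levels):
    """levels: {level: [(name, size_bytes), ...]} -> level most in need."""
    return max(levels, key=lambda lv: compaction_score(lv, levels[lv]))
```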
            jxiang Jimmy Xiang added a comment -

            Attached are some notes I made while looking into level compaction performance. They could be very rough.

            enis Enis Soztutar added a comment -

            Linking relevant HBASE-7055. LevelDB compactions were discussed there to some extent.

            In general, I like the idea, and for some use cases, guaranteeing an upper bound on the number of store files touched on the read side will definitely help.

            jxiang Jimmy Xiang added a comment -

            Assigning to me for now since I am thinking about how to implement it. Please let me know if someone else is interested in doing it too.

            jxiang Jimmy Xiang added a comment -

            Yes, it relates to HBASE-7055. We need to support pluggable compaction policies/algorithms (HBASE-7516) so that we can play with each one and choose the right one for the load.

            It would be even better if we could dynamically tune/choose a proper one.

            enis Enis Soztutar added a comment -

            It would be even better if we could dynamically tune/choose a proper one.

            Sergey is doing the basic blocks of managing global/per-table/per-cf configuration in HBASE-7236. After that, we can start to think about HBASE-5678. However, even without the dynamic config, we should be able to tune the parameters by rolling reopen for the regions.


            sershe Sergey Shelukhin added a comment -

            From the discussion today: from my understanding, LevelDB key ranges in different files may overlap between levels; that means a compaction for a range must do something with the leftover parts of the files, or keep the old files around for the other ranges.
            E.g., if I have two levels somewhere (in no particular order) - lN with [1,5), [6,10] files and lM with [1,4), [4,7), [8,10] files - a compaction for [4,7] must include both of the lN files, and either produce partial copies of them or keep them for reads from the other ranges. If instead it uses the largest overlapping range to avoid using only parts of files, all ranges would eventually merge.
            That would mean Compactor and other things also need to change significantly (and become pluggable?) as far as I can see.
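The merging effect described here can be sketched with a toy range-expansion loop (hypothetical code, using half-open ranges throughout): starting from the seed range, pulling in every whole file that overlaps at either level eventually swallows the entire key space.

```python
# Toy sketch of the file-selection problem: pick a compaction range at
# one level, then expand it until every file whose key range overlaps
# (at either level) is included whole, rather than producing partial
# leftover files. With the example levels below, the whole space merges.

def overlaps(a, b):
    """Half-open ranges [lo, hi): do they intersect?"""
    return a[0] < b[1] and b[0] < a[1]

def expand_selection(seed, level_n, level_m):
    """Grow `seed` until it covers only whole files at both levels."""
    lo, hi = seed
    changed = True
    while changed:
        changed = False
        for f_lo, f_hi in level_n + level_m:
            if overlaps((lo, hi), (f_lo, f_hi)):
                new_lo, new_hi = min(lo, f_lo), max(hi, f_hi)
                if (new_lo, new_hi) != (lo, hi):
                    lo, hi = new_lo, new_hi
                    changed = True
    return lo, hi

# The example from the comment: lN = [1,5), [6,10); lM = [1,4), [4,7), [8,10)
lN = [(1, 5), (6, 10)]
lM = [(1, 4), (4, 7), (8, 10)]
# expand_selection((4, 7), lN, lM) -> (1, 10): everything merges
```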

            sershe Sergey Shelukhin added a comment -

            [4, 7)
            jxiang Jimmy Xiang added a comment -

            sershe, you are right. As I mentioned in the RB for HBASE-7516, it's not just a file selection issue. We also need to consider how to generate the new file(s). We do need some refactoring, which can be done in HBASE-7516, so that other compaction algorithms, including file selection, are pluggable.


            sershe Sergey Shelukhin added a comment -

            I'll take a look at it there this week...

            sershe Sergey Shelukhin added a comment -

            Attaching some design/implementation notes. I haven't yet filled in the notes about migration and about Scan; will do tomorrow.

            mbertozzi Matteo Bertozzi added a comment -

            Why do you need a different file structure? You just need a meta field associated with the file that says "I'm in level X". All the files are already in memory, with trailer and other metadata information.

            One "problem" with the LevelDB algorithm is that if the set of keys is monotonically increasing (no overlaps between files), you end up with lots and lots of files, and at region server startup the RS can run out of fds.
            jxiang Jimmy Xiang added a comment -

            Agree with Matteo. Not sure what Sergey wants to do in HBASE-7603.

            jxiang Jimmy Xiang added a comment -

            All store files are already open. Their information is already in memory. A different file structure would break lots of things, since the current file structure is assumed in many places.


            sershe Sergey Shelukhin added a comment -

            The tl;dr in the section about metadata explains it: the strict ordering of files by seqnum is no longer possible, so if the default HBase store picks up the files and sorts them by max seqnum, it will have incorrect results for gets, and potentially other issues.

            sershe Sergey Shelukhin added a comment -

            Btw, files never move from level to level, so specifying the level can be done inside file metadata.

            sershe Sergey Shelukhin added a comment -

            ...note also the logic changes described (key-before, split point, and other minor ones). Existing code will be invalid for those. It seems like this is easiest to achieve by encapsulating file management into a class.

            sershe Sergey Shelukhin added a comment -

            Updating the document on scan and migrations. The more I update the document, the more I think we may actually benefit from the manifest file too, as it makes certain recovery and migration scenarios simpler by providing a sort of "transactional" mechanism for updating file sets.
            enis Enis Soztutar added a comment -

            It seems like most of the Store, not just compaction, should become pluggable, because of the scanner changes Sergey mentions in the notes.

            liushaohui Shaohui Liu added a comment -

            Any progress on this feature?
            I would like to contribute some time to this issue if needed.

            sershe Sergey Shelukhin added a comment - - edited

            It appears that this has been abandoned. Stripe compactions (which are supposed to be better than level for most HBase scenarios) never became very popular (possibly because they are difficult to configure), although some people use them. There are some subtle difficulties in doing level compactions correctly in HBase (some discussed above in the comments and in the attached files), but the infrastructure for custom compactions is in place now, so if you want to contribute, feel free.


            People

              Assignee: Unassigned
              Reporter: jxiang Jimmy Xiang
              Votes: 0
              Watchers: 20
