Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Later
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Compaction
    • Labels: None

    Description

      The level compaction algorithm may help HBase for some use cases, for example, read-heavy loads (especially when just one version is used) and a relatively small key space that is updated frequently.

      Attachments

        1. level-compaction.pdf
          74 kB
          Jimmy Xiang
        2. level-compactions-notes.txt
          10 kB
          Sergey Shelukhin
        3. level-compactions-notes.txt
          7 kB
          Sergey Shelukhin

        Issue Links

          There are no Sub-Tasks for this issue.

          Activity

            jmhsieh Jonathan Hsieh added a comment -

            Do you have some links that explain what level compaction means?

            jxiang Jimmy Xiang added a comment -

            Here are some details about the leveldb level compaction: http://leveldb.googlecode.com/svn/trunk/doc/impl.html

            Cassandra supports level compaction too:
            http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra
            http://www.datastax.com/dev/blog/when-to-use-leveled-compaction
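For context, the trigger described in those docs can be sketched roughly as follows (a toy illustration with made-up constants and names, not LevelDB's actual code): each level has a target size that grows about 10x per level, and the level most over its target gets compacted into the next one, which is what keeps reads bounded to roughly one file per level.

```python
# Illustrative sketch of LevelDB-style compaction scoring, as described
# in the linked docs. Constants and structure are simplified guesses.

LEVEL0_FILE_LIMIT = 4          # level 0 is scored by file count
BASE_LEVEL_BYTES = 10 * 2**20  # level 1 target (~10 MB); 10x per level after

def compaction_score(level, files):
    """files: list of (name, size_bytes). A score >= 1.0 means 'compact me'."""
    if level == 0:
        return len(files) / LEVEL0_FILE_LIMIT
    total_bytes = sum(size for _, size in files)
    return total_bytes / (BASE_LEVEL_BYTES * 10 ** (level - 1))

def pick_level(levels):
    """levels: {level: [(name, size_bytes), ...]} -> level most in need."""
    return max(levels, key=lambda lv: compaction_score(lv, levels[lv]))
```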
            jxiang Jimmy Xiang added a comment -

            Attached are some notes I made while looking into level compaction performance. They could be very rough.

            enis Enis Soztutar added a comment -

            Linking relevant HBASE-7055. LevelDB compactions were discussed there to some extent.

            In general, I like the idea, and for some use cases, guaranteeing an upper bound on the number of store files touched on the read side will definitely help.

            jxiang Jimmy Xiang added a comment -

            Assigning to me for now since I am thinking about how to implement it. Please let me know if someone else is interested in doing it too.

            jxiang Jimmy Xiang added a comment -

            Yes, it relates to HBASE-7055. We need to support pluggable compaction policies/algorithms (HBASE-7516) so that we can play with each one and choose the right one for the load.

            It would be even better if we could dynamically tune/choose a proper one.

            enis Enis Soztutar added a comment -

            It would be even better if we could dynamically tune/choose a proper one.

            Sergey is doing the basic blocks of managing global/per-table/per-cf configuration in HBASE-7236. After that, we can start to think about HBASE-5678. However, even without the dynamic config, we should be able to tune the parameters by rolling reopen for the regions.


            sershe Sergey Shelukhin added a comment -

            From the discussion today: from my understanding, LevelDB key ranges in different files may overlap between levels; that means a compaction for a range must do something with the leftover parts of the files, or keep the old files around for the other ranges.
            E.g., if I have two levels somewhere (in no particular order) - lN with [1,5), [6,10] files and lM with [1,4), [4,7), [8,10] files - a compaction for [4,7] must include both of the lN files, and either produce partial copies of them or keep them for reads from the other ranges. If instead it uses the largest overlapping range to avoid using only parts of files, all ranges would eventually merge.
            That would mean Compactor and other things also need to change significantly (and become pluggable?) as far as I can see.
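The merging effect described here can be sketched with a toy range-expansion loop (hypothetical code, using half-open ranges throughout): starting from the seed range, pulling in every whole file that overlaps at either level eventually swallows the entire key space.

```python
# Toy sketch of the file-selection problem: pick a compaction range at
# one level, then expand it until every file whose key range overlaps
# (at either level) is included whole, rather than producing partial
# leftover files. With the example levels below, the whole space merges.

def overlaps(a, b):
    """Half-open ranges [lo, hi): do they intersect?"""
    return a[0] < b[1] and b[0] < a[1]

def expand_selection(seed, level_n, level_m):
    """Grow `seed` until it covers only whole files at both levels."""
    lo, hi = seed
    changed = True
    while changed:
        changed = False
        for f_lo, f_hi in level_n + level_m:
            if overlaps((lo, hi), (f_lo, f_hi)):
                new_lo, new_hi = min(lo, f_lo), max(hi, f_hi)
                if (new_lo, new_hi) != (lo, hi):
                    lo, hi = new_lo, new_hi
                    changed = True
    return lo, hi

# The example from the comment: lN = [1,5), [6,10); lM = [1,4), [4,7), [8,10)
lN = [(1, 5), (6, 10)]
lM = [(1, 4), (4, 7), (8, 10)]
# expand_selection((4, 7), lN, lM) -> (1, 10): everything merges
```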

            sershe Sergey Shelukhin added a comment -

            [4, 7)
            jxiang Jimmy Xiang added a comment -

            sershe, you are right. As I mentioned in the RB for HBASE-7516, it's not just a file selection issue. We also need to consider how to generate the new file(s). We do need some refactoring, which can be done in HBASE-7516, so that other compaction algorithms, including file selection, are pluggable.


            sershe Sergey Shelukhin added a comment -

            I'll take a look at it there this week...

            sershe Sergey Shelukhin added a comment -

            Attaching some design/implementation notes. I haven't yet filled in the notes about migration and about Scan; will do tomorrow.

            mbertozzi Matteo Bertozzi added a comment -

            Why do you need a different file structure? You just need a meta field associated with the file that says "I'm in level X". All the files are already in memory, with trailer and other metadata information.

            One "problem" with the LevelDB algorithm is that if the set of keys is monotonically increasing (no overlaps between files), you end up with lots and lots of files, and at region server startup the RS can run out of fds.
            jxiang Jimmy Xiang added a comment -

            Agree with Matteo. Not sure what Sergey wants to do in HBASE-7603.

            jxiang Jimmy Xiang added a comment -

            All store files are already open. Their information is already in memory. A different file structure would break lots of things, since the current file structure is assumed in many places.


            sershe Sergey Shelukhin added a comment -

            The tl;dr in the section about metadata explains it: the strict ordering of files by seqnum is no longer possible, so if the default HBase store picks up the files and sorts them by max seqnum, it will have incorrect results for gets, and potentially other issues.

            sershe Sergey Shelukhin added a comment -

            Btw, files never move from level to level, so specifying the level can be done inside file metadata.

            sershe Sergey Shelukhin added a comment -

            ...note also the logic changes described (key-before, split point, and other minor ones). Existing code will be invalid for those. It seems like this is easiest to achieve by encapsulating file management into a class.

            sershe Sergey Shelukhin added a comment -

            Updating the document on scan and migrations. The more I update the document, the more I think we may actually benefit from the manifest file too, as it makes certain recovery and migration scenarios simpler by providing a sort of "transactional" mechanism for updating file sets.
            enis Enis Soztutar added a comment -

            It seems like most of the Store, not just compaction, should become pluggable, because of the scanner changes Sergey mentions in the notes.

            liushaohui Shaohui Liu added a comment -

            Any progress on this feature?
            I would like to contribute some time to this issue if needed.

            sershe Sergey Shelukhin added a comment - - edited

            It appears that this has been abandoned. Stripe compactions (which are supposed to be better than level for most HBase scenarios) never became very popular (possibly because they are difficult to configure), although some people use them. There are some subtle difficulties in doing level compactions correctly in HBase (some discussed above in the comments and in the attached files), but the infrastructure for custom compactions is in place now, so if you want to contribute, feel free.


            People

              Assignee: Unassigned
              Reporter: jxiang Jimmy Xiang
              Votes: 0
              Watchers: 20
