There are two types of compactions: minor and major.
Tasks that can be performed during a compaction:
1. Sorting the KeyValues from the different files
2. Checking TTL
3. Checking number of versions to keep
4. Removing KeyValues that are affected by deletes
5. Removing the deletes
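The checks above could be sketched roughly as follows. This is a simplified illustration, not HBase's actual code: the KV class, the newest-first input ordering, and the delete semantics (a delete shadows cells with an older or equal timestamp) are assumptions for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the per-KeyValue checks during a compaction merge.
// KV is a stand-in for HBase's KeyValue, reduced to row, timestamp, delete flag.
class CompactionChecks {
    static class KV {
        final String row; final long ts; final boolean isDelete;
        KV(String row, long ts, boolean isDelete) {
            this.row = row; this.ts = ts; this.isDelete = isDelete;
        }
    }

    // Input is assumed sorted with the newest entries first within each row.
    // Keeps a KV only if it is within TTL, within maxVersions for its row,
    // and not shadowed by a newer delete; a major compaction also drops the
    // delete markers themselves, a minor one keeps them.
    static List<KV> filter(List<KV> sorted, long now, long ttlMs,
                           int maxVersions, boolean major) {
        List<KV> out = new ArrayList<>();
        String curRow = null; int versions = 0; long deleteTs = Long.MIN_VALUE;
        for (KV kv : sorted) {
            if (!kv.row.equals(curRow)) {          // new row: reset state
                curRow = kv.row; versions = 0; deleteTs = Long.MIN_VALUE;
            }
            if (kv.isDelete) {                     // 5. delete markers
                deleteTs = Math.max(deleteTs, kv.ts);
                if (!major) out.add(kv);           // minor: keep the marker
                continue;
            }
            if (now - kv.ts > ttlMs) continue;     // 2. expired by TTL
            if (kv.ts <= deleteTs) continue;       // 4. shadowed by a delete
            if (++versions > maxVersions) continue; // 3. too many versions
            out.add(kv);
        }
        return out;
    }
}
```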
For a major compaction all of the above is done, so the question is what we
should do for a minor compaction.
The thing is that when merging the different files into one list we already
need to find the smallest entry across all the files to tell which one to
write first, which means we are already doing most of the work. So adding
checks 2, 3 and 4 will only add three extra checks per entry, and will
probably improve overall system performance, but this needs to be tested.
The deletes can't be removed during a non-major compaction, so they will be
kept in the new file, but they will only apply to older files, which will
speed up reading.
Today when doing the merge we open a scanner for each of the n files we want
to merge, and for every value we put into the new file we do n compares. So if
the total number of inserts is m, we get something like m*n compares, since we
are not reusing any information from the previous compare and end up redoing
a lot of work.
I propose a TreeSet or a heap structure, so we get something like m*log(n)
compares instead.
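A minimal sketch of the proposed heap-based merge, using java.util.PriorityQueue: each of the n scanners contributes only its current head entry to the heap, so picking the next smallest entry costs O(log n) rather than n compares. Here each List<String> stands in for one sorted store file's scanner; this is an illustration of the idea, not HBase's actual merge code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class HeapMerge {
    // Merges n sorted lists into one sorted list with roughly m*log(n) compares.
    static List<String> merge(List<List<String>> files) {
        // Heap entries are [fileIndex, offsetInFile], ordered by current value.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            Comparator.comparing((int[] e) -> files.get(e[0]).get(e[1])));
        for (int i = 0; i < files.size(); i++) {
            if (!files.get(i).isEmpty()) heap.add(new int[] {i, 0});
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();               // smallest current entry
            out.add(files.get(top[0]).get(top[1]));
            if (++top[1] < files.get(top[0]).size()) {
                heap.add(top);                     // re-insert scanner's next head
            }
        }
        return out;
    }
}
```

A TreeSet would give the same asymptotic bound, but a heap avoids the cost of keeping all entries fully ordered when we only ever need the minimum.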
After that we just need to do some testing to see how much the extra compares
will add and how big the read performance gain will be from enforcing the
extra checks during minor compactions.