HBase / HBASE-5161

Compaction algorithm should prioritize reference files

    Details

    • Type: Bug
    • Status: Reopened
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 0.92.0
    • Fix Version/s: 0.92.1
    • Component/s: None
    • Labels:
      None

      Description

      I got myself into a state where my table was unsplittable as long as the insert load kept coming in. Emergency flushes triggered by the low memory barrier don't check the number of store files, so they never block. In one case this went on until I had 45 store files, and compactions almost never touched the reference files (there were 15 of them, and the count went down by only one in 20 minutes). Since you can't split a region that has reference files, that region couldn't split and was doomed to keep accumulating store files until the load stopped.

      Marking this as a minor issue: what we really need is a better pushback mechanism, but not prioritizing reference files seems wrong.
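A minimal sketch of what "prioritizing reference files" could mean for selection, assuming a hypothetical `CandidateFile` type that carries only a sequence id and a reference flag (this is not HBase's `StoreFile` API). The hypothetical `ReferenceFirstSelection.select` orders candidates references-first, then by ascending sequence id, and cuts the list from the tail, so reference files are the last thing dropped when there are more than `maxFilesToCompact` candidates.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical, stripped-down store file: just what selection needs. */
final class CandidateFile {
  final long seqId;
  final boolean isReference;

  CandidateFile(long seqId, boolean isReference) {
    this.seqId = seqId;
    this.isReference = isReference;
  }
}

/** Hypothetical selection policy in which reference files are cut last. */
final class ReferenceFirstSelection {
  static List<CandidateFile> select(List<CandidateFile> candidates, int maxFilesToCompact) {
    List<CandidateFile> sorted = new ArrayList<>(candidates);
    // Reference files first (the key "not a reference" is false for them,
    // and false sorts before true), then ascending sequence id in each group.
    sorted.sort(Comparator.comparing((CandidateFile f) -> !f.isReference)
        .thenComparingLong(f -> f.seqId));
    // Keep the head of the list, so any trimming removes non-reference files first.
    int n = Math.min(maxFilesToCompact, sorted.size());
    return new ArrayList<>(sorted.subList(0, n));
  }
}
```

Because split references typically carry the parent's sequence ids, which are lower than those of new flushes, in this model the policy mostly amounts to trimming from the tail instead of the head. A real policy would also have to weigh that the oldest files are often the largest, which makes such a compaction expensive.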

        Activity

        Lars Hofhansl added a comment -

        This is an old existing issue with no patch available. Unassigning from 0.94 for now.

        Jieshan Bean added a comment -

        It's a very specific scenario: high write pressure and splitting running in parallel caused this vicious circle.
        We should check whether the compaction is still running. Compacting a 400 GB region will take a long time.

        ramkrishna.s.vasudevan added a comment - - edited

        I did not want to open a new issue, so I thought I could reopen this one. Correct me if I am wrong.

        I have a heavy write load going on, and regions keep splitting.
        When the load on one of my regions becomes too heavy, the split starts and creates more reference files than my maxFilesToCompact.

        Suppose 'hbase.hstore.compaction.max' is 20 and the split creates 32*2 (64) reference files.
        In compaction selection:

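        // Limit the selection to maxFilesToCompact by dropping the oldest
        // (lowest seq id) files from the front of the list.
        if (compactSelection.getFilesToCompact().size() > this.maxFilesToCompact) {
          int pastMax =
              compactSelection.getFilesToCompact().size() - this.maxFilesToCompact;
          compactSelection.clearSubList(0, pastMax);
        }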

        filesToCompact is ordered by sequence id. In this case the files from index 0 to pastMax, i.e. the reference files, which have the lowest sequence ids, are not considered for compaction. By the next selection more store files have been created, so the older ones are skipped once again. Because those are reference files, the split never happens. The region grew to 400 GB.

        Note that we even stopped the writes for 2 to 3 hours, but compaction still did not pick up the reference file.
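The starvation described in this comment can be reproduced in a toy model. The `SF` type, `trimOldest`, and `referencesLeft` are hypothetical: `trimOldest` has the same effect as the quoted snippet (with 64 candidates and a limit of 20, `pastMax` is 44, so the 44 oldest files are dropped), and each cycle flushes some new files, selects, and replaces the selection with a single compacted file that, in this model, takes the highest input sequence id.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of trim-from-the-front starvation; all names are hypothetical. */
final class TrimStarvationDemo {
  static final class SF {
    final long seqId;
    final boolean ref;

    SF(long seqId, boolean ref) {
      this.seqId = seqId;
      this.ref = ref;
    }
  }

  /** Same effect as the quoted snippet: drop the first pastMax (oldest) entries. */
  static List<SF> trimOldest(List<SF> candidates, int maxFilesToCompact) {
    List<SF> selected = new ArrayList<>(candidates);
    if (selected.size() > maxFilesToCompact) {
      int pastMax = selected.size() - maxFilesToCompact;
      selected.subList(0, pastMax).clear();
    }
    return selected;
  }

  /**
   * Each cycle: flush flushesPerCycle new files, select with trimOldest, and
   * replace the selection with one compacted (non-reference) file.
   * Returns the number of reference files left at the end.
   */
  static int referencesLeft(int refs, int maxFilesToCompact, int flushesPerCycle, int cycles) {
    List<SF> store = new ArrayList<>(); // kept in ascending seq-id order
    long nextSeq = 1;
    for (int i = 0; i < refs; i++) store.add(new SF(nextSeq++, true));
    for (int c = 0; c < cycles; c++) {
      for (int f = 0; f < flushesPerCycle; f++) store.add(new SF(nextSeq++, false));
      List<SF> selected = trimOldest(store, maxFilesToCompact);
      if (selected.isEmpty()) continue;
      long maxSeq = selected.get(selected.size() - 1).seqId;
      // The selection is always the tail of the store, so remove that tail.
      store.subList(store.size() - selected.size(), store.size()).clear();
      // Model assumption: the output takes the highest input seq id, so it stays last.
      store.add(new SF(maxSeq, false));
    }
    int left = 0;
    for (SF sf : store) if (sf.ref) left++;
    return left;
  }
}
```

With `flushesPerCycle >= maxFilesToCompact`, the newest 20 files are always fresh flushes, so `referencesLeft(64, 20, 20, 100)` stays at 64. With writes stopped (`flushesPerCycle = 0`), this model drains all 64 references within four cycles, so the behavior in the note above, where stopping writes did not help, presumably involves something this toy model leaves out.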

        ramkrishna.s.vasudevan added a comment - - edited

        @Stack and @J-D

        We seem to have run into the same problem. About 32 reference files were created, and one of them was never selected in subsequent compaction cycles.

        We even stopped the writes for 2 to 3 hours, but compaction still did not pick up the reference file.

        The region grew to 400 GB.

        ./hbase-root-regionserver-HOST-192-168-47-205.log:2012-04-26 09:44:15,835 DEBUG org.apache.hadoop.hbase.regionserver.Store: hdfs://10.18.40.217:9000/hbase/ufdr/ce5c144a1714df08db1132238a749116/value/cde90029ecb74ef791500ccd3a1e8908.755d1cf6b960c02cc72c1dd83551df82-hdfs://10.18.40.217:9000/hbase/ufdr/755d1cf6b960c02cc72c1dd83551df82/value/cde90029ecb74ef791500ccd3a1e8908-top is not splittable

        We saw the log line above for almost 2 to 3 hours. This reference file's pair (its bottom half) was not compacted either.

        I will dig in more to find any other reason why it is not getting picked up.
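The log line above reflects the rule from the description that a region with reference files can't split. A minimal sketch of that precondition, using a hypothetical `StoreFileStub` type and `SplitCheck.canSplit` method rather than HBase's own classes, shows why a single leftover reference file out of 32 is enough to keep a growing region from splitting.

```java
import java.util.List;

/** Hypothetical model of the split precondition discussed in this issue. */
final class SplitCheck {
  static final class StoreFileStub {
    final String name;
    final boolean isReference;

    StoreFileStub(String name, boolean isReference) {
      this.name = name;
      this.isReference = isReference;
    }
  }

  /** A region may split only if none of its stores still holds a reference file. */
  static boolean canSplit(List<List<StoreFileStub>> stores) {
    for (List<StoreFileStub> store : stores) {
      for (StoreFileStub sf : store) {
        if (sf.isReference) {
          return false; // one leftover reference blocks the whole region
        }
      }
    }
    return true;
  }
}
```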

        Jean-Daniel Cryans added a comment -

        Right, the real solution is that we need to slow down writes if we get in that sort of situation. We can only compact so fast.
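One way to make "slow down writes" concrete is a capped, linear backoff keyed on store file count. This is a hypothetical sketch: `WritePushback`, `blockingStoreFiles`, `delayPerExcessFileMillis`, and `maxDelayMillis` are illustrative names, not HBase settings. Since the description notes that emergency flushes skip the store file check, the delay here depends only on the store file count, whatever triggers the flush.

```java
/**
 * Hypothetical write pushback: delay incoming writes in proportion to how far a
 * store is over a store file threshold, up to a cap. Names and constants are
 * illustrative only.
 */
final class WritePushback {
  private final int blockingStoreFiles;
  private final long delayPerExcessFileMillis;
  private final long maxDelayMillis;

  WritePushback(int blockingStoreFiles, long delayPerExcessFileMillis, long maxDelayMillis) {
    this.blockingStoreFiles = blockingStoreFiles;
    this.delayPerExcessFileMillis = delayPerExcessFileMillis;
    this.maxDelayMillis = maxDelayMillis;
  }

  /**
   * Delay to apply before accepting a write. Intended to be consulted on every
   * write path, including writes that lead to emergency flushes.
   */
  long delayMillis(int storeFileCount) {
    long excess = storeFileCount - blockingStoreFiles;
    if (excess <= 0) {
      return 0; // at or under the threshold: no pushback
    }
    // Linear backoff, capped so writers slow down rather than stall forever.
    return Math.min(maxDelayMillis, excess * delayPerExcessFileMillis);
  }
}
```

A capped delay keeps writers making progress while giving compactions time to catch up; a hard block at the threshold would push back harder but risks stalling clients indefinitely if compactions can't keep up.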

        stack added a comment -

        This is not actually a problem, right J-D? The actual problem is that it takes a long time to clear the reference files – even though they are the first things scheduled on region open – because sometimes we have such a backlog of compaction to catch up on (lots of big files).

        Jean-Daniel Cryans added a comment -

        Just hit this again while running a 5 TB upload; I started seeing regions larger than 50 GB that couldn't split.


          People

          • Assignee: Unassigned
          • Reporter: Jean-Daniel Cryans
          • Votes: 1
          • Watchers: 12
