HBase / HBASE-9943

Big linked list test fails with encoding PREFIX_TREE

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      ITBLL (IntegrationTestBigLinkedList) starts with data block encoding NONE, and the online encoding change action then switches the table to other encodings. However, if the test starts with encoding PREFIX_TREE, it fails, even with online encoding change disabled.

        Issue Links

          Activity

          Jimmy Xiang added a comment -

          This problem is harder than I thought. I have unassigned the issue while I continue to look into it. It could be an issue with the encoding algorithm (or its implementation) itself, or an issue with split/compaction while using this encoding. It is a corner case in that I lost only ~60 of 300 million rows. It is not that much of a corner case either, since my ITBLL run fails every time when using PREFIX_TREE, even without CM (chaos monkey).

          Jimmy Xiang made changes -
          Assignee Jimmy Xiang [ jxiang ]
          Andrew Purtell added a comment -

          I was able to see a PREFIX_TREE related corruption on a single machine by running a minicluster and the equivalent of 'LoadTestTool -write 10:100:10 -read 100:10 -update 20:10 -num_keys 1000000'.

          Matt Corgan added a comment -

          Sorry Jimmy, I haven't been able to reproduce as I don't have a test cluster. Can you think of a way to reproduce on a single machine?

          If you have any more clues about where or how it's erroring, that would be helpful.

          I'm happy to help you debug this, but if you want to get a better understanding in general, then this is the best doc from HBASE-4676: https://issues.apache.org/jira/secure/attachment/12518363/PrefixTrie_Format_v1.pdf

          Jimmy Xiang added a comment -

          Matt Corgan, do you have some documentation on the prefix tree encoding algorithm? I found that ITBLL has data loss even without CM (i.e., with the calm monkey). BTW, have you reproduced it?

          Jimmy Xiang added a comment -

          From the command line, while you are generating more rows, can you try to do a rolling restart of the cluster a couple times?

          Jimmy Xiang made changes -
          Attachment trunk-9943-rep.patch [ 12614287 ]
          Jimmy Xiang added a comment - edited

          Here is a patch I used to simplify the actions and can reproduce the problem. You can see that even just restarting RS can cause the issue since I disabled all CM actions1 and actions2, and region move in actions3. It is not a decoder/encoder caching issue either since I disabled it.

          Once you have a cluster, you can run itbll "hbase org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList --monkey slowDeterministic Loop 5 12 25000000 IntegrationTestBigLinkedList 12" to reproduce it. The test won't survive many loops. It generally fails in loop 1 or 2 for me.
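The positional arguments after `Loop` in the command above can be read as in the following sketch. It assumes the usual ITBLL `Loop` argument order (iterations, mappers, nodes per mapper, output directory, reducers); the helper name `parse_loop_args` is hypothetical and exists only for this illustration. Under that assumption, 12 mappers times 25,000,000 nodes per mapper gives the 300 million rows per iteration mentioned earlier in the thread.

```python
def parse_loop_args(argv):
    """Name the five positional arguments that follow 'Loop'.

    Assumed order: iterations, mappers, nodes per mapper, output dir, reducers.
    """
    iterations, mappers, nodes_per_mapper, output_dir, reducers = argv[:5]
    return {
        "iterations": int(iterations),
        "mappers": int(mappers),
        "nodes_per_mapper": int(nodes_per_mapper),
        "output_dir": output_dir,
        "reducers": int(reducers),
    }


# Arguments copied from the reproduction command in the comment above.
args = parse_loop_args("5 12 25000000 IntegrationTestBigLinkedList 12".split())

# Rows written by one Generator pass: one node per row.
nodes_per_iteration = args["mappers"] * args["nodes_per_mapper"]

# Fraction of rows lost if roughly 60 rows go missing in one pass.
loss_fraction = 60 / nodes_per_iteration

print(nodes_per_iteration)
print(loss_fraction)
```

The tiny loss fraction is why a plain row count after a small load may not reveal the problem, while the linked-list verification does.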

          Matt Corgan added a comment -

          Hi Jimmy, I tried to reproduce with the following commands, but I haven't seen anything wrong. Is it possible to reproduce with the command line? I'm also not sure what to look for in the itbll.Verify m/r output. Any tips to reproduce?

          truncate 'IntegrationTestBigLinkedList'
          
          alter 'IntegrationTestBigLinkedList', {DATA_BLOCK_ENCODING=>'PREFIX_TREE'}
          
          bin/hbase org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList Generator 1 25000000 /home/mcorgan/junk/itbll2/generate
          
          bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'IntegrationTestBigLinkedList'
          
          major_compact 'IntegrationTestBigLinkedList'
          
          bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'IntegrationTestBigLinkedList'
          
          bin/hbase org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList Verify /home/mcorgan/junk/itbll2/verify 4
          
          alter 'IntegrationTestBigLinkedList', {DATA_BLOCK_ENCODING=>'NONE'}
          
          major_compact 'IntegrationTestBigLinkedList'
          
          bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'IntegrationTestBigLinkedList'
          
          bin/hbase org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList Verify /home/mcorgan/junk/itbll2/verify2 4
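On what to look for in the Verify output: as far as I understand it, the Verify job reports counters such as REFERENCED, UNREFERENCED and UNDEFINED, and a lost row shows up as an UNDEFINED node (something points to it, but it is not in the table). The sketch below is a simplified in-memory model of that classification, not the actual MapReduce code; the function `verify` and the toy four-node ring are assumptions made for illustration.

```python
from collections import Counter


def verify(table):
    """Classify nodes of a linked list stored as {row_key: prev_row_key}.

    UNDEFINED    - some row points to the key, but the key has no row (lost data)
    UNREFERENCED - the key has a row, but no other row points to it
    REFERENCED   - the key has a row and at least one row points to it
    """
    defined = set(table)
    referenced = set(table.values())
    counts = Counter()
    for key in defined | referenced:
        if key not in defined:
            counts["UNDEFINED"] += 1
        elif key not in referenced:
            counts["UNREFERENCED"] += 1
        else:
            counts["REFERENCED"] += 1
    return counts


# A closed ring: every node is defined and referenced exactly once.
ring = {"a": "d", "b": "a", "c": "b", "d": "c"}
healthy = verify(ring)

# Simulate losing row "b": "c" still points to it, and nothing points to "a".
lossy = verify({k: v for k, v in ring.items() if k != "b"})

print(dict(healthy))
print(dict(lossy))
```

In this model a healthy run has only REFERENCED nodes, so any non-zero UNDEFINED or UNREFERENCED count is the signal to look for.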
          
          Jimmy Xiang made changes -
          Link This issue relates to HBASE-4676 [ HBASE-4676 ]
          Jimmy Xiang made changes -
          Link This issue is related to HBASE-9757 [ HBASE-9757 ]
          Jimmy Xiang created issue -

            People

            • Assignee:
              Unassigned
              Reporter:
              Jimmy Xiang
            • Votes:
              0
              Watchers:
              4
