Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      "SplitTransaction: Split row is not inside region key range or is equal to startkey".

      I stopped writers after realizing one region of the table was growing unbounded:

      webtable,DE0CBA1D6CDFCDD6CBC1065D2C9C1CA17BDA0FAF,1291557809928.deccb20bcbf8e634008cf093105c4fc5.
                  stores=3, storefiles=10, storefileSizeMB=6753, memstoreSizeMB=2, storefileIndexSizeMB=2
      

      In the regionserver log, every compaction of this region fails to split with the following message:

      2010-12-05 09:04:50,156 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: 
      Split row is not inside region key range or is equal to startkey: 
      DE0CBA1D6CDFCDD6CBC1065D2C9C1CA17BDA0FAF
      

      However, there are many rows in the region:

      10/12/05 09:33:33 DEBUG client.HTable$ClientScanner: Advancing internal scanner
      to startKey at 'DE0CBA1D6CDFCDD6CBC1065D2C9C1CA17BDA0FAF'
      
      [...]
      
      Current count: 258000, row: DE424FBDBD15FF3B3E9D0C3DB149ECD29B0F615B            
      
      Current count: 259000, row: DF27251479D6C91B27AA9B1561070A53011A6D1E            
      
      10/12/05 09:33:36 DEBUG client.HTable$ClientScanner: Finished with region REGION => 
      {NAME => 'webtable,DE0CBA1D6CDFCDD6CBC1065D2C9C1CA17BDA0FAF,
      1291557809928.deccb20bcbf8e634008cf093105c4fc5.', 
      STARTKEY => 'DE0CBA1D6CDFCDD6CBC1065D2C9C1CA17BDA0FAF', 
      ENDKEY => 'DF76CF458433DB5D0CB2C50042452B296E3721A7', 
      ENCODED => deccb20bcbf8e634008cf093105c4fc5, TABLE => {{NAME => 'webtable', 
      FAMILIES => [{NAME => 'content', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', 
      VERSIONS => '2147483647', COMPRESSION => 'LZO', TTL => '2147483647', 
      BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'false'}, 
      {NAME => 'info', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', 
      VERSIONS => '2147483647', COMPRESSION => 'LZO', TTL => '2147483647', 
      BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'false'}, 
      {NAME => 'url', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', 
      VERSIONS => '1', COMPRESSION => 'LZO', TTL => '2147483647', 
      BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'false'}]}}
      10/12/05 09:33:36 DEBUG client.HTable$ClientScanner: Advancing internal scanner
      to startKey at 'DF76CF458433DB5D0CB2C50042452B296E3721A7'
      

        Issue Links

          Activity

          Andrew Purtell created issue -
          Jonathan Gray added a comment -

          I see multiple families. Anything unusual, like the first family having no data in it while all the data is in the other families? Or some row that is really big?

          stack added a comment -

          This is the formatter for the message you see:

                LOG.info("Split row is not inside region key range or is equal to " +
                    "startkey: " + Bytes.toString(this.splitrow));
          

          So, when it logs:

          Split row is not inside region key range or is equal to startkey: 
          DE0CBA1D6CDFCDD6CBC1065D2C9C1CA17BDA0FAF
          

          ... we are actually printing out the proposed 'splitrow', which is indeed the same as the start row.

          Why do we keep picking the same splitrow over and over even though the region is large? Must be something dumb we're doing w/ multiple families or so?

          Good on you Andrew.
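
          The condition behind that message can be sketched in isolation. This is a minimal standalone illustration, not the HBase source: the class name `SplitRowCheck` and its methods are hypothetical, and it assumes that row keys compare as unsigned bytes and that an empty end key means "no upper bound". A proposed split row is usable only when it is strictly greater than the region start key and strictly less than the end key.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of HBase.
final class SplitRowCheck {
    // Lexicographic comparison that treats each byte as unsigned.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    // Usable only if startKey < splitRow < endKey.
    // Assumption: an empty endKey means the region has no upper bound.
    static boolean isUsableSplitRow(byte[] startKey, byte[] endKey, byte[] splitRow) {
        boolean afterStart = compare(splitRow, startKey) > 0;
        boolean beforeEnd = endKey.length == 0 || compare(splitRow, endKey) < 0;
        return afterStart && beforeEnd;
    }

    static byte[] key(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] start = key("DE0CBA1D6CDFCDD6CBC1065D2C9C1CA17BDA0FAF");
        byte[] end = key("DF76CF458433DB5D0CB2C50042452B296E3721A7");
        // The split row from the log is the start key itself, so it is rejected.
        System.out.println(isUsableSplitRow(start, end, start));
        // A row seen by the scanner inside the region would be accepted.
        System.out.println(isUsableSplitRow(start, end,
            key("DE424FBDBD15FF3B3E9D0C3DB149ECD29B0F615B")));
    }
}
```

          With the keys from this report, the first call rejects the proposed split row because it equals the start key, while a row from the middle of the scan output passes.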

          stack added a comment -

          Moving to 0.90.1. I'm building a new RC. We can add this fix into 0.90.1.

          stack made changes -
          Fix Version/s 0.90.1 [ 12315548 ]
          Fix Version/s 0.90.0 [ 12313607 ]
          Todd Lipcon added a comment -

          Andrew: is this one of your canned test scenarios? Is it something you could make available for other people to test with?

          stack added a comment -

          Pulling into 0.90.0. If no fix before we cut next RC, will punt again.

          stack made changes -
          Fix Version/s 0.90.0 [ 12313607 ]
          Fix Version/s 0.90.1 [ 12315548 ]
          Andrew Purtell added a comment -

          I sprinkled logging through Store#checkSplit and other places to make sure, and indeed the midkey equaled the startkey of the largest file.

          I reviewed the code of my test again and realized that, through a schema mistake (maxVersions INT_MAX instead of 1), the test was storing many versions of the same large object (~3 MB), apparently creating a degenerate case.
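
          To see why that schema mistake produces a midkey on the first row, here is a rough simulation. It is only a sketch under stated assumptions: `MidkeySkewDemo`, `Cell`, `midRow`, and `buildFile` are hypothetical names, the file sizes are made up, and the midkey is approximated as the row covering the byte midpoint of the file rather than taken from a real store file index.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation for illustration; not HBase code.
final class MidkeySkewDemo {
    // One stored cell: its row key and the size of its value in bytes.
    static final class Cell {
        final String row;
        final long sizeBytes;

        Cell(String row, long sizeBytes) {
            this.row = row;
            this.sizeBytes = sizeBytes;
        }
    }

    // Returns the row of the cell that covers the byte midpoint of the file,
    // or null for an empty file. Cells are expected in row order.
    static String midRow(List<Cell> cells) {
        long total = 0;
        for (Cell c : cells) {
            total += c.sizeBytes;
        }
        long half = total / 2;
        long seen = 0;
        for (Cell c : cells) {
            seen += c.sizeBytes;
            if (seen > half) {
                return c.row;
            }
        }
        return null;
    }

    // First row holds `versions` copies of a ~3 MB value,
    // followed by 10,000 rows of about 1 KB each (made-up sizes).
    static List<Cell> buildFile(int versions) {
        List<Cell> cells = new ArrayList<>();
        for (int v = 0; v < versions; v++) {
            cells.add(new Cell("row-00000", 3_000_000L));
        }
        for (int i = 1; i <= 10_000; i++) {
            cells.add(new Cell(String.format("row-%05d", i), 1_000L));
        }
        return cells;
    }

    public static void main(String[] args) {
        // Many versions of one large object: the midpoint stays on the first row.
        System.out.println(midRow(buildFile(1000)));
        // A single version: the midpoint moves to a later row.
        System.out.println(midRow(buildFile(1)));
    }
}
```

          When the first row carries most of the bytes, the midpoint never leaves it, so the proposed split row coincides with the first key in the file no matter how large the file grows.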

          Is it worth looking at another file besides the largest to find a valid midkey if the largest file has this kind of skew?
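
          One way the fallback asked about here could look is sketched below. This is not what HBase does and not the approach taken in HBASE-2375; `SplitPointFallback`, `FileInfo`, and `chooseSplitRow` are hypothetical names, and strings stand in for byte[] row keys (adequate for ASCII keys like the ones in this report). The idea is to try store files from largest to smallest and take the first midkey that falls strictly inside the region.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch for illustration; not HBase code.
final class SplitPointFallback {
    // Size of a store file and the row of its midkey.
    static final class FileInfo {
        final long sizeBytes;
        final String midRow;

        FileInfo(long sizeBytes, String midRow) {
            this.sizeBytes = sizeBytes;
            this.midRow = midRow;
        }
    }

    // Tries files in descending size order and returns the first midkey row
    // with startKey < midRow < endKey. An empty endKey is treated as unbounded.
    static Optional<String> chooseSplitRow(List<FileInfo> files, String startKey, String endKey) {
        List<FileInfo> bySize = new ArrayList<>(files);
        bySize.sort(Comparator.comparingLong((FileInfo f) -> f.sizeBytes).reversed());
        for (FileInfo f : bySize) {
            boolean afterStart = f.midRow.compareTo(startKey) > 0;
            boolean beforeEnd = endKey.isEmpty() || f.midRow.compareTo(endKey) < 0;
            if (afterStart && beforeEnd) {
                return Optional.of(f.midRow);
            }
        }
        // Every file is skewed the same way: no usable split row.
        return Optional.empty();
    }

    public static void main(String[] args) {
        String start = "DE0CBA1D6CDFCDD6CBC1065D2C9C1CA17BDA0FAF";
        String end = "DF76CF458433DB5D0CB2C50042452B296E3721A7";
        List<FileInfo> files = new ArrayList<>();
        files.add(new FileInfo(6000L, start)); // largest file, midkey on the start key
        files.add(new FileInfo(500L, "DE424FBDBD15FF3B3E9D0C3DB149ECD29B0F615B"));
        System.out.println(chooseSplitRow(files, start, end));
    }
}
```

          The trade-off is that a midkey taken from a smaller file may divide the region's data unevenly, so this only guarantees that some split can proceed, not that it is balanced.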

          Andrew Purtell made changes -
          Fix Version/s 0.90.0 [ 12313607 ]
          Priority Critical [ 2 ] Major [ 3 ]
          Jonathan Gray added a comment -

          Andy, I'm going to change that algorithm when I finally get back to HBASE-2375 for 0.92. Agree we should handle this case as best as we can.

          Andrew Purtell added a comment -

          Closing as dup of HBASE-2375.

          Andrew Purtell made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Duplicate [ 3 ]
          stack made changes -
          Link This issue is related to HBASE-2375 [ HBASE-2375 ]

            People

            • Assignee:
              Unassigned
              Reporter:
              Andrew Purtell