HBase / HBASE-26467

Wrong Cell generated by MemStoreLABImpl.forceCopyOfBigCellInto when the Cell size is bigger than the data chunk size

Details

    • Reviewed

    Description

      On our company's 2.x cluster, compaction keeps failing for some regions because certain cells cannot be constructed successfully. In fact, these cells cannot even be read.

      The stack trace below shows that the failure occurs while the KeyValue is being constructed.

      Log and stack trace:

      2021-11-18 16:50:47,708 ERROR [regionserver/xxxx:60020-longCompactions-4] regionserver.CompactSplit: Compaction failed region=xx_table,3610ff49595a0fc4a824f2a575f37a31,1570874723992.dac703ceb35e8d8703233bebf34ae49f., storeName=c, priority=-319, startTime=1637225447127 
      java.lang.IllegalArgumentException: Invalid tag length at position=4659867, tagLength=0,
              at org.apache.hadoop.hbase.KeyValueUtil.checkKeyValueTagBytes(KeyValueUtil.java:685)
              at org.apache.hadoop.hbase.KeyValueUtil.checkKeyValueBytes(KeyValueUtil.java:643)
              at org.apache.hadoop.hbase.KeyValue.<init>(KeyValue.java:345)
              at org.apache.hadoop.hbase.SizeCachedKeyValue.<init>(SizeCachedKeyValue.java:43)
              at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.getCell(HFileReaderImpl.java:981)
              at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:233)
              at org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:418)
              at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:322)
              at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:288)
              at org.apache.hadoop.hbase.regionserver.compactions.Compactor.createScanner(Compactor.java:487)
              at org.apache.hadoop.hbase.regionserver.compactions.Compactor$1.createScanner(Compactor.java:248)
              at org.apache.hadoop.hbase.regionserver.compactions.Compactor.compact(Compactor.java:318)
              at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:65)
              at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:126)
              at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1468)
              at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:2266)
              at org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.doCompaction(CompactSplit.java:624)
              at org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.run(CompactSplit.java:666)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              at java.lang.Thread.run(Thread.java:748) 

      Further observation showed the following characteristics:

      1. The cell is larger than 2 MB.
      2. The bug can be reproduced only after an in-memory compaction.
      3. The cell bytes end with \x00\x02\x00\x00.
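
      The trailing bytes in item 3 are consistent with the "Invalid tag length ... tagLength=0" error in the stack trace. The sketch below is a simplified model, not HBase's KeyValueUtil code. It assumes a serialized layout of 4-byte key length, 4-byte value length, key, value, then an optional 2-byte tags length followed by tags that each start with a 2-byte length of at least 1. The class and method names are hypothetical. Under those assumptions, \x00\x02 is read as a tags length of 2 and \x00\x00 as a tag of length 0, which is rejected.

```java
import java.nio.ByteBuffer;

// Hypothetical, simplified model of tag validation; not HBase's actual code.
final class TrailingBytesSketch {

  /** Serializes key length, value length, key, value, then any trailing bytes. */
  static byte[] serialize(byte[] key, byte[] value, byte[] trailing) {
    ByteBuffer buf = ByteBuffer.allocate(8 + key.length + value.length + trailing.length);
    buf.putInt(key.length).putInt(value.length).put(key).put(value).put(trailing);
    return buf.array();
  }

  /** Returns null if the bytes after the value form valid tags, else an error description. */
  static String checkTags(byte[] cell) {
    ByteBuffer buf = ByteBuffer.wrap(cell); // big-endian by default
    int keyLen = buf.getInt();
    int valueLen = buf.getInt();
    buf.position(8 + keyLen + valueLen);
    if (!buf.hasRemaining()) {
      return null; // no tags section at all
    }
    if (buf.remaining() < 2) {
      return "truncated tags length";
    }
    int tagsLen = buf.getShort();
    if (tagsLen < 0 || tagsLen > buf.remaining()) {
      return "invalid tags length " + tagsLen;
    }
    int tagsEnd = buf.position() + tagsLen;
    while (buf.position() < tagsEnd) {
      if (buf.remaining() < 2) {
        return "truncated tag length";
      }
      int tagLen = buf.getShort();
      // Assumption: a tag length counts at least the one-byte tag type.
      if (tagLen < 1 || tagLen > buf.remaining()) {
        return "invalid tag length, tagLength=" + tagLen;
      }
      buf.position(buf.position() + tagLen);
    }
    return null;
  }

  public static void main(String[] args) {
    byte[] key = {1, 2, 3};
    byte[] value = {9};
    byte[] clean = serialize(key, value, new byte[0]);
    byte[] corrupt = serialize(key, value, new byte[] {0x00, 0x02, 0x00, 0x00});
    System.out.println(checkTags(clean));
    System.out.println(checkTags(corrupt));
  }
}
```

      In this model a cell without trailing bytes passes the check, while the same cell with the four extra bytes fails on the zero-length tag.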

       

      The root cause is MemStoreLABImpl.forceCopyOfBigCellInto, which is invoked only when a cell is bigger than the data chunk size. It constructs the copied cell with the wrong length, so 4 extra bytes (the chunk header size) are appended to the end of the cell bytes.
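
      The length mistake can be illustrated with a small standalone sketch. This is not the MemStoreLABImpl source. It assumes a dedicated chunk whose first 4 bytes are a header and whose allocation size is the cell size plus the header size, and it assumes the bug amounts to reusing that allocation size as the cell length. All names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical model of copying a big cell into its own chunk; not HBase code.
final class BigCellCopySketch {
  static final int CHUNK_HEADER_SIZE = 4; // 4-byte chunk header, per the report

  /**
   * Copies the cell into a fresh chunk after the header and returns the bytes
   * that the resulting cell would cover.
   */
  static byte[] copyIntoChunk(byte[] cell, boolean buggy) {
    int allocSize = cell.length + CHUNK_HEADER_SIZE; // size requested from the chunk
    byte[] chunk = new byte[CHUNK_HEADER_SIZE + allocSize];
    int offset = CHUNK_HEADER_SIZE; // cell data starts right after the header
    System.arraycopy(cell, 0, chunk, offset, cell.length);
    // Buggy variant: the allocation size (cell + header) is used as the cell length.
    // Fixed variant: only the serialized cell size is used.
    int cellLength = buggy ? allocSize : cell.length;
    return Arrays.copyOfRange(chunk, offset, offset + cellLength);
  }

  public static void main(String[] args) {
    byte[] cell = {10, 20, 30, 40, 50};
    System.out.println(copyIntoChunk(cell, true).length);  // cell length plus header size
    System.out.println(copyIntoChunk(cell, false).length); // cell length only
  }
}
```

      In the buggy variant the returned cell starts with the original bytes and then runs 4 bytes past them into whatever the buffer holds there. In the fixed variant it is byte-for-byte identical to the input.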

            People

              zhengzhuobinzzb zhuobin zheng