Details
- Bug
- Status: Resolved
- Critical
- Resolution: Fixed
- Reviewed
Description
In our company's 2.x cluster, compaction of some regions keeps failing because certain cells cannot be constructed successfully. In fact, we cannot even read these cells.
The stack trace below shows that the bug prevents the KeyValue from being constructed.
Log and stack trace:
2021-11-18 16:50:47,708 ERROR [regionserver/xxxx:60020-longCompactions-4] regionserver.CompactSplit: Compaction failed region=xx_table,3610ff49595a0fc4a824f2a575f37a31,1570874723992.dac703ceb35e8d8703233bebf34ae49f., storeName=c, priority=-319, startTime=1637225447127
java.lang.IllegalArgumentException: Invalid tag length at position=4659867, tagLength=0,
    at org.apache.hadoop.hbase.KeyValueUtil.checkKeyValueTagBytes(KeyValueUtil.java:685)
    at org.apache.hadoop.hbase.KeyValueUtil.checkKeyValueBytes(KeyValueUtil.java:643)
    at org.apache.hadoop.hbase.KeyValue.<init>(KeyValue.java:345)
    at org.apache.hadoop.hbase.SizeCachedKeyValue.<init>(SizeCachedKeyValue.java:43)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.getCell(HFileReaderImpl.java:981)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:233)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:418)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:322)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:288)
    at org.apache.hadoop.hbase.regionserver.compactions.Compactor.createScanner(Compactor.java:487)
    at org.apache.hadoop.hbase.regionserver.compactions.Compactor$1.createScanner(Compactor.java:248)
    at org.apache.hadoop.hbase.regionserver.compactions.Compactor.compact(Compactor.java:318)
    at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:65)
    at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:126)
    at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1468)
    at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:2266)
    at org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.doCompaction(CompactSplit.java:624)
    at org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.run(CompactSplit.java:666)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
On further observation, I found that the affected cells share the following characteristics:
- The cell is larger than 2 MB
- The bug can be reproduced only after an in-memory compaction
- The cell bytes end with \x00\x02\x00\x00
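The trailing bytes line up with the "tagLength=0" error in the log. The following is a minimal sketch, not HBase code, of how four stray bytes after the value could be read as a tags section. It assumes the usual serialized layout (4-byte key length, 4-byte value length, key, value, then an optional 2-byte tags length followed by tags that each start with a 2-byte length). The class and method names are hypothetical, and the key is treated as an opaque byte array.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class TrailingTagBytesSketch {
    // Assumed field widths for the tags section (hypothetical constants for this sketch).
    static final int TAGS_LENGTH_SIZE = 2;
    static final int TAG_LENGTH_SIZE = 2;

    /** Builds a tagless cell: keyLength(4) valueLength(4) key value. */
    static byte[] serialize(byte[] key, byte[] value) {
        ByteBuffer buf = ByteBuffer.allocate(8 + key.length + value.length);
        buf.putInt(key.length).putInt(value.length).put(key).put(value);
        return buf.array();
    }

    /** Treats everything after the value as tags; returns null if valid, else an error message. */
    static String checkTags(byte[] cell) {
        ByteBuffer buf = ByteBuffer.wrap(cell);
        int pos = 8 + buf.getInt(0) + buf.getInt(4);
        if (pos == cell.length) {
            return null; // no tags section at all
        }
        if (cell.length - pos < TAGS_LENGTH_SIZE) {
            return "Truncated tags length at position=" + pos;
        }
        int tagsLength = buf.getShort(pos) & 0xFFFF;
        pos += TAGS_LENGTH_SIZE;
        int end = pos + tagsLength;
        if (end != cell.length) {
            return "Tags length does not match remaining bytes, tagsLength=" + tagsLength;
        }
        while (pos < end) {
            if (end - pos < TAG_LENGTH_SIZE) {
                return "Truncated tag at position=" + pos;
            }
            int tagLength = buf.getShort(pos) & 0xFFFF;
            if (tagLength < 1) { // a tag needs at least its type byte
                return "Invalid tag length at position=" + pos + ", tagLength=" + tagLength;
            }
            pos += TAG_LENGTH_SIZE + tagLength;
        }
        return pos == end ? null : "Tag overruns tags section at position=" + pos;
    }

    public static void main(String[] args) {
        byte[] good = serialize(new byte[] {1, 2, 3}, new byte[] {9, 9});
        // Append the four bytes reported above: \x00\x02\x00\x00
        byte[] bad = Arrays.copyOf(good, good.length + 4);
        bad[good.length + 1] = 2;
        System.out.println(checkTags(good));
        System.out.println(checkTags(bad));
    }
}
```

Under these assumptions, \x00\x02 is read as a tags length of 2, and the remaining \x00\x00 is read as a single tag whose length is 0, which is rejected.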
The root cause is the method MemStoreLABImpl.forceCopyOfBigCellInto, which is invoked only when a cell is bigger than the data chunk size. It constructs the copied cell with the wrong length, so 4 extra bytes (the chunk header size) are appended to the end of the cell bytes.
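The effect of the wrong length can be modeled with a small sketch. This is a hypothetical illustration, not the actual MemStoreLABImpl code: the class, method, and flag names are invented, the chunk is a plain ByteBuffer, and the only detail taken from the description is the 4-byte chunk header size. The trailing bytes here are zeros rather than the values observed in the cluster.

```java
import java.nio.ByteBuffer;

final class BigCellCopySketch {
    // From the description: the chunk header is 4 bytes.
    static final int CHUNK_HEADER_SIZE = 4;

    /**
     * Copies a cell into a dedicated chunk after the header, then reads it back
     * using the length recorded for the copy. When addHeaderToLength is true the
     * recorded length wrongly includes the header size (the assumed bug).
     */
    static byte[] copyBigCell(byte[] cell, boolean addHeaderToLength) {
        // Hypothetical chunk: header, cell data, and some bytes that follow the cell.
        ByteBuffer chunk = ByteBuffer.allocate(CHUNK_HEADER_SIZE + cell.length + CHUNK_HEADER_SIZE);
        chunk.position(CHUNK_HEADER_SIZE);
        chunk.put(cell);

        int recordedLength = addHeaderToLength ? cell.length + CHUNK_HEADER_SIZE : cell.length;

        byte[] readBack = new byte[recordedLength];
        chunk.position(CHUNK_HEADER_SIZE);
        chunk.get(readBack);
        return readBack;
    }

    public static void main(String[] args) {
        byte[] cell = {10, 20, 30, 40, 50};
        System.out.println(copyBigCell(cell, false).length); // same length as the cell
        System.out.println(copyBigCell(cell, true).length);  // 4 bytes longer
    }
}
```

With the correct length the read-back bytes equal the original cell. With the header size added, the read-back cell carries 4 extra bytes at the end, which later readers have no way to distinguish from cell content.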