Details
Bug
Status: Resolved
Major
Resolution: Fixed
2.6.0, 2.4.17, 2.5.5
Description
Flushing a column family that does not exist in the table causes a NullPointerException (NPE) error in both the shell and the HMaster logs.
Reproduce
Start an HBase 2.5.9 cluster and run the following commands in the HBase shell on the HMaster node. They lead to an NPE, and the failure reproduces deterministically:
create 'table', {NAME => 'cf1', VERSIONS => 2, COMPRESSION => 'GZ', BLOOMFILTER => 'ROWCOL'}, {NAME => 'cf2', VERSIONS => 4, COMPRESSION => 'NONE', BLOOMFILTER => 'ROWCOL'}
incr 'table', 'row1', 'cf1:cell', 2
flush 'table', 'cf3'
The shell outputs:
hbase:006:0> create 'table', {NAME => 'cf1', VERSIONS => 2, COMPRESSION => 'GZ', BLOOMFILTER => 'ROWCOL'}, {NAME => 'cf2', VERSIONS => 4, COMPRESSION => 'NONE', BLOOMFILTER => 'ROWCOL'}
Created table table
Took 2.1238 seconds
=> Hbase::Table - table
hbase:007:0>
hbase:008:0> incr 'table', 'row1', 'cf1:cell', 2
COUNTER VALUE = 2
Took 0.0131 seconds
hbase:009:0>
hbase:010:0> flush 'table', 'cf3'
ERROR: java.io.IOException: java.lang.NullPointerException
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:479)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:102)
at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:82)
Caused by: org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: java.lang.NullPointerException
at org.apache.hadoop.hbase.procedure.flush.RegionServerFlushTableProcedureManager$FlushTableSubprocedurePool.waitForOutstandingTasks(RegionServerFlushTableProcedureManager.java:274)
at org.apache.hadoop.hbase.procedure.flush.FlushTableSubprocedure.flushRegions(FlushTableSubprocedure.java:115)
at org.apache.hadoop.hbase.procedure.flush.FlushTableSubprocedure.acquireBarrier(FlushTableSubprocedure.java:126)
at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:160)
at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:46)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
For usage try 'help "flush"'
Took 12.1713 seconds
According to the flush (flush.rb) command specification, a user can flush a specific column family:
Flush all regions in passed table or pass a region row to flush an individual region or a region server name whose format is 'host,port,startcode', to flush all its regions.
You can also flush a single column family for all regions within a table, or for an specific region only.
For example:
hbase> flush 'TABLENAME'
hbase> flush 'TABLENAME','FAMILYNAME'
In the above case, cf3 is an incorrect input (a non-existent column family). If a user tries to flush it, the expected behavior is:
- HBase rejects the operation.
- It returns an error message saying that the column family doesn't exist: "ERROR: Unknown CF...".
In 2.6.0, the flush command gets stuck and runs into an NPE:
java.lang.NullPointerException: null
at org.apache.hadoop.hbase.regionserver.HRegion.logFatLineOnFlush(HRegion.java:2724) ~[hbase-server-2.6.0.jar:2.6.0]
at org.apache.hadoop.hbase.regionserver.HRegion.internalPrepareFlushCache(HRegion.java:2640) ~[hbase-server-2.6.0.jar:2.6.0]
at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2587) ~[hbase-server-2.6.0.jar:2.6.0]
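One plausible mechanism behind a trace like this is a lookup by family name that returns null and is then dereferenced without a check. The sketch below is a generic illustration of that pattern and is not HBase's actual code: the `Store` class, the `describeStore` method, and the idea that stores live in a map keyed by family name are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

final class NullStoreLookupDemo {
    // Hypothetical stand-in for a per-family store; not an HBase class.
    static final class Store {
        final String family;

        Store(String family) {
            this.family = family;
        }

        String describe() {
            return "store for family " + family;
        }
    }

    // Looks up the store for a family and uses it without a null check.
    static String describeStore(Map<String, Store> stores, String family) {
        Store store = stores.get(family); // null when the family is unknown
        return store.describe();          // NullPointerException if store is null
    }

    public static void main(String[] args) {
        Map<String, Store> stores = new HashMap<>();
        stores.put("cf1", new Store("cf1"));
        stores.put("cf2", new Store("cf2"));

        // Known family: the lookup succeeds.
        System.out.println(describeStore(stores, "cf1"));

        // Unknown family: the unchecked dereference throws.
        try {
            describeStore(stores, "cf3");
        } catch (NullPointerException e) {
            System.out.println("NPE for unknown family cf3");
        }
    }
}
```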
Root Cause
The flush path is missing a check for whether the target column family exists in the table.
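A minimal sketch of such a check is shown here, written without any HBase dependency so it stays self-contained. The class name `FlushFamilyCheck`, the method `checkFamilyExists`, the use of `IllegalArgumentException`, and the exact wording after "Unknown CF" are all hypothetical; the actual fix may validate in a different place and raise a different exception type.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

final class FlushFamilyCheck {
    // Hypothetical helper: rejects a flush request that names a column family
    // the table does not define. A null family means "flush all families".
    static void checkFamilyExists(String table, Set<String> tableFamilies, String family) {
        if (family == null) {
            return;
        }
        if (!tableFamilies.contains(family)) {
            // Message wording is an assumption made for this example.
            throw new IllegalArgumentException(
                "Unknown CF '" + family + "' in table '" + table + "'");
        }
    }

    public static void main(String[] args) {
        // Families from the reproduction steps: cf1 and cf2.
        Set<String> families = new LinkedHashSet<>(Arrays.asList("cf1", "cf2"));

        checkFamilyExists("table", families, "cf1"); // passes silently

        try {
            checkFamilyExists("table", families, "cf3");
        } catch (IllegalArgumentException e) {
            System.out.println("ERROR: " + e.getMessage());
        }
    }
}
```

Running the check before the flush is dispatched to the region servers would let the request fail fast with a clear message instead of reaching the code path that throws the NPE.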