HBASE-4890

fix possible NPE in HConnectionManager

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.92.0
    • Fix Version/s: 0.92.1, 0.94.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      I was running YCSB against a 0.92 branch and encountered this error message:

      11/11/29 08:47:16 WARN client.HConnectionManager$HConnectionImplementation: Failed all from region=usertable,user3917479014967760871,1322555655231.f78d161e5724495a9723bcd972f97f41., hostname=c0316.hal.cloudera.com, port=57020
      java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.NullPointerException
              at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
              at java.util.concurrent.FutureTask.get(FutureTask.java:83)
              at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1501)
              at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1353)
              at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:898)
              at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:775)
              at org.apache.hadoop.hbase.client.HTable.put(HTable.java:750)
              at com.yahoo.ycsb.db.HBaseClient.update(Unknown Source)
              at com.yahoo.ycsb.DBWrapper.update(Unknown Source)
              at com.yahoo.ycsb.workloads.CoreWorkload.doTransactionUpdate(Unknown Source)
              at com.yahoo.ycsb.workloads.CoreWorkload.doTransaction(Unknown Source)
              at com.yahoo.ycsb.ClientThread.run(Unknown Source)
      Caused by: java.lang.RuntimeException: java.lang.NullPointerException
              at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithoutRetries(HConnectionManager.java:1315)
              at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1327)
              at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1325)
              at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
              at java.util.concurrent.FutureTask.run(FutureTask.java:138)
              at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
              at java.lang.Thread.run(Thread.java:619)
      Caused by: java.lang.NullPointerException
              at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:158)
              at $Proxy4.multi(Unknown Source)
              at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1330)
              at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1328)
              at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithoutRetries(HConnectionManager.java:1309)
              ... 7 more
      

      It looks like the NPE is caused by server being null in the MultiResponse call() method.

           public MultiResponse call() throws IOException {
             return getRegionServerWithoutRetries(
                 new ServerCallable<MultiResponse>(connection, tableName, null) {
                   public MultiResponse call() throws IOException {
                     return server.multi(multi);
                   }
                   @Override
                   public void connect(boolean reload) throws IOException {
                     server =
                       connection.getHRegionConnection(loc.getHostname(), loc.getPort());
                   }
                 }
             );
           }
      
      1. 4890.txt
        1 kB
        stack
      2. 4890-v2.txt
        2 kB
        stack
      3. 4890-v3.txt
        2 kB
        stack
      4. 4890-v3.txt
        2 kB
        stack
      5. 4890-v3.txt
        2 kB
        stack
      6. splits.txt
        14 kB
        stack

          Activity

          stack added a comment -

          IOE is better than NPE. The IOE says what the error is. What we want to do about this scenario – a process running serverside for longer than we're prepared to wait – is something we need to work on. We either add the keep-alive or adjust the rpctimeout by the size of the request?

          ramkrishna.s.vasudevan added a comment -

          And you used to get NPE?

          Yes.. got NPE.

          Lars Hofhansl added a comment -

          Hmm... No I guess not.
          But then back to Ram's point. An IOE is not in principle better than an NPE.

          stack added a comment -

          Now we get IOE as the patch is applied.

          And you used to get NPE?

          Hmm... Good point, shouldn't the client retry unless it received a DoNotRetryException?

          Do we want it to retry a timeout? For example, in the above pathological case, we timed out because we wanted the server to open 3k regions and it was taking longer than rpctimeout. Do we want to retry that call?
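
The trade-off in that question can be made concrete with a small retry loop. This is only a sketch under stated assumptions: `RetryPolicySketch`, `DoNotRetrySketchException`, `TimeoutSketchException`, and the `retryTimeouts` flag are hypothetical names invented for illustration and are not HBase client classes or settings. It shows one way a caller could retry ordinary I/O failures, never retry a "do not retry" failure, and leave retrying of timeouts as an explicit choice.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical retry helper; not part of the HBase client. */
final class RetryPolicySketch {

  /** Stand-in for a failure the server says must not be retried. */
  static class DoNotRetrySketchException extends IOException {
    DoNotRetrySketchException(String msg) { super(msg); }
  }

  /** Stand-in for a client-side rpc timeout. */
  static class TimeoutSketchException extends IOException {
    TimeoutSketchException(String msg) { super(msg); }
  }

  static <T> T callWithRetries(Callable<T> op, int maxAttempts, boolean retryTimeouts)
      throws IOException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return op.call();
      } catch (DoNotRetrySketchException e) {
        throw e;                       // never retried
      } catch (TimeoutSketchException e) {
        if (!retryTimeouts) {
          throw e;                     // the server may still be working on the request
        }
        last = e;
      } catch (IOException e) {
        last = e;                      // ordinary I/O failure: try again
      } catch (Exception e) {
        throw new IOException(e);      // anything else is wrapped and not retried
      }
    }
    throw last;                        // all attempts failed
  }
}
```

With `retryTimeouts` set to false, a timed-out bulk open like the 3k-region case would surface to the caller once instead of being re-submitted while the first request is still running on the server.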

          Lars Hofhansl added a comment -

          Hmm... Good point, shouldn't the client retry unless it received a DoNotRetryException?

          ramkrishna.s.vasudevan added a comment -

          This is pretty easy to reproduce. A client with more write threads triggers it. Now we get an IOE as the patch is applied.

          Hudson added a comment -

          Integrated in HBase-TRUNK #2674 (See https://builds.apache.org/job/HBase-TRUNK/2674/)
          HBASE-4890 fix possible NPE in HConnectionManager (Revision 1298272)

          Result = FAILURE
          stack :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
          Hudson added a comment -

          Integrated in HBase-0.92-security #97 (See https://builds.apache.org/job/HBase-0.92-security/97/)
          HBASE-4890 fix possible NPE in HConnectionManager (Revision 1298270)

          Result = FAILURE
          stack :
          Files :

          • /hbase/branches/0.92/CHANGES.txt
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
          Hudson added a comment -

          Integrated in HBase-0.92 #320 (See https://builds.apache.org/job/HBase-0.92/320/)
          HBASE-4890 fix possible NPE in HConnectionManager (Revision 1298270)

          Result = FAILURE
          stack :
          Files :

          • /hbase/branches/0.92/CHANGES.txt
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
          Hudson added a comment -

          Integrated in HBase-TRUNK-security #131 (See https://builds.apache.org/job/HBase-TRUNK-security/131/)
          HBASE-4890 fix possible NPE in HConnectionManager (Revision 1298272)

          Result = SUCCESS
          stack :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
          Hudson added a comment -

          Integrated in HBase-0.94 #19 (See https://builds.apache.org/job/HBase-0.94/19/)
          HBASE-4890 fix possible NPE in HConnectionManager (Revision 1298271)

          Result = SUCCESS
          stack :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
          stack added a comment -

          Committed trunk and 0.92, 0.94. Thanks for review and testing Cosmin.

          This situation is kinda ugly. The client is giving up because the rpc went on too long. It's still running over in the server. When the server finishes, it's going to notice the client went away.

          Now at least there'll be a coherent exception client-side to match the server-side exception in place of an NPE (and the master won't be dying because of it).

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12517500/4890-v3.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 javadoc. The javadoc tool appears to have generated -129 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 154 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1131//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1131//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1131//console

          This message is automatically generated.

          Cosmin Lehene added a comment -

          It seems fine. I get the IOE instead of NPE now

          java.util.concurrent.ExecutionException: java.io.IOException: Call to ld1/10.72.32.50:60020 failed on local exception: org.apache.hadoop.hbase.ipc.HBaseClient$CallTimeoutException: Call id=1321, waitTime=97566, rpcTimetout=60000
          	at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
          	at java.util.concurrent.FutureTask.get(FutureTask.java:83)
          	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1557)
          	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1409)
          	at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:900)
          	at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:777)
          	at org.apache.hadoop.hbase.client.HTable.put(HTable.java:752)
          	at com.adobe.saasbase.scratch.Smith$PutThread.run(Smith.java:74)
          Caused by: java.io.IOException: Call to ld1/10.72.32.50:60020 failed on local exception: org.apache.hadoop.hbase.ipc.HBaseClient$CallTimeoutException: Call id=1321, waitTime=97566, rpcTimetout=60000
          	at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:953)
          	at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:922)
          	at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
          	at $Proxy5.multi(Unknown Source)
          	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1386)
          	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1384)
          	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithoutRetries(HConnectionManager.java:1365)
          	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1383)
          	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1381)
          	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
          	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
          	at java.lang.Thread.run(Thread.java:662)
          Caused by: org.apache.hadoop.hbase.ipc.HBaseClient$CallTimeoutException: Call id=1321, waitTime=97566, rpcTimetout=60000
          	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.cleanupCalls(HBaseClient.java:684)
          	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:613)
          	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:505)
          
          
          
          stack added a comment -

          no-prefix

          stack added a comment -

          More Cosmin feedback.

          stack added a comment -

          Accommodate Cosmin review feedback

          stack added a comment -

          The NPE is happening in j-d's artificial case because we're doing a bulk open of 3k regions and it's taking a little while to complete, i.e. longer than the rpc timeout. There is no error though, because this is a client running in the master and it's connecting to a single regionserver while doing meta scans in the meantime etc., updating last activity on the connection... so we're not running into a socket timeout, which looks like the expectation here: that there MUST be an exception outstanding if a Call has been running for > rpctimeout.

          Cosmin sees the exact stacktrace that Jon originally uploaded so we'll try this patch on his cluster (Cosmin also speculates this NPE happens only in the extreme, in ycsb or open 3k regions kinda extremes. He is seeing it only when he does extreme load test on his cluster)

          stack added a comment -

          J-D can reproduce it using this attached file and this command in the shell:

          create 't1', 'f1', {SPLITS_FILE => 'splits.txt'}
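
The attached splits.txt is not reproduced here. As a rough stand-in, a file of split keys for a table of about 3k regions could be generated with something like the following sketch. It assumes the shell's SPLITS_FILE option reads one split key per line; `SplitsFileSketch`, `splitKeys`, and the zero-padded key format are hypothetical choices made for illustration, not the contents of the real attachment.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical generator for a split-keys file; not the attached splits.txt. */
final class SplitsFileSketch {

  /** Returns regions - 1 split keys, zero-padded so they sort lexicographically. */
  static List<String> splitKeys(int regions) {
    if (regions < 2 || regions > 10000) {
      throw new IllegalArgumentException("regions must be between 2 and 10000");
    }
    List<String> keys = new ArrayList<>(regions - 1);
    for (int i = 1; i < regions; i++) {
      keys.add(String.format(Locale.ROOT, "%04d", i));
    }
    return keys;
  }

  public static void main(String[] args) throws IOException {
    // Output path is an assumption; pass a different one as the first argument.
    Path out = Paths.get(args.length > 0 ? args[0] : "splits.txt");
    Files.write(out, splitKeys(3000), StandardCharsets.UTF_8);
  }
}
```

N split keys produce N + 1 regions, so 2999 keys give the 3k-region bulk open discussed in this issue.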
          
          
          Lars Hofhansl added a comment -

          I agree, this is scary. I too will do some debugging on Monday.

          stack added a comment -

          This NPE is a bit too easy to manufacture. Should we hold up 0.92.1 till it is fixed? Can work on it Monday?

          Davey Yan added a comment -

          I got the same error when I committed data with a multi-threaded client.

          12/03/02 21:52:27 WARN client.HConnectionManager$HConnectionImplementation: Failed all from region=File,76cd9bbd-8431-4639-8440-b2bac89488f7\x00/\x005f87b6e0-3ce6-446b-a870-7c99b7e0f818,1330625239513.f2dfbf8673f58f5e8c620a94638bf736., hostname=ubuntu6403, port=60020
          java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.NullPointerException
          	at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
          	at java.util.concurrent.FutureTask.get(FutureTask.java:83)
          	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1557)
          	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1409)
          	at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:900)
          	at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:777)
          	at org.apache.hadoop.hbase.client.HTable.put(HTable.java:752)
          	at org.apache.hadoop.hbase.client.HTablePool$PooledHTable.put(HTablePool.java:397)
          	at com.xxx.file.service.HBaseFileManager.save(Unknown Source)
          	at com.xxx.file.service.HBaseFileManager.createFileMessage(Unknown Source)
          	at com.xxx.perf.PerformanceEvaluation$1.run(Unknown Source)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
          	at java.lang.Thread.run(Thread.java:662)
          Caused by: java.lang.RuntimeException: java.lang.NullPointerException
          	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithoutRetries(HConnectionManager.java:1371)
          	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1383)
          	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1381)
          	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
          	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
          	... 3 more
          Caused by: java.lang.NullPointerException
          	at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:158)
          	at $Proxy19.multi(Unknown Source)
          	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1386)
          	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1384)
          	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithoutRetries(HConnectionManager.java:1365)
          	... 7 more
          

          This multi-threaded client puts data into table 'File'.
          There are three secondary-index tables (FileAIndex, FileBIndex, FileCIndex) that are filled by a coprocessor on table 'File'.

          The coprocessor is outlined here:

          FileCoprocessor.java
          public class FileCoprocessor extends BaseRegionObserver {
          	HTablePool hTablePool;
          	
          	@Override
          	public void start(CoprocessorEnvironment e) throws IOException {
          		super.start(e);
          		hTablePool = new HTablePool(e.getConfiguration(), 100);
          	}
          	
          	@Override
          	public void stop(CoprocessorEnvironment e) throws IOException {
          		hTablePool.close();
          		super.stop(e);
          	}
          	
          	@Override
          	public void prePut(...) {
          		hTablePool.getTable("FileAIndex").put(...);
          		hTablePool.getTable("FileBIndex").put(...);
          		hTablePool.getTable("FileCIndex").put(...);
          	}
          }
          

          With the coprocessor disabled, I have not been able to reproduce the error so far, but I am not sure.

          Environment & Version:
          HBase 0.92.0
          Hadoop 1.0.0
          java version "1.6.0_30" Java(TM) SE Runtime Environment (build 1.6.0_30-b12) Java HotSpot(TM) 64-Bit Server VM (build 20.5-b03, mixed mode)
          Ubuntu Server 10.04 LTS 64-bit
          1 master + 4 regionserver

          Jean-Daniel Cryans added a comment -

          Sorry, I got distracted (I even forgot about this issue) so nothing new.

          stack added a comment -

          Any more luck w/ this one J-D (or you got distracted?)

          Jean-Daniel Cryans added a comment -

          Very unlikely. HBASE-5336's NPE is coming from Hadoop land rather than our IPC, and to me it looks like that file was already closed.

          Lars Hofhansl added a comment -

          HBASE-5336 might be related.

          Jean-Daniel Cryans added a comment -

          More progress, we're setting a null exception on the call in this code:

              protected void cleanupCalls(long rpcTimeout) {
                Iterator<Entry<Integer, Call>> itor = calls.entrySet().iterator();
                while (itor.hasNext()) {
                  Call c = itor.next().getValue();
                  long waitTime = System.currentTimeMillis() - c.getStartTime();
                  if (waitTime >= rpcTimeout) {
                    c.setException(closeException); // local exception
                    synchronized (c) {
                      c.notifyAll();
                    }

          After adding some debugging in there (printing a WARN and doing a continue instead of setting the exception), I see that the call never gets a SocketTimeoutException set like it's supposed to. It just hangs around...
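
          To illustrate why a null exception set on a call could surface as a bare NullPointerException on the caller side, here is a minimal, self-contained sketch. It is a simplified model under stated assumptions, not the HBase IPC code: SketchCall, cleanupUnguarded, cleanupGuarded and outcome are hypothetical names, and the guard shown (substituting a SocketTimeoutException when no close exception was recorded) is only one possible approach, not necessarily the fix that was committed for this issue.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

public class NullExceptionSketch {

    /** Hypothetical stand-in for an in-flight RPC call; not the HBase class. */
    static final class SketchCall {
        private Object value;        // stays null until a response arrives
        private IOException error;   // set when the call fails
        private boolean done;

        synchronized void setException(IOException e) {
            this.error = e;          // ends up null if the caller passes null
            this.done = true;
            notifyAll();
        }

        synchronized boolean isDone() {
            return done;
        }

        /** Caller side: rethrow the recorded error, otherwise use the value. */
        synchronized String result() throws IOException {
            if (error != null) {
                throw error;
            }
            // With a null error AND a null value, this dereference throws an NPE.
            return value.toString();
        }
    }

    /** Unguarded cleanup: passes along whatever closeException holds. */
    static void cleanupUnguarded(SketchCall c, IOException closeException) {
        c.setException(closeException);
    }

    /** Guarded cleanup (assumption): fall back to a timeout exception. */
    static void cleanupGuarded(SketchCall c, IOException closeException, long rpcTimeout) {
        IOException e = (closeException != null)
            ? closeException
            : new SocketTimeoutException("Call timed out after " + rpcTimeout + " ms");
        c.setException(e);
    }

    /** Returns the simple class name of whatever result() throws. */
    static String outcome(SketchCall c) {
        try {
            return c.result();
        } catch (Exception e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        SketchCall a = new SketchCall();
        cleanupUnguarded(a, null);
        System.out.println("unguarded: " + outcome(a));

        SketchCall b = new SketchCall();
        cleanupGuarded(b, null, 60000L);
        System.out.println("guarded:   " + outcome(b));
    }
}
```

          In this model the unguarded path marks the call done with neither a value nor an error, so the caller dereferences null, while the guarded path gives the caller a meaningful IOException to handle or retry on.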

          Jean-Daniel Cryans added a comment -

          Upgrading to blocker. I'm pretty sure this is not specific to HCM, as I just hit it while trying to create 3k regions in one shot:

          2012-02-24 15:21:17,962 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: []
          2012-02-24 15:21:17,962 FATAL org.apache.hadoop.hbase.master.HMaster: Uncaught exception in h-25-183.sfo.stumble.net,64066,1330125075976-StartupBulkAssigner-0
          java.lang.NullPointerException
          at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:158)
          at $Proxy11.openRegions(Unknown Source)
          at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:455)
          at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1373)
          at org.apache.hadoop.hbase.master.AssignmentManager$SingleServerBulkAssigner.run(AssignmentManager.java:2224)
          at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
          at java.lang.Thread.run(Thread.java:680)
          2012-02-24 15:21:17,962 INFO org.apache.hadoop.hbase.master.HMaster: Aborting

          stack added a comment -

          Marking fix for 0.92.1

          Simon Dircks added a comment -

          I was also able to reproduce this:

          hadoop-1.0 and hbase-0.92 with YCSB.

          2012/01/25 15:19:24 WARN client.HConnectionManager$HConnectionImplementation: Failed all from region=usertable,user3076346045817661344,1327530607222.bab55fba6adb17bc8757eb6cdee99a91., hostname=datatask6.hadoop.telescope.tv, port=60020
          java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.NullPointerException

          Got this error on the LOAD part of YCSB

          /usr/local/bin/java -cp "build/ycsb.jar:db/hbase/lib/*:db/hbase/conf/" com.yahoo.ycsb.Client -load -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloada -p columnfamily=family1 -p recordcount=5000000 -s > load.dat


            People

            • Assignee: stack
            • Reporter: Jonathan Hsieh