HBase / HBASE-16201

NPE in RpcServer causing intermittent UT failure of TestMasterReplication#testHFileCyclicReplication

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.0, 1.1.6, 0.98.21, 1.2.3, 2.0.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      Every few runs of TestMasterReplication#testHFileCyclicReplication, we observe the following NPE in the UT log:

      java.lang.NullPointerException
          at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2257)
          at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:118)
          at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:189)
          at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:169)
      

      The related code around RpcServer line 2257 is:

            if (e instanceof ServiceException) {
              e = e.getCause();
            }
      
            // increment the number of requests that were exceptions.
            metrics.exception(e);
      
            if (e instanceof LinkageError) throw new DoNotRetryIOException(e);
            if (e instanceof IOException) throw (IOException)e;
      

      After some debugging, we found several places that construct a ServiceException with no cause, for example RSRpcServices#replicateWALEntry:

            if (regionServer.replicationSinkHandler != null) {
              ...
            } else {
              throw new ServiceException("Replication services are not initialized yet");
            }
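A minimal, hypothetical reproduction of why the unconditional unwrap is unsafe. It mirrors only the `e = e.getCause()` step, not the rest of RpcServer#call. `ServiceException` here is a local stand-in for com.google.protobuf.ServiceException (so the example compiles without protobuf), and the class and method names are illustrative only.

```java
/**
 * Hypothetical demo: a ServiceException built with only a message has a null
 * cause, so unconditionally replacing e with e.getCause() leaves e == null.
 */
public class NullCauseDemo {

  /** Hypothetical stand-in for com.google.protobuf.ServiceException. */
  static class ServiceException extends Exception {
    ServiceException(String message) { super(message); }
  }

  /** Mirrors the pre-fix logic: always replace e with its cause. */
  static Throwable unwrapUnconditionally(Throwable e) {
    if (e instanceof ServiceException) {
      e = e.getCause(); // null when the exception was constructed without a cause
    }
    return e;
  }

  public static void main(String[] args) {
    // Built the same way as in RSRpcServices#replicateWALEntry: message only, no cause.
    Throwable e = new ServiceException("Replication services are not initialized yet");
    Throwable unwrapped = unwrapUnconditionally(e);
    System.out.println(unwrapped == null); // prints "true"
    // Any later dereference of unwrapped (e.g. unwrapped.getMessage()) would throw a NullPointerException.
  }
}
```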
      

      So we should first check the cause, and only reset e = e.getCause() when the cause is not null.
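A minimal sketch of the null-safe unwrap described above, not the committed HBASE-16201.patch. As before, `ServiceException` is a hypothetical local stand-in for com.google.protobuf.ServiceException, and `ServiceExceptionUnwrap`/`unwrap` are illustrative names.

```java
import java.io.IOException;

/**
 * Sketch: unwrap a ServiceException only when it actually carries a cause;
 * otherwise keep the original throwable so callers never end up with null.
 */
public class ServiceExceptionUnwrap {

  /** Hypothetical stand-in for com.google.protobuf.ServiceException. */
  static class ServiceException extends Exception {
    ServiceException(String message) { super(message); }
    ServiceException(Throwable cause) { super(cause); }
  }

  /** Returns the cause of a ServiceException if non-null, else the input itself. */
  static Throwable unwrap(Throwable e) {
    if (e instanceof ServiceException && e.getCause() != null) {
      return e.getCause();
    }
    return e;
  }

  public static void main(String[] args) {
    // Message-only exception, as thrown by RSRpcServices#replicateWALEntry.
    Throwable noCause = new ServiceException("Replication services are not initialized yet");
    // Exception wrapping a real IOException as its cause.
    Throwable withCause = new ServiceException(new IOException("disk full"));

    System.out.println(unwrap(noCause) == noCause);               // prints "true"
    System.out.println(unwrap(withCause) instanceof IOException); // prints "true"
  }
}
```

Keeping the original exception when the cause is null means the later metrics.exception(e) call and the instanceof checks always see a non-null throwable. Judging by the debug message Yu Li reported ("Caught a ServiceException with null cause"), the committed patch also logs this case at DEBUG level.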

      Attachments

        1. HBASE-16201.patch
          1 kB
          Yu Li

        Activity

          Yu Li added a comment -

          A straightforward patch. With it applied, we see the debug message below in the UT log, and TestMasterReplication#testHFileCyclicReplication no longer fails.

          2016-07-08 15:12:17,818 DEBUG [RpcServer.FifoWFPBQ.replication.handler=2,queue=0,port=57350] ipc.RpcServer(2251): Caught a ServiceException with null cause
          com.google.protobuf.ServiceException: Replication services are not initialized yet
          	at org.apache.hadoop.hbase.regionserver.RSRpcServices.replicateWALEntry(RSRpcServices.java:1929)
          	at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:22751)
          	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2212)
          	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:118)
          	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:189)
          	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:169)
          
          Heng Chen added a comment -

          +1

          Ashish Singhi added a comment -

          +1

          Ashish Singhi added a comment -

          Can you submit the patch for a QA run?

          Yu Li added a comment -

          Thanks ashish singhi and chenheng for the review. Submitting the patch for HadoopQA.

          Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          +1 hbaseanti 0m 0s Patch does not have any anti-patterns.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 mvninstall 3m 3s master passed
          +1 compile 0m 37s master passed with JDK v1.8.0
          +1 compile 0m 33s master passed with JDK v1.7.0_80
          +1 checkstyle 0m 53s master passed
          +1 mvneclipse 0m 16s master passed
          +1 findbugs 1m 57s master passed
          +1 javadoc 0m 27s master passed with JDK v1.8.0
          +1 javadoc 0m 33s master passed with JDK v1.7.0_80
          +1 mvninstall 0m 44s the patch passed
          +1 compile 0m 41s the patch passed with JDK v1.8.0
          +1 javac 0m 41s the patch passed
          +1 compile 0m 33s the patch passed with JDK v1.7.0_80
          +1 javac 0m 33s the patch passed
          +1 checkstyle 0m 52s the patch passed
          +1 mvneclipse 0m 16s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 hadoopcheck 26m 6s Patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1.
          +1 findbugs 2m 10s the patch passed
          +1 javadoc 0m 26s the patch passed with JDK v1.8.0
          +1 javadoc 0m 32s the patch passed with JDK v1.7.0_80
          -1 unit 87m 40s hbase-server in the patch failed.
          +1 asflicense 0m 18s Patch does not generate ASF License warnings.
          Total runtime: 129m 0s



          Reason Tests
          Failed junit tests hadoop.hbase.snapshot.TestMobFlushSnapshotFromClient
          Timed out junit tests org.apache.hadoop.hbase.replication.multiwal.TestReplicationKillMasterRSCompressedWithMultipleWAL
            org.apache.hadoop.hbase.replication.regionserver.TestReplicationWALReaderManager



          Subsystem Report/Notes
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12816782/HBASE-16201.patch
          JIRA Issue HBASE-16201
          Optional Tests asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile
          uname Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
          git revision master / 17edca6
          Default Java 1.7.0_80
          Multi-JDK versions /home/jenkins/tools/java/jdk1.8.0:1.8.0 /home/jenkins/jenkins-slave/tools/hudson.model.JDK/JDK_1.7_latest_:1.7.0_80
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-HBASE-Build/2568/artifact/patchprocess/patch-unit-hbase-server.txt
          unit test logs https://builds.apache.org/job/PreCommit-HBASE-Build/2568/artifact/patchprocess/patch-unit-hbase-server.txt
          Test Results https://builds.apache.org/job/PreCommit-HBASE-Build/2568/testReport/
          modules C: hbase-server U: hbase-server
          Console output https://builds.apache.org/job/PreCommit-HBASE-Build/2568/console
          Powered by Apache Yetus 0.2.1 http://yetus.apache.org

          This message was automatically generated.

          Yu Li added a comment -

          Checked all the failed and timed-out UT cases (TestMobFlushSnapshotFromClient, TestReplicationKillMasterRSCompressedWithMultipleWAL and TestReplicationWALReaderManager) and confirmed they are unrelated to the patch here; all of them pass in a local run.

          Since the patch already has two +1s and the UT failures are unrelated, I will commit it soon.

          Yu Li added a comment -

          Pushed into master, branch-1, branch-1.2 and branch-1.3, and just noticed that I forgot to add the JIRA number to the commit message. Do I need to revert the commit and update the commit message? Sorry for the trouble. stack

          Michael Stack added a comment -

          No problem. Yes, revert and reapply. It is a pain but the only way to fix a message that has been pushed to the public repo. carp84

          Michael Stack added a comment -

          +1 on patch.

          Yu Li added a comment -

          I see, thanks for the quick response, sir stack. I will follow the process and make the change.

          Yu Li added a comment -

          Done: reverted and reapplied to master, branch-1, branch-1.2 and branch-1.3, and pushed into branch-1.1 as well. Thanks all for the review.

          Hudson added a comment -

          SUCCESS: Integrated in HBase-1.3-IT #746 (See https://builds.apache.org/job/HBase-1.3-IT/746/)
          HBASE-16201 NPE in RpcServer causing intermittent UT failure of (liyu: rev 43626fc06e895433cce304b5cee97999a106e0ac)

          • hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java
          Hudson added a comment -

          SUCCESS: Integrated in HBase-1.2-IT #548 (See https://builds.apache.org/job/HBase-1.2-IT/548/)
          HBASE-16201 fix a NPE issue in RpcServer (liyu: rev b99efe65e900299271b1e2a0c5feabd23930eb70)

          • hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java
          Hudson added a comment -

          SUCCESS: Integrated in HBase-1.2 #667 (See https://builds.apache.org/job/HBase-1.2/667/)
          HBASE-16201 fix a NPE issue in RpcServer (liyu: rev b99efe65e900299271b1e2a0c5feabd23930eb70)

          • hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java
          Hudson added a comment -

          FAILURE: Integrated in HBase-1.4 #276 (See https://builds.apache.org/job/HBase-1.4/276/)
          HBASE-16201 NPE in RpcServer causing intermittent UT failure of (liyu: rev 09d9dc4b594674d77fb344466daaabc3eab21da0)

          • hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java
          Hudson added a comment -

          SUCCESS: Integrated in HBase-1.3 #774 (See https://builds.apache.org/job/HBase-1.3/774/)
          HBASE-16201 NPE in RpcServer causing intermittent UT failure of (liyu: rev 43626fc06e895433cce304b5cee97999a106e0ac)

          • hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java
          Hudson added a comment -

          FAILURE: Integrated in HBase-Trunk_matrix #1193 (See https://builds.apache.org/job/HBase-Trunk_matrix/1193/)
          HBASE-16201 NPE in RpcServer causing intermittent UT failure of (liyu: rev 3c39cbd92c3f309c98ca01bfb70ca89bc046a228)

          • hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java
          Hudson added a comment -

          SUCCESS: Integrated in HBase-1.1-JDK8 #1830 (See https://builds.apache.org/job/HBase-1.1-JDK8/1830/)
          HBASE-16201 fix a NPE issue in RpcServer (liyu: rev 73189eb801f1c49e738e8a79838b1cd17b1fcff5)

          • hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java
          Hudson added a comment -

          SUCCESS: Integrated in HBase-1.1-JDK7 #1743 (See https://builds.apache.org/job/HBase-1.1-JDK7/1743/)
          HBASE-16201 fix a NPE issue in RpcServer (liyu: rev 73189eb801f1c49e738e8a79838b1cd17b1fcff5)

          • hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java
          Hudson added a comment -

          FAILURE: Integrated in HBase-0.98-matrix #375 (See https://builds.apache.org/job/HBase-0.98-matrix/375/)
          HBASE-16201 NPE in RpcServer causing intermittent UT failure of (apurtell: rev 33d64e021a211a47a63327d435b2e8cb58f0a223)

          • hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java

          People

            Assignee: Yu Li
            Reporter: Yu Li
            Votes: 0
            Watchers: 5

            Dates

              Created:
              Updated:
              Resolved: