HBase / HBASE-7530

[replication] Work around HDFS-4380 else we get NPEs

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.94.3
    • Fix Version/s: 0.94.5, 0.95.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      I've been spending a lot of time trying to figure out the recent test failures related to replication. One I keep getting is this NPE:

      2013-01-09 10:08:56,912 ERROR [RegionServer:1;172.23.7.205,61604,1357754664830-EventThread.replicationSource,2] regionserver.ReplicationSource$1(727): Unexpected exception in ReplicationSource, currentPath=hdfs://localhost:61589/user/jdcryans/hbase/.logs/172.23.7.205,61604,1357754664830/172.23.7.205%2C61604%2C1357754664830.1357754936216
      java.lang.NullPointerException
              at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.updateBlockInfo(DFSClient.java:1885)
              at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1858)
              at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1834)
              at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:578)
              at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:154)
              at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:108)
              at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1495)
              at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.openFile(SequenceFileLogReader.java:62)
              at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1482)
              at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1475)
              at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1470)
              at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(SequenceFileLogReader.java:55)
              at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.reset(SequenceFileLogReader.java:308)
              at org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.openReader(ReplicationHLogReaderManager.java:69)
              at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:500)
              at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:312)
      
      

      When I talked to Todd Lipcon, he said it was likely fixed in Hadoop 2.0 by HDFS-3222, but for Hadoop 1.0 he created HDFS-4380. The NPE seems to happen while crossing block boundaries, and TestReplication uses a 20KB block size for the HLog. The intent of that small block size was just to get HLogs to roll more often, which can also be achieved by setting hbase.regionserver.logroll.multiplier to 0.0003f.
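      A back-of-the-envelope check of why 0.0003f gives roughly the old behavior: assuming the HLog roll threshold is computed as block size times hbase.regionserver.logroll.multiplier, and assuming a 64 MB HDFS block size (the Hadoop 1.x default), the log rolls at about 20 KB while staying well inside a single block, so the reader never has to cross a block boundary. The class and constant names below are hypothetical and only illustrate the arithmetic.

```java
// Hypothetical illustration, not HBase code: assumes the HLog roll
// threshold is blockSize * hbase.regionserver.logroll.multiplier.
class LogRollSize {
    // Assumed HDFS block size: 64 MB, the Hadoop 1.x default.
    static final long DEFAULT_BLOCK_SIZE = 64L * 1024 * 1024;

    // Multiplier value suggested in the issue description.
    static final float TEST_MULTIPLIER = 0.0003f;

    /** Bytes written before the log is rolled, under the assumed formula. */
    static long rollSize(long blockSize, float multiplier) {
        // long * float is evaluated in float, then truncated toward zero.
        return (long) (blockSize * multiplier);
    }

    public static void main(String[] args) {
        long roll = rollSize(DEFAULT_BLOCK_SIZE, TEST_MULTIPLIER);
        // About 20 KB, close to the old 20 KB test block size, but the
        // whole log now fits inside a single 64 MB block.
        System.out.println("roll at " + roll + " bytes");
    }
}
```

      Running main prints "roll at 20132 bytes", i.e. roughly the 20 KB that the old block size used to force.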

      Attachment: HBASE-7530.patch (1.0 kB, Jean-Daniel Cryans)


          Activity

          Lars Hofhansl added a comment -

          Interesting. Did this only start recently (which would be strange)?
          This happens with larger block sizes too, right? If so, this should be critical.

          Jean-Daniel Cryans added a comment -

          Lars Hofhansl: Not sure when it started happening. The code has changed on the HBase side but not on the Hadoop side, so we should have seen this before. It should happen with larger block sizes too, just a few orders of magnitude less often than it does in TestReplication.
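          To put "a few orders of magnitude" into numbers: if one assumes the chance of hitting the bug is proportional to how many block boundaries a reader crosses per byte of HLog read, then the ratio between an assumed 64 MB production block size and the 20 KB test block size gives the relative frequency. This is a rough sketch under that assumption, not a measured failure rate; the class and constant names are hypothetical.

```java
import java.util.Locale;

// Hypothetical back-of-the-envelope estimate, assuming the NPE risk scales
// with the number of HDFS block boundaries crossed per byte read.
class BlockSizeRatio {
    static final long TEST_BLOCK_SIZE = 20L * 1024;                // 20 KB, the old TestReplication HLog block size
    static final long ASSUMED_PROD_BLOCK_SIZE = 64L * 1024 * 1024; // assumed 64 MB HDFS block size

    /** Boundaries per byte scale as 1 / blockSize, so this is how many times more often small blocks are crossed. */
    static double crossingRatio(long smallBlock, long largeBlock) {
        return (double) largeBlock / smallBlock;
    }

    public static void main(String[] args) {
        double ratio = crossingRatio(TEST_BLOCK_SIZE, ASSUMED_PROD_BLOCK_SIZE);
        // Locale.ROOT keeps the decimal separator a '.' regardless of the JVM locale.
        System.out.println(String.format(Locale.ROOT, "ratio = %.1f (about 10^%.1f)", ratio, Math.log10(ratio)));
    }
}
```

          Running main prints "ratio = 3276.8 (about 10^3.5)", which is consistent with "a few orders of magnitude" under the stated assumption.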

          Jean-Daniel Cryans added a comment -

          This is the fix I'm proposing; I'm currently testing it in a loop.

          stack added a comment -

          I don't get what this change does. Previously we had explicit sizing. This does explicit sizing too, right, by rolling at some multiple of the current size?

          stack added a comment -

          Offline, J-D explained that previously we set the block size to 20k; this patch just has us roll at 20k instead. Makes sense now. +1

          Jean-Daniel Cryans added a comment -

          Committed to trunk and 0.94

          Hudson added a comment -

          Integrated in HBase-TRUNK #3726 (See https://builds.apache.org/job/HBase-TRUNK/3726/)
          HBASE-7530 [replication] Work around HDFS-4380 else we get NPEs
          HBASE-7531 [replication] NPE in SequenceFileLogReader because
          ReplicationSource doesn't nullify the reader
          HBASE-7534 [replication] TestReplication.queueFailover can fail
          because HBaseTestingUtility.createMultiRegions is dangerous (Revision 1431768)

          Result = FAILURE
          jdcryans :
          Files :

          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplication.java
          Hudson added a comment -

          Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #342 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/342/)
          HBASE-7530 [replication] Work around HDFS-4380 else we get NPEs
          HBASE-7531 [replication] NPE in SequenceFileLogReader because
          ReplicationSource doesn't nullify the reader
          HBASE-7534 [replication] TestReplication.queueFailover can fail
          because HBaseTestingUtility.createMultiRegions is dangerous (Revision 1431768)

          Result = FAILURE
          jdcryans :
          Files :

          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplication.java
          Hudson added a comment -

          Integrated in HBase-0.94 #722 (See https://builds.apache.org/job/HBase-0.94/722/)
          HBASE-7530 [replication] Work around HDFS-4380 else we get NPEs
          HBASE-7531 [replication] NPE in SequenceFileLogReader because
          ReplicationSource doesn't nullify the reader
          HBASE-7534 [replication] TestReplication.queueFailover can fail
          because HBaseTestingUtility.createMultiRegions is dangerous (Revision 1431769)

          Result = SUCCESS
          jdcryans :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/replication/TestReplication.java
          Hudson added a comment -

          Integrated in HBase-0.94-security #95 (See https://builds.apache.org/job/HBase-0.94-security/95/)
          HBASE-7530 [replication] Work around HDFS-4380 else we get NPEs
          HBASE-7531 [replication] NPE in SequenceFileLogReader because
          ReplicationSource doesn't nullify the reader
          HBASE-7534 [replication] TestReplication.queueFailover can fail
          because HBaseTestingUtility.createMultiRegions is dangerous (Revision 1431769)

          Result = SUCCESS
          jdcryans :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/replication/TestReplication.java
          Hudson added a comment -

          Integrated in HBase-0.94-security-on-Hadoop-23 #11 (See https://builds.apache.org/job/HBase-0.94-security-on-Hadoop-23/11/)
          HBASE-7530 [replication] Work around HDFS-4380 else we get NPEs
          HBASE-7531 [replication] NPE in SequenceFileLogReader because
          ReplicationSource doesn't nullify the reader
          HBASE-7534 [replication] TestReplication.queueFailover can fail
          because HBaseTestingUtility.createMultiRegions is dangerous (Revision 1431769)

          Result = FAILURE
          jdcryans :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/replication/TestReplication.java
          Dave Latham added a comment -

          We've been running replication in production and saw this NPE happen on a few region servers over the weekend. Are there any recommended workarounds?


            People

            • Assignee:
              Jean-Daniel Cryans
              Reporter:
              Jean-Daniel Cryans
            • Votes:
              0
              Watchers:
              5
