Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-26843

When encounters EOF exception, replication log will not be dequeued

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 2.4.10
    • 3.0.0-beta-2
    • Replication
    • None

    Description

      We met this problem on our production clusters, and in the pre-commit tests, it also occured. 

      We can see this test logs,

       

      2022-03-14T15:31:16,225 INFO  [Time-limited test] wal.AbstractFSWAL(840): Rolled WAL /home/jenkins/jenkins-home/workspace/Base-PreCommit-GitHub-PR_PR-4219/yetus-jdk8-hadoop3-check/src/hbase-server/target/test-data/b3808e70-5b17-10bb-3bd3-fdd825fd8fab/WALs/testSizeOfLogQueue/testSizeOfLogQueue.1647271876165 with entries=1, filesize=210 B; new WAL /home/jenkins/jenkins-home/workspace/Base-PreCommit-GitHub-PR_PR-4219/yetus-jdk8-hadoop3-check/src/hbase-server/target/test-data/b3808e70-5b17-10bb-3bd3-fdd825fd8fab/WALs/testSizeOfLogQueue/testSizeOfLogQueue.1647271876191
      2022-03-14T15:31:16,225 DEBUG [Time-limited test] wal.AbstractFSWAL(927): Create new AsyncFSWAL writer with pipeline: [DatanodeInfoWithStorage[127.0.0.1:34961,DS-77496ab4-d8f2-4bee-af67-6f1c3e08de20,DISK], DatanodeInfoWithStorage[127.0.0.1:38455,DS-50a46a7f-6202-4177-a91f-7da535dbf07f,DISK], DatanodeInfoWithStorage[127.0.0.1:40111,DS-d41b66a9-f9d5-4a57-9969-ac93cae6c9ac,DISK]]
      2022-03-14T15:31:16,338 WARN  [Close-WAL-Writer-0] asyncfs.FanOutOneBlockAsyncDFSOutputHelper(593): complete file /home/jenkins/jenkins-home/workspace/Base-PreCommit-GitHub-PR_PR-4219/yetus-jdk8-hadoop3-check/src/hbase-server/target/test-data/b3808e70-5b17-10bb-3bd3-fdd825fd8fab/WALs/testSizeOfLogQueue/testSizeOfLogQueue.1647271876165 not finished, retry = 0
      2022-03-14T15:31:16,357 DEBUG [Time-limited test] wal.ProtobufLogReader(456): Encountered a malformed edit, seeking back to last good position in file, from 211 to 210
      java.io.EOFException: Partial PB while reading WAL, probably an unexpected EOF, ignoring. current offset=211
      	at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:399) ~[classes/:?]
      	at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:105) ~[classes/:?]
      	at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:93) ~[classes/:?]
      	at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.readNextEntryAndRecordReaderPosition(WALEntryStream.java:264) ~[classes/:?]
      	at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.tryAdvanceEntry(WALEntryStream.java:178) ~[classes/:?]
      	at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.hasNext(WALEntryStream.java:103) ~[classes/:?]
      	at org.apache.hadoop.hbase.replication.regionserver.TestBasicWALEntryStream.testSizeOfLogQueue(TestBasicWALEntryStream.java:698) ~[test-classes/:?]
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_282]
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_282]
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_282]
      	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_282]
      	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) ~[junit-4.13.2.jar:4.13.2]
      	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) ~[junit-4.13.2.jar:4.13.2]
      	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) ~[junit-4.13.2.jar:4.13.2]
      	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) ~[junit-4.13.2.jar:4.13.2]
      	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) ~[junit-4.13.2.jar:4.13.2]
      	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) ~[junit-4.13.2.jar:4.13.2]
      	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61) ~[junit-4.13.2.jar:4.13.2]
      	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) ~[junit-4.13.2.jar:4.13.2]
      	at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) ~[junit-4.13.2.jar:4.13.2]
      	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) ~[junit-4.13.2.jar:4.13.2]
      	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) ~[junit-4.13.2.jar:4.13.2]
      	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) ~[junit-4.13.2.jar:4.13.2]
      	at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) ~[junit-4.13.2.jar:4.13.2]
      	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) ~[junit-4.13.2.jar:4.13.2]
      	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) ~[junit-4.13.2.jar:4.13.2]
      	at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) ~[junit-4.13.2.jar:4.13.2]
      	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) ~[junit-4.13.2.jar:4.13.2]
      	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) ~[junit-4.13.2.jar:4.13.2]
      	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) ~[junit-4.13.2.jar:4.13.2]
      	at org.apache.hadoop.hbase.SystemExitRule$1.evaluate(SystemExitRule.java:38) ~[hbase-common-3.0.0-alpha-3-SNAPSHOT-tests.jar:3.0.0-alpha-3-SNAPSHOT]
      	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299) ~[junit-4.13.2.jar:4.13.2]
      	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293) ~[junit-4.13.2.jar:4.13.2]
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_282]
      	at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_282]
      2022-03-14T15:31:16,511 DEBUG [Time-limited test] wal.AbstractFSWAL(1050): Moved 2 WAL file(s) to /home/jenkins/jenkins-home/workspace/Base-PreCommit-GitHub-PR_PR-4219/yetus-jdk8-hadoop3-check/src/hbase-server/target/test-data/b3808e70-5b17-10bb-3bd3-fdd825fd8fab/oldWALs
      2022-03-14T15:31:16,511 INFO  [Time-limited test] wal.AbstractFSWAL(1053): Closed WAL: AsyncFSWAL testSizeOfLogQueue:(num 1647271876191)
       

      And this is the failure info,

       

       

      Failedprecommit checks / yetus jdk8 Hadoop3 checks / org.apache.hadoop.hbase.replication.regionserver.TestBasicWALEntryStreamFSHLog.testSizeOfLogQueueFailing for the past 1 build (Since #1 )运行时间:0.41 secError Messageexpected:<1> but was:<2>Stacktracejava.lang.AssertionError: expected:<1> but was:<2>
      	at org.junit.Assert.fail(Assert.java:89)
      	at org.junit.Assert.failNotEquals(Assert.java:835)
      	at org.junit.Assert.assertEquals(Assert.java:647)
      	at org.junit.Assert.assertEquals(Assert.java:633)
      	at org.apache.hadoop.hbase.replication.regionserver.TestBasicWALEntryStream.testSizeOfLogQueue(TestBasicWALEntryStream.java:701)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:498)
      	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
      	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
      	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
      	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
      	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
      	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
      	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
      	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
      	at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) 

      The link is https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4219/1/testReport/org.apache.hadoop.hbase.replication.regionserver/TestBasicWALEntryStreamFSHLog/precommit_checks___yetus_jdk8_Hadoop3_checks___testSizeOfLogQueue/

       

       

      I can not figure out what the problem is yet, but we can change the UT to find more info.

       

      Attachments

        Issue Links

          Activity

            People

              Xiaolin Ha Xiaolin Ha
              Xiaolin Ha Xiaolin Ha
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated: