Accumulo / ACCUMULO-3182

Empty or partial WAL header blocks successful recovery


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.1
    • Fix Version/s: 1.6.2, 1.7.0
    • Component/s: tserver
    • Labels: None

    Description

      Haven't seen this one before. A replication IT failed because the tserver that came up (after the original was killed) could not complete recovery. The sequence below repeated a few times before the test ultimately timed out.

      2014-09-29 04:46:10,259 [zookeeper.DistributedWorkQueue] DEBUG: Looking for work in /accumulo/f98e79c4-9dcd-4fb0-8ec9-5804f0818839/recovery
      2014-09-29 04:46:10,340 [zookeeper.DistributedWorkQueue] DEBUG: got lock for af53bf1e-c293-463b-b4de-5efdb8b34962
      2014-09-29 04:46:10,341 [log.LogSorter] DEBUG: Sorting file:/.../test/target/mini-tests/org.apache.accumulo.test.replication.UnorderedWorkAssignerReplicationIT_dataReplicatedToCorrectTableWithoutDrain/accumulo/wal/juno+49195/af53bf1e-c293-463b-b4de-5efdb8b34962 to file:/.../test/target/mini-tests/org.apache.accumulo.test.replication.UnorderedWorkAssignerReplicationIT_dataReplicatedToCorrectTableWithoutDrain/accumulo/recovery/af53bf1e-c293-463b-b4de-5efdb8b34962 using sortId af53bf1e-c293-463b-b4de-5efdb8b34962
      2014-09-29 04:46:10,341 [log.LogSorter] INFO : Copying file:/var/lib/jenkins/home/jobs/Accumulo-Master-Integration-Tests/workspace/test/target/mini-tests/org.apache.accumulo.test.replication.UnorderedWorkAssignerReplicationIT_dataReplicatedToCorrectTableWithoutDrain/accumulo/wal/juno+49195/af53bf1e-c293-463b-b4de-5efdb8b34962 to file:/.../test/target/mini-tests/org.apache.accumulo.test.replication.UnorderedWorkAssignerReplicationIT_dataReplicatedToCorrectTableWithoutDrain/accumulo/recovery/af53bf1e-c293-463b-b4de-5efdb8b34962
      2014-09-29 04:46:10,345 [log.LogSorter] ERROR: java.io.EOFException
      java.io.EOFException
      	at java.io.DataInputStream.readFully(DataInputStream.java:197)
      	at java.io.DataInputStream.readFully(DataInputStream.java:169)
      	at org.apache.accumulo.tserver.log.DfsLogger.readHeaderAndReturnStream(DfsLogger.java:282)
      	at org.apache.accumulo.tserver.log.LogSorter$LogProcessor.sort(LogSorter.java:113)
      	at org.apache.accumulo.tserver.log.LogSorter$LogProcessor.process(LogSorter.java:93)
      	at org.apache.accumulo.server.zookeeper.DistributedWorkQueue$1.run(DistributedWorkQueue.java:105)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
      	at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
      	at java.lang.Thread.run(Thread.java:745)
      2014-09-29 04:46:10,346 [log.LogSorter] ERROR: Error during cleanup sort/copy af53bf1e-c293-463b-b4de-5efdb8b34962
      java.lang.NullPointerException
      	at org.apache.accumulo.tserver.log.LogSorter$LogProcessor.close(LogSorter.java:183)
      	at org.apache.accumulo.tserver.log.LogSorter$LogProcessor.sort(LogSorter.java:151)
      	at org.apache.accumulo.tserver.log.LogSorter$LogProcessor.process(LogSorter.java:93)
      	at org.apache.accumulo.server.zookeeper.DistributedWorkQueue$1.run(DistributedWorkQueue.java:105)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
      	at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
      	at java.lang.Thread.run(Thread.java:745)
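
The first trace shows DataInputStream.readFully throwing EOFException while the WAL header is read, and the second shows a NullPointerException in the cleanup path right after it. The following is a minimal sketch of that failure mode, not the Accumulo code or the committed fix. The class name, the MAGIC bytes, the HeaderStatus enum and both helper methods are hypothetical; it also assumes the NPE came from closing a resource that was never assigned because the header read failed, which the trace suggests but does not prove.

```java
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class WalHeaderSketch {
  // Hypothetical magic bytes for illustration; NOT the real Accumulo WAL header.
  static final byte[] MAGIC = "--- hypothetical header ---".getBytes(StandardCharsets.UTF_8);

  enum HeaderStatus { OK, EMPTY_OR_PARTIAL, MISMATCH }

  // Reads exactly MAGIC.length bytes. readFully throws EOFException when the
  // stream ends early, which is what an empty or truncated file produces.
  static HeaderStatus checkHeader(DataInputStream in) throws IOException {
    byte[] buf = new byte[MAGIC.length];
    try {
      in.readFully(buf);
    } catch (EOFException e) {
      // Report the condition instead of letting the exception escape.
      return HeaderStatus.EMPTY_OR_PARTIAL;
    }
    return Arrays.equals(buf, MAGIC) ? HeaderStatus.OK : HeaderStatus.MISMATCH;
  }

  // Null-safe close, so cleanup does not fail when a resource was never opened.
  static void closeIfOpen(Closeable c) throws IOException {
    if (c != null) {
      c.close();
    }
  }

  static DataInputStream streamOf(byte[] bytes) {
    return new DataInputStream(new ByteArrayInputStream(bytes));
  }

  public static void main(String[] args) throws IOException {
    byte[] partial = Arrays.copyOf(MAGIC, MAGIC.length / 2);
    System.out.println("complete header: " + checkHeader(streamOf(MAGIC)));
    System.out.println("empty file:      " + checkHeader(streamOf(new byte[0])));
    System.out.println("partial header:  " + checkHeader(streamOf(partial)));
    closeIfOpen(null); // no exception, unlike calling close() on a null reference
  }
}
```

What a caller should do with an EMPTY_OR_PARTIAL result (for example, treat the log as containing no data) is a policy decision that this sketch leaves open.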
      

            People

              Assignee: Josh Elser (elserj)
              Reporter: Josh Elser (elserj)
              Votes: 0
              Watchers: 6

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 40m