Uploaded image for project: 'Accumulo'
  1. Accumulo
  2. ACCUMULO-1651

GC removed WAL that master wasn't done with

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.6.0
    • Component/s: gc, master
    • Labels:
      None

      Description

      I have a master that's spinning trying to recover a walog that doesn't exist in hdfs. It looks like the GC cleaned it up. I was stopping and starting my cluster throughout this period, and there was at least a few minutes in which every service was talking SSL except the GC, so the GC couldn't receive thrift messages from other services, but John Vines says this shouldn't affect the GC's deletion behavior.

      Here are some relevant logs. Note that the master thinks its logSet includes that file straight through the time the GC removed it.

      GC:

      2013-08-09 11:58:14,835 [util.MetadataTableUtil] INFO : Returning logs [!!R<< hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7 (1)] for extent !!R<<
      2013-08-09 11:58:14,852 [gc.GarbageCollectWriteAheadLogs] DEBUG: Removing WAL for offline server hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7
      2013-08-09 12:03:15,467 [util.MetadataTableUtil] INFO : Returning logs [!!R<< hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7 (1)] for extent !!R<<
      

      Master:

      2013-08-09 11:57:45,235 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 11:57:45,238 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 11:57:45,286 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 11:57:45,324 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 11:57:45,939 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 11:57:45,942 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 11:57:45,975 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 11:57:55,612 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 11:57:55,679 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 11:57:55,739 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 11:57:55,764 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 11:57:55,784 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 11:57:56,031 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 11:57:56,046 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 11:58:56,051 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 11:59:56,057 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 12:00:56,062 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 12:01:56,066 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 12:02:56,071 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 12:08:56,103 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 12:09:56,108 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 12:10:56,113 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 12:11:56,118 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 12:13:19,883 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 12:14:19,887 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      <master was restarted here>
      2013-08-09 12:15:44,459 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 12:15:44,467 [recovery.RecoveryManager] DEBUG: Recovering hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7 to hdfs://localhost:54310/otherAccumuloInstance/recovery/5a383792-c89b-41ed-bc22-0802e76638f7
      2013-08-09 12:15:44,472 [recovery.RecoveryManager] INFO : Starting recovery of hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7 (in : 10s) created for localhost+9997, tablet !!R<< holds a reference
      2013-08-09 12:15:54,479 [recovery.RecoveryManager] DEBUG: Unable to initate log sort for hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7: java.io.FileNotFoundException: java.io.FileNotFoundException: File not found /otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7
      2013-08-09 12:16:44,487 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 12:16:44,488 [recovery.RecoveryManager] DEBUG: Recovering hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7 to hdfs://localhost:54310/otherAccumuloInstance/recovery/5a383792-c89b-41ed-bc22-0802e76638f7
      2013-08-09 12:16:44,490 [recovery.RecoveryManager] INFO : Starting recovery of hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7 (in : 20s) created for localhost+9997, tablet !!R<< holds a reference
      2013-08-09 12:17:04,494 [recovery.RecoveryManager] DEBUG: Unable to initate log sort for hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7: java.io.FileNotFoundException: java.io.FileNotFoundException: File not found /otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7
      <repeating ad infinitum>
      

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                ecn Eric Newton
                Reporter:
                mberman Michael Berman
              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: