ACCUMULO-1651

GC removed WAL that master wasn't done with

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.6.0
    • Component/s: gc, master
    • Labels: None

    Description

      I have a master that's spinning, trying to recover a walog that doesn't exist in HDFS. It looks like the GC cleaned it up. I was stopping and starting my cluster throughout this period, and for at least a few minutes every service was talking SSL except the GC, so the GC couldn't receive Thrift messages from other services. However, vines says this shouldn't affect the GC's deletion behavior.

      Here are some relevant logs. Note that the master thinks its logSet includes that file straight through the time the GC removed it.

      GC:

      2013-08-09 11:58:14,835 [util.MetadataTableUtil] INFO : Returning logs [!!R<< hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7 (1)] for extent !!R<<
      2013-08-09 11:58:14,852 [gc.GarbageCollectWriteAheadLogs] DEBUG: Removing WAL for offline server hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7
      2013-08-09 12:03:15,467 [util.MetadataTableUtil] INFO : Returning logs [!!R<< hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7 (1)] for extent !!R<<
      

      Master:

      2013-08-09 11:57:45,235 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 11:57:45,238 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 11:57:45,286 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 11:57:45,324 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 11:57:45,939 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 11:57:45,942 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 11:57:45,975 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 11:57:55,612 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 11:57:55,679 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 11:57:55,739 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 11:57:55,764 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 11:57:55,784 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 11:57:56,031 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 11:57:56,046 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 11:58:56,051 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 11:59:56,057 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 12:00:56,062 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 12:01:56,066 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 12:02:56,071 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 12:08:56,103 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 12:09:56,108 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 12:10:56,113 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 12:11:56,118 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 12:13:19,883 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 12:14:19,887 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      <master was restarted here>
      2013-08-09 12:15:44,459 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 12:15:44,467 [recovery.RecoveryManager] DEBUG: Recovering hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7 to hdfs://localhost:54310/otherAccumuloInstance/recovery/5a383792-c89b-41ed-bc22-0802e76638f7
      2013-08-09 12:15:44,472 [recovery.RecoveryManager] INFO : Starting recovery of hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7 (in : 10s) created for localhost+9997, tablet !!R<< holds a reference
      2013-08-09 12:15:54,479 [recovery.RecoveryManager] DEBUG: Unable to initate log sort for hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7: java.io.FileNotFoundException: java.io.FileNotFoundException: File not found /otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7
      2013-08-09 12:16:44,487 [state.ZooTabletStateStore] DEBUG: root tablet logSet [localhost+9997/hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7]
      2013-08-09 12:16:44,488 [recovery.RecoveryManager] DEBUG: Recovering hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7 to hdfs://localhost:54310/otherAccumuloInstance/recovery/5a383792-c89b-41ed-bc22-0802e76638f7
      2013-08-09 12:16:44,490 [recovery.RecoveryManager] INFO : Starting recovery of hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7 (in : 20s) created for localhost+9997, tablet !!R<< holds a reference
      2013-08-09 12:17:04,494 [recovery.RecoveryManager] DEBUG: Unable to initate log sort for hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7: java.io.FileNotFoundException: java.io.FileNotFoundException: File not found /otherAccumuloInstance/wal/localhost+9997/5a383792-c89b-41ed-bc22-0802e76638f7
      <repeating ad infinitum>
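
To make the timeline in the excerpts above easier to check, here is a minimal sketch that finds the GC's "Removing WAL" line for a given log file and lists the master log lines that still mention that file afterwards. It is not part of Accumulo: the function names (`parse`, `stale_references`) and the line pattern are assumptions based only on the log lines quoted in this report, and the sample data is a small subset of those lines.

```python
import re
from datetime import datetime

# Assumed line shape, inferred from the excerpts in this report:
#   "<date> <time>,<millis> [<logger>] <LEVEL>: <message>"
# Note that the level may be followed by a space before the colon ("INFO :").
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \[([\w.]+)\]\s+(\w+)\s*: (.*)$"
)


def parse(line):
    """Return (timestamp, logger, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
    return ts, m.group(2), m.group(4)


def stale_references(gc_lines, master_lines, wal):
    """Find when the GC removed `wal` and which master lines mention it later.

    Returns (removal_time, [timestamps]); removal_time is None if the GC
    lines contain no matching "Removing WAL" message.
    """
    removed_at = None
    for line in gc_lines:
        rec = parse(line)
        if rec and "Removing WAL" in rec[2] and wal in rec[2]:
            removed_at = rec[0]
            break
    if removed_at is None:
        return None, []
    later = []
    for line in master_lines:
        rec = parse(line)
        if rec and wal in rec[2] and rec[0] > removed_at:
            later.append(rec[0])
    return removed_at, later


# Sample data: a few lines copied from the excerpts above.
WAL = "5a383792-c89b-41ed-bc22-0802e76638f7"
PATH = "hdfs://localhost:54310/otherAccumuloInstance/wal/localhost+9997/" + WAL

GC_LINES = [
    "2013-08-09 11:58:14,852 [gc.GarbageCollectWriteAheadLogs] DEBUG: "
    "Removing WAL for offline server " + PATH,
]
MASTER_LINES = [
    "2013-08-09 11:57:45,235 [state.ZooTabletStateStore] DEBUG: "
    "root tablet logSet [localhost+9997/" + PATH + "]",
    "2013-08-09 11:58:56,051 [state.ZooTabletStateStore] DEBUG: "
    "root tablet logSet [localhost+9997/" + PATH + "]",
    "2013-08-09 12:15:44,467 [recovery.RecoveryManager] DEBUG: "
    "Recovering " + PATH + " to "
    "hdfs://localhost:54310/otherAccumuloInstance/recovery/" + WAL,
]

removed_at, later = stale_references(GC_LINES, MASTER_LINES, WAL)
print("GC removed WAL at:", removed_at)
for ts in later:
    print("master still references WAL at:", ts)
```

Run against the full GC and master logs, any timestamps reported after the removal time correspond to the window in which the master still lists the deleted file in its logSet.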
      

            People

              Assignee: Eric C. Newton (ecn)
              Reporter: Michael Berman (mberman)