Description
I was running test/system/upgrade_test.sh dirty and the test hung. Upon inspection, the WALs from 1.5 had been deleted before all tablets were recovered.
Some tablets from 1.5 recovered fine.
2013-10-29 20:29:26,475 [log.SortedLogRecovery] INFO : Recovery complete for !!R<< using hdfs://nnhost:6093/rktl/accumulo-upt/recovery/754f171b-c260-42dd-b17e-bd15064608c7
Then the GC kicked in and deleted files before tablets were finished recovering.
2013-10-29 20:29:30,421 [gc.GarbageCollectWriteAheadLogs] DEBUG: Removing WAL for offline server hdfs://nnhost:6093/rktl/accumulo-upt/wal/127.0.0.1+9997/754f171b-c260-42dd-b17e-bd15064608c7
2013-10-29 20:29:30,428 [gc.GarbageCollectWriteAheadLogs] DEBUG: Removing sorted WAL hdfs://nnhost:6093/rktl/accumulo-upt/recovery/754f171b-c260-42dd-b17e-bd15064608c7
Tablet failed to recover.
2013-10-29 20:29:30,858 [tabletserver.TabletServer] WARN : exception trying to assign tablet 1<;row_0000180000 /default_tablet
java.lang.RuntimeException: java.io.IOException: Unable to find recovery files for extent 1<;row_0000180000 logEntry: 1<; 754f171b-c260-42dd-b17e-bd15064608c7 (19)
    at org.apache.accumulo.server.tabletserver.Tablet.<init>(Tablet.java:1398)
    at org.apache.accumulo.server.tabletserver.Tablet.<init>(Tablet.java:1233)
    at org.apache.accumulo.server.tabletserver.Tablet.<init>(Tablet.java:1088)
    at org.apache.accumulo.server.tabletserver.Tablet.<init>(Tablet.java:1076)
I had set my GC delay to 30 seconds while testing another issue, and that's why I ran into this one.
Looking at the code, I do not think it properly converts relative paths from 1.5 to absolute paths. I think the code should instead convert everything to relative paths (just UUIDs) to avoid problems caused by differing configurations.
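To illustrate the suggestion, here is a minimal sketch of comparing WAL references by UUID only. It is not the existing Accumulo code: the class WalNames and the methods walId, walIds, and stillReferenced are hypothetical names made up for this example, and the relative form "127.0.0.1+9997/<uuid>" is an assumed shape for a 1.5-style entry. The idea is that the GC would reduce both its deletion candidates and the log entries still referenced by tablets to the final path component before comparing them.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Accumulo: compares WAL references by UUID only.
class WalNames {

    // Reduce any WAL reference (absolute URI, relative path, or bare UUID)
    // to its final path component, which is assumed to be the WAL's UUID.
    static String walId(String ref) {
        String s = ref.trim();
        // Drop trailing slashes so "a/b/" and "a/b" reduce to the same name.
        while (s.length() > 1 && s.endsWith("/")) {
            s = s.substring(0, s.length() - 1);
        }
        int slash = s.lastIndexOf('/');
        return slash < 0 ? s : s.substring(slash + 1);
    }

    // Collect the UUIDs of all WALs that tablets still reference.
    static Set<String> walIds(List<String> refs) {
        Set<String> ids = new HashSet<>();
        for (String ref : refs) {
            ids.add(walId(ref));
        }
        return ids;
    }

    // A deletion candidate is still in use if its UUID matches a referenced WAL,
    // regardless of how either side spelled the path.
    static boolean stillReferenced(String candidate, Set<String> inUseIds) {
        return inUseIds.contains(walId(candidate));
    }

    public static void main(String[] args) {
        // Assumed 1.5-style relative entry; the exact stored format may differ.
        Set<String> inUse = walIds(Arrays.asList(
            "127.0.0.1+9997/754f171b-c260-42dd-b17e-bd15064608c7"));
        String candidate =
            "hdfs://nnhost:6093/rktl/accumulo-upt/wal/127.0.0.1+9997/754f171b-c260-42dd-b17e-bd15064608c7";
        System.out.println(stillReferenced(candidate, inUse));
    }
}
```

With a comparison like this, the absolute candidate path from the GC log above would match the relative reference, so the WAL would not be treated as unreferenced just because the two sides were written with different prefixes.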