Description
Running the Shard module of Randomwalk.
Noticed that some driver code around the test had timed out:
14 02:44:21,839 [shard.Merge] DEBUG: merging ST_index_hostname_24487_1413254628126
14 02:46:22,129 [impl.ThriftTransportPool] WARN : Thread "org.apache.accumulo.test.randomwalk.Framework" stuck on IO to master:9999 (0) for at least 120050 ms
14 02:49:21,807 [randomwalk.Module] WARN : Node org.apache.accumulo.test.randomwalk.shard.Merge has been running for 300.003 seconds. You may want to look into it.
A few seconds later, the master complains that the tabletserver failed to unload some tablets of the table we're merging (presumably as part of the merge), and the TabletServer hits an NPE while trying to run a MinC (the line numbers are off, so I'm omitting the stack traces). The failing path starts by trying to get a new filename:
FileRef newMapfileLocation = getNextMapFilename(mergeFile == null ? "F" : "M");
This ultimately threw an AccessControlException from Hadoop because it tried to write to the root of HDFS instead of the provided directory (issue #1).
It also caused an NPE in the finally block of the MinorCompactionTask (issue #2), on this line:
minorCompaction.data("numEntries", Long.toString(this.stats.getNumEntries()));
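The interaction between issue #1 and issue #2 can be illustrated with a minimal, self-contained sketch. It assumes (this is not confirmed by the stack traces) that stats was never assigned because the compaction failed early. Every name here (FinallyNpeSketch, Stats, span, run, outcome) is hypothetical and is not Accumulo code. The sketch shows that an unguarded dereference in a finally block replaces the original exception with an NPE, while a null check lets the original failure propagate.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the failure pattern; none of these names are Accumulo APIs.
class FinallyNpeSketch {
    // Stand-in for the compaction statistics object.
    static final class Stats {
        private final long numEntries;
        Stats(long numEntries) { this.numEntries = numEntries; }
        long getNumEntries() { return numEntries; }
    }

    // Stand-in for the trace span that records "numEntries".
    private final Map<String, String> span = new HashMap<>();

    // Assumption: stays null when the compaction fails before producing stats.
    private Stats stats;

    // Simulates a compaction that fails before stats is assigned.
    void run(boolean guard) {
        try {
            // Models the early failure (e.g. the file could not be created).
            if (stats == null) {
                throw new IllegalStateException("could not create output file");
            }
        } finally {
            // Without the guard, stats.getNumEntries() throws an NPE here,
            // and that NPE replaces the IllegalStateException thrown above.
            if (!guard || stats != null) {
                span.put("numEntries", Long.toString(stats.getNumEntries()));
            }
        }
    }

    // Returns the simple class name of whatever exception escapes run().
    static String outcome(boolean guard) {
        try {
            new FinallyNpeSketch().run(guard);
            return "ok";
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(outcome(false)); // NullPointerException
        System.out.println(outcome(true));  // IllegalStateException
    }
}
```

In the unguarded case the caller only ever sees the NPE, which would make the underlying write failure harder to spot in the logs.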
More than 12 hours after all of this, the client is still blocked waiting for the merge to complete, and the tserver is stuck in a loop trying to unload the tablets:
2014-10-14 22:53:48,761 [tserver.Tablet] DEBUG: initiateClose(saveState=true queueMinC=false disableWrites=false) 2w;00001a;000018
2014-10-14 22:53:48,761 [tserver.TabletServer] DEBUG: Failed to unload tablet 2w;00001a;000018... it was alread closing or closed : Tablet 2w;00001a;000018 already closing
The master is stuck in a loop waiting for those tablets to go offline (I think):
2014-10-14 22:54:40,680 [state.MergeStats] INFO : Computing next merge state for 2w<< which is presently WAITING_FOR_OFFLINE isDelete : false
2014-10-14 22:54:40,680 [state.MergeStats] INFO : 21 tablets are chopped, 0 are offline 2w<<
2014-10-14 22:54:40,680 [state.MergeStats] INFO : Waiting for 0 unassigned tablets to be 21 2w<<
2014-10-14 22:54:40,680 [master.Master] DEBUG: [Normal Tablets] sleeping for 60.00 seconds
2014-10-14 22:54:40,684 [master.Master] DEBUG: Finished gathering information from 1 servers in 0.00 seconds
2014-10-14 22:54:40,685 [balancer.DefaultLoadBalancer] DEBUG: balance ended with 0 migrations
2014-10-14 22:54:40,685 [balancer.DefaultLoadBalancer] DEBUG: balance ended with 0 migrations
2014-10-14 22:54:40,685 [balancer.DefaultLoadBalancer] DEBUG: balance ended with 0 migrations
2014-10-14 22:54:40,686 [balancer.DefaultLoadBalancer] DEBUG: balance ended with 0 migrations
2014-10-14 22:54:40,686 [balancer.DefaultLoadBalancer] DEBUG: balance ended with 0 migrations
2014-10-14 22:54:40,686 [balancer.DefaultLoadBalancer] DEBUG: balance ended with 0 migrations
2014-10-14 22:54:40,804 [state.ZooTabletStateStore] DEBUG: Returning root tablet state: +r<<@(null,hostname:9997[1490be48ae20006],hostname:9997[1490be48ae20006])
2014-10-14 22:54:40,804 [master.Master] DEBUG: [Root Table]: scan time 0.00 seconds
2014-10-14 22:54:40,804 [master.Master] DEBUG: [Root Table] sleeping for 60.00 seconds
2014-10-14 22:54:40,830 [master.Master] DEBUG: [Metadata Tablets]: scan time 0.04 seconds
2014-10-14 22:54:40,830 [master.Master] DEBUG: [Metadata Tablets] sleeping for 60.00 seconds
2014-10-14 22:54:40,830 [master.Master] DEBUG: mergeInfo overlaps: 2w;000002< true
2014-10-14 22:54:40,831 [master.Master] DEBUG: mergeInfo overlaps: 2w;000004;000002 true
2014-10-14 22:54:40,831 [master.Master] DEBUG: mergeInfo overlaps: 2w;000006;000004 true
2014-10-14 22:54:40,832 [master.Master] DEBUG: mergeInfo overlaps: 2w;000008;000006 true
2014-10-14 22:54:40,832 [master.Master] DEBUG: mergeInfo overlaps: 2w;00000a;000008 true
2014-10-14 22:54:40,833 [master.Master] DEBUG: mergeInfo overlaps: 2w;00000c;00000a true
2014-10-14 22:54:40,833 [master.Master] DEBUG: mergeInfo overlaps: 2w;00000e;00000c true
2014-10-14 22:54:40,834 [master.Master] DEBUG: mergeInfo overlaps: 2w;000010;00000e true
2014-10-14 22:54:40,834 [master.Master] DEBUG: mergeInfo overlaps: 2w;000012;000010 true
2014-10-14 22:54:40,835 [master.Master] DEBUG: mergeInfo overlaps: 2w;000014;000012 true
2014-10-14 22:54:40,835 [master.Master] DEBUG: mergeInfo overlaps: 2w;000016;000014 true
2014-10-14 22:54:40,835 [master.Master] DEBUG: mergeInfo overlaps: 2w;000018;000016 true
2014-10-14 22:54:40,836 [master.Master] DEBUG: mergeInfo overlaps: 2w;00001a;000018 true
2014-10-14 22:54:40,836 [master.Master] DEBUG: mergeInfo overlaps: 2w;00001c;00001a true
2014-10-14 22:54:40,836 [master.Master] DEBUG: mergeInfo overlaps: 2w;00001e;00001c true
2014-10-14 22:54:40,837 [master.Master] DEBUG: mergeInfo overlaps: 2w;000020;00001e true
2014-10-14 22:54:40,837 [master.Master] DEBUG: mergeInfo overlaps: 2w;000022;000020 true
2014-10-14 22:54:40,837 [master.Master] DEBUG: mergeInfo overlaps: 2w;000024;000022 true
2014-10-14 22:54:40,838 [master.Master] DEBUG: mergeInfo overlaps: 2w;000026;000024 true
2014-10-14 22:54:40,838 [master.Master] DEBUG: mergeInfo overlaps: 2w;000028;000026 true
2014-10-14 22:54:40,839 [master.Master] DEBUG: mergeInfo overlaps: 2w<;000028 true
2014-10-14 22:54:40,839 [master.Master] DEBUG: [Normal Tablets]: scan time 0.04 seconds
2014-10-14 22:54:40,839 [master.EventCoordinator] INFO : [Normal Tablets]: 21 tablets unloaded
And the fate op is still locked (combined issue #3)
txid: 3a638f9642050e1a status: IN_PROGRESS op: TableRangeOp locked: [W:2w, R:+default] locking: [] top: TableRangeOpWait
Issue Links
- is related to ACCUMULO-3215 Import tries to use default DFS directory instead of configured (Resolved)