Details
Bug
Status: Resolved
Major
Resolution: Cannot Reproduce
1.4.3, 1.5.0
None
None
hadoop-1.0.1, hadoop-1.1.2 / accumulo 1.4.3
Description
Attempting to test ACCUMULO-575 with the following test framework:
Test bench:
1 node running the hadoop namenode and 1 datanode
1 slave node running 1 datanode and the accumulo stack, with 8GB in the in-memory map
Running a patched version of accumulo with the following patch to provide helper debug output:
Index: server/src/main/java/org/apache/accumulo/server/tabletserver/Compactor.java
===================================================================
--- server/src/main/java/org/apache/accumulo/server/tabletserver/Compactor.java (revision 1429057)
+++ server/src/main/java/org/apache/accumulo/server/tabletserver/Compactor.java (working copy)
@@ -81,6 +81,7 @@
   private FileSystem fs;
   protected KeyExtent extent;
   private List<IteratorSetting> iterators;
+  protected boolean minor= false;
   Compactor(Configuration conf, FileSystem fs, Map<String,DataFileValue> files, InMemoryMap imm, String outputFile, boolean propogateDeletes, TableConfiguration acuTableConf, KeyExtent extent, CompactionEnv env, List<IteratorSetting> iterators) {
@@ -158,7 +159,7 @@
       log.error("Verification of successful compaction fails!!! " + extent + " " + outputFile, ex);
       throw ex;
     }
-
+    log.info("Just completed minor? " + minor + " for table " + extent.getTableId());
     log.debug(String.format("Compaction %s %,d read | %,d written | %,6d entries/sec | %6.3f secs", extent, majCStats.getEntriesRead(), majCStats.getEntriesWritten(), (int) (majCStats.getEntriesRead() / ((t2 - t1) / 1000.0)), (t2 - t1) / 1000.0));
Index: server/src/main/java/org/apache/accumulo/server/tabletserver/MinorCompactor.java
===================================================================
--- server/src/main/java/org/apache/accumulo/server/tabletserver/MinorCompactor.java (revision 1429057)
+++ server/src/main/java/org/apache/accumulo/server/tabletserver/MinorCompactor.java (working copy)
@@ -88,6 +88,7 @@
     do {
       try {
+        this.minor = true;
         CompactionStats ret = super.call();
         // log.debug(String.format("MinC %,d recs in | %,d recs out | %,d recs/sec | %6.3f secs | %,d bytes ",map.size(), entriesCompacted,
I stood up a new instance and created a table named test. Then I ran the following:
tail -f accumulo-1.5.0-SNAPSHOT/logs/tserver_slave.debug.log | ./ifttt.sh
where ifttt.sh is
#!/bin/sh
dnpid=`jps -m | grep DataNode | awk '{print $1}'`
while [ -z "" ]; do
  if [ -e $1 ] ;then read str; else str=$1;fi
  if [ -n "`echo $str | grep "Just completed minor? true for table 2"`" ]; then
    echo "I'm gonna kill datanode, pid $dnpid"
    kill -9 $dnpid
  fi
done
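For anyone trying to reproduce this, the watcher idea above (read the tserver debug log from stdin, kill the local datanode when the patched Compactor reports a finished minor compaction on table 2) can also be sketched as below. This is only a sketch under assumptions, not the script that was actually run: the function names is_trigger_line, datanode_pid and watch_and_kill are made up for illustration, the datanode pid is looked up at match time rather than once at startup, and the loop stops after the first kill instead of running forever.

```shell
#!/bin/sh
# Hypothetical variant of ifttt.sh; all function names here are illustrative.

# Marker logged by the patched Compactor; table id 2 is taken from the original script.
TRIGGER='Just completed minor? true for table 2'

# Succeeds when the given log line contains the trigger marker.
# The quoted expansion makes the match literal, so "?" is not a wildcard.
is_trigger_line() {
  case "$1" in
    *"$TRIGGER"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Prints the pid of the first DataNode reported by jps (empty if none is running).
datanode_pid() {
  jps -m | awk '/DataNode/ {print $1; exit}'
}

# Reads log lines from stdin and kills the datanode on the first trigger line.
watch_and_kill() {
  while IFS= read -r line; do
    if is_trigger_line "$line"; then
      pid=$(datanode_pid)
      if [ -n "$pid" ]; then
        echo "Killing datanode, pid $pid"
        kill -9 "$pid"
      fi
      return 0
    fi
  done
}
```

To use it the same way as the original, add a final line calling watch_and_kill to the script and pipe the tail -f output into it as before.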
Then I ran the following:
accumulo org.apache.accumulo.server.test.TestIngest --table test --rows 65536 --cols 100 --size 8192 -z 172.16.101.220:2181 --batchMemory 100000000 --batchThreads 10
Eventually the memory map filled, a minor compaction happened, the local datanode was killed, and things died. The logs filled with:
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /accumulo/wal/172.16.101.219+9997/08b9f1b4-26d5-4b07-a260-3334c2013576 could only be replicated to 0 nodes, instead of 1
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1556)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696)
    at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:416)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
and
Unexpected error writing to log, retrying attempt 1
java.io.IOException: DFSOutputStream is closed
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3666)
    at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
    at org.apache.accumulo.server.tabletserver.log.DfsLogger.defineTablet(DfsLogger.java:295)
    at org.apache.accumulo.server.tabletserver.log.TabletServerLogger$4.write(TabletServerLogger.java:333)
    at org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:273)
    at org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:229)
    at org.apache.accumulo.server.tabletserver.log.TabletServerLogger.defineTablet(TabletServerLogger.java:330)
    at org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:254)
    at org.apache.accumulo.server.tabletserver.log.TabletServerLogger.write(TabletServerLogger.java:229)
    at org.apache.accumulo.server.tabletserver.log.TabletServerLogger.defineTablet(TabletServerLogger.java:330)
    ... repeats...
Bringing the datanode back up did NOT fix it, either.
UPDATE: I reran without ever killing the datanode and it still died. So this isn't an issue with my datanode killing; it's something with hadoop 1.0.1 and the new write-ahead logs.
Attachments
Issue Links
- blocks: ACCUMULO-575 Potential data loss when datanode fails immediately after minor compaction (Resolved)