Details
-
Bug
-
Status: Closed
-
Critical
-
Resolution: Duplicate
-
0.12.0
-
None
-
None
Description
The Sort benchmark completes successfully for me on the latest trunk (0.12.1 candidate) with speculation turned on. Validation of the Sort benchmark output, however, is failing. I see one sort output file (part-00375) that is way smaller than all the rest. In fact, it is exactly 1 block long.
dfs ls output:
...
/user/hadoopqa/sortBenchmark100/output/part-00373 <r 3> 2971688212
/user/hadoopqa/sortBenchmark100/output/part-00374 <r 3> 2973451660
/user/hadoopqa/sortBenchmark100/output/part-00375 <r 3> 134217728
/user/hadoopqa/sortBenchmark100/output/part-00376 <r 3> 2972933208
/user/hadoopqa/sortBenchmark100/output/part-00377 <r 3> 2972309956
...
During the Sort Benchmark, I see 9 AlreadyBeingCreatedExceptions in the NameNode log for this file (and more of these exceptions for other files too). I also include here the 1 PendingReplicationMonitor WARN message from the NameNode log in case it's relevant:
...
2007-03-08 21:56:31,747 WARN org.apache.hadoop.fs.FSNamesystem: PendingReplicationMonitor timed out block blk_-849195508701590166
...
2007-03-08 22:04:35,471 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.startFile: failed to create file /user/hadoopqa/sortBenchmark100/output/part-00375 for DFSClient_task_0002_r_000375_1 on client 72.30.38.16 because pendingCreates is non-null.
2007-03-08 22:04:35,476 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8020 call error: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /user/hadoopqa/sortBenchmark100/output/part-00375 for DFSClient_task_0002_r_000375_1 on client 72.30.38.16 because pendingCreates is non-null.
org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /user/hadoopqa/sortBenchmark100/output/part-00375 for DFSClient_task_0002_r_000375_1 on client 72.30.38.16 because pendingCreates is non-null.
at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:701)
at org.apache.hadoop.dfs.NameNode.create(NameNode.java:283)
at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:336)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:559)
...
During sort validation, I get this exception in the JobTracker log:
2007-03-08 22:51:32,017 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_0003_m_001379_0: java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:180)
at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:57)
at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:91)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1525)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1436)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1482)
at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:72)
at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
I also saw this in the DataNode log during sort validation, but it could be unrelated:
2007-03-09 01:04:41,323 WARN org.apache.hadoop.dfs.DataNode: java.io.IOException: Unexpected error trying to delete block blk_-5047673597270588432. Block not found in blockMap.
at org.apache.hadoop.dfs.FSDataset.invalidate(FSDataset.java:596)
at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:429)
at org.apache.hadoop.dfs.DataNode.run(DataNode.java:1053)
at java.lang.Thread.run(Thread.java:619)
Since the default for 0.12.1 will be for speculative execution to be turned off, I am assigning this to 0.13.0