Details
Type: Bug
Status: Closed
Priority: Minor
Resolution: Fixed
Reviewed
Description
I encountered a bug when trying to upload data using the Hadoop DFS Client.
After receiving a NotReplicatedYetException, the DFSClient will normally retry its upload up to some limited number of times. In this case, I found that this retry loop continued indefinitely, to the point that the number of tries remaining was negative:
2009-03-25 16:20:02 [INFO]
2009-03-25 16:20:02 [INFO] 09/03/25 16:20:02 INFO hdfs.DFSClient: Waiting for replication for 21 seconds
2009-03-25 16:20:03 [INFO] 09/03/25 16:20:02 WARN hdfs.DFSClient: NotReplicatedYetException sleeping /apollo/env/SummaryMySQL/var/logstore/fiorello_logs_20090325_us/logs_20090325_us_13 retries left -1
The stack trace for the failure that's retrying is:
2009-03-25 16:20:02 [INFO] 09/03/25 16:20:02 INFO hdfs.DFSClient: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not replicated yet:<filename>
2009-03-25 16:20:02 [INFO] at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1266)
2009-03-25 16:20:02 [INFO] at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:351)
2009-03-25 16:20:02 [INFO] at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
2009-03-25 16:20:02 [INFO] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
2009-03-25 16:20:02 [INFO] at java.lang.reflect.Method.invoke(Method.java:597)
2009-03-25 16:20:02 [INFO] at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
2009-03-25 16:20:02 [INFO] at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
2009-03-25 16:20:02 [INFO]
2009-03-25 16:20:02 [INFO] at org.apache.hadoop.ipc.Client.call(Client.java:697)
2009-03-25 16:20:02 [INFO] at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
2009-03-25 16:20:02 [INFO] at $Proxy0.addBlock(Unknown Source)
2009-03-25 16:20:02 [INFO] at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
2009-03-25 16:20:02 [INFO] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
2009-03-25 16:20:02 [INFO] at java.lang.reflect.Method.invoke(Method.java:597)
2009-03-25 16:20:02 [INFO] at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
2009-03-25 16:20:02 [INFO] at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
2009-03-25 16:20:02 [INFO] at $Proxy0.addBlock(Unknown Source)
2009-03-25 16:20:02 [INFO] at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2814)
2009-03-25 16:20:02 [INFO] at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2696)
2009-03-25 16:20:02 [INFO] at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
2009-03-25 16:20:02 [INFO] at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)
Fixes a logical error in DFSClient::DFSOutputStream::DataStreamer::locateFollowingBlock that caused infinite retries on write. Modifies the DFSClient constructor to allow unit testing of locateFollowingBlock, and adds unit tests.
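
For readers unfamiliar with this class of bug, the following is a minimal, self-contained sketch of one way a "limited" retry loop can run past its limit and report a negative number of retries left. It is not the actual DFSClient code and does not claim to reproduce the exact flaw that was fixed; `RetrySketch`, `FlakyCall`, `NotReadyException`, `flawedLoop`, and `boundedLoop` are hypothetical names invented for illustration. The flawed variant tests the counter for exact equality with zero while also decrementing it a second time, so the counter can step over zero; the bounded variant decrements once and uses `<= 0`.

```java
class RetrySketch {
    /** Hypothetical stand-in for a "not replicated yet" style failure. */
    static class NotReadyException extends Exception {
        NotReadyException(String message) { super(message); }
    }

    /** Hypothetical operation that fails a fixed number of times, then succeeds. */
    static class FlakyCall {
        private int remainingFailures;
        int attempts = 0;

        FlakyCall(int failures) { this.remainingFailures = failures; }

        void run() throws NotReadyException {
            attempts++;
            if (remainingFailures > 0) {
                remainingFailures--;
                throw new NotReadyException("not replicated yet");
            }
        }
    }

    /** Flawed: an exact-equality guard plus a second decrement can skip zero. */
    static int flawedLoop(FlakyCall call, int retries) throws NotReadyException {
        while (true) {
            try {
                call.run();
                return retries;            // retries left when the call succeeded
            } catch (NotReadyException e) {
                if (--retries == 0) {      // only fires if the counter lands exactly on 0
                    throw e;
                }
                --retries;                 // second decrement: the counter moves in steps of 2
            }
        }
    }

    /** Bounded: a single decrement and a <= 0 guard always terminate. */
    static int boundedLoop(FlakyCall call, int retries) throws NotReadyException {
        while (true) {
            try {
                call.run();
                return retries;
            } catch (NotReadyException e) {
                if (--retries <= 0) {
                    throw e;               // give up once the budget is spent
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        FlakyCall a = new FlakyCall(10);
        int left = flawedLoop(a, 4);
        System.out.println("flawed: retries left " + left + " after " + a.attempts + " attempts");

        FlakyCall b = new FlakyCall(10);
        try {
            boundedLoop(b, 4);
        } catch (NotReadyException e) {
            System.out.println("bounded: gave up after " + b.attempts + " attempts");
        }
    }
}
```

With a budget of 4 and a call that fails 10 times, the flawed loop keeps going until the call happens to succeed and ends with a negative counter, while the bounded loop gives up after 4 attempts. A unit test along these lines, driving the loop with a stub that always fails, is one way to check that a retry budget is actually enforced.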