Hadoop HDFS / HDFS-167

DFSClient continues to retry indefinitely

Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.20.1, 0.21.0
    • Component/s: hdfs-client
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      I encountered a bug when trying to upload data using the Hadoop DFS Client.
      After receiving a NotReplicatedYetException, the DFSClient normally retries its upload a limited number of times. In this case, the retry loop continued indefinitely, to the point that the number of retries remaining went negative:
      2009-03-25 16:20:02 [INFO]
      2009-03-25 16:20:02 [INFO] 09/03/25 16:20:02 INFO hdfs.DFSClient: Waiting for replication for 21 seconds
      2009-03-25 16:20:03 [INFO] 09/03/25 16:20:02 WARN hdfs.DFSClient: NotReplicatedYetException sleeping /apollo/env/SummaryMySQL/var/logstore/fiorello_logs_20090325_us/logs_20090325_us_13 retries left -1
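
      A counter that reaches -1 is what one would expect from a loop whose exit test fires only when the counter is exactly zero and is also gated on a second condition. The following is a minimal sketch of that failure pattern next to a bounded alternative. It is not the actual DFSClient code: the class RetrySketch, the Attempt interface, NotReadyException, and both method names are hypothetical and exist only for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RetrySketch {
    /** Hypothetical stand-in for a retryable error such as NotReplicatedYetException. */
    public static class NotReadyException extends Exception {
        public NotReadyException(String msg) { super(msg); }
    }

    /** Hypothetical operation that may fail with a retryable error. */
    public interface Attempt {
        void run() throws NotReadyException;
    }

    /**
     * Failure pattern (assumed, for illustration): the loop gives up only when
     * the counter is exactly zero AND the error is not retryable. For a
     * retryable error the counter is simply decremented past zero.
     * Returns the counter value after the given number of failed iterations.
     */
    public static int buggyRetriesLeftAfter(int retries, int failures, boolean retryable) {
        for (int i = 0; i < failures; i++) {
            if (--retries == 0 && !retryable) {
                break; // the only exit, unreachable for retryable errors
            }
        }
        return retries;
    }

    /**
     * Bounded alternative: at most maxRetries retries after the first attempt,
     * with exponential backoff. Returns the number of attempts made on success
     * and rethrows the last error once the budget is spent.
     */
    public static int callWithRetries(Attempt op, int maxRetries, long initialSleepMs)
            throws NotReadyException, InterruptedException {
        int retries = maxRetries;
        long sleepMs = initialSleepMs;
        int attempts = 0;
        while (true) {
            attempts++;
            try {
                op.run();
                return attempts;
            } catch (NotReadyException e) {
                if (retries <= 0) {
                    throw e; // budget exhausted: surface the error
                }
                retries--;
                Thread.sleep(sleepMs);
                sleepMs *= 2;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("buggy counter: " + buggyRetriesLeftAfter(5, 6, true));
        AtomicInteger calls = new AtomicInteger();
        try {
            callWithRetries(() -> {
                calls.incrementAndGet();
                throw new NotReadyException("not replicated yet");
            }, 3, 1L);
        } catch (NotReadyException e) {
            System.out.println("gave up after " + calls.get() + " calls");
        }
    }
}
```

      In the sketch, six retryable failures against a starting budget of five leave the buggy counter at -1, while the bounded version stops after the first attempt plus three retries.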

      The stack trace for the failure that's retrying is:
      2009-03-25 16:20:02 [INFO] 09/03/25 16:20:02 INFO hdfs.DFSClient: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not replicated yet:<filename>
      2009-03-25 16:20:02 [INFO] at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1266)
      2009-03-25 16:20:02 [INFO] at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:351)
      2009-03-25 16:20:02 [INFO] at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
      2009-03-25 16:20:02 [INFO] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      2009-03-25 16:20:02 [INFO] at java.lang.reflect.Method.invoke(Method.java:597)
      2009-03-25 16:20:02 [INFO] at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
      2009-03-25 16:20:02 [INFO] at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
      2009-03-25 16:20:02 [INFO]
      2009-03-25 16:20:02 [INFO] at org.apache.hadoop.ipc.Client.call(Client.java:697)
      2009-03-25 16:20:02 [INFO] at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
      2009-03-25 16:20:02 [INFO] at $Proxy0.addBlock(Unknown Source)
      2009-03-25 16:20:02 [INFO] at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
      2009-03-25 16:20:02 [INFO] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      2009-03-25 16:20:02 [INFO] at java.lang.reflect.Method.invoke(Method.java:597)
      2009-03-25 16:20:02 [INFO] at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
      2009-03-25 16:20:02 [INFO] at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
      2009-03-25 16:20:02 [INFO] at $Proxy0.addBlock(Unknown Source)
      2009-03-25 16:20:02 [INFO] at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2814)
      2009-03-25 16:20:02 [INFO] at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2696)
      2009-03-25 16:20:02 [INFO] at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
      2009-03-25 16:20:02 [INFO] at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)

      Fixes a logical error in DFSClient::DFSOutputStream::DataStreamer::locateFollowingBlock that caused infinite retries on write. Modifies the DFSClient constructor to allow unit testing of locateFollowingBlock, and adds unit tests.
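
      One common way to make a retry path like this unit-testable is to pass the remote collaborator in through the constructor, so a test can substitute a stub that fails on demand. The sketch below illustrates that idea only. It does not reproduce the attached patches: InjectableClientSketch, BlockAllocator, and FlakyAllocator are hypothetical names, the method merely borrows the name locateFollowingBlock, and backoff sleeps are omitted for brevity.

```java
import java.io.IOException;

public class InjectableClientSketch {
    /** Hypothetical slice of a NameNode-like protocol: only the block allocation call. */
    public interface BlockAllocator {
        String addBlock(String src) throws IOException;
    }

    private final BlockAllocator allocator;
    private final int maxRetries;

    /** The collaborator is injected, so tests can pass a stub instead of an RPC proxy. */
    public InjectableClientSketch(BlockAllocator allocator, int maxRetries) {
        this.allocator = allocator;
        this.maxRetries = maxRetries;
    }

    /** Tries once, then retries at most maxRetries times before rethrowing. */
    public String locateFollowingBlock(String src) throws IOException {
        int retries = maxRetries;
        while (true) {
            try {
                return allocator.addBlock(src);
            } catch (IOException e) {
                if (retries <= 0) {
                    throw e; // bounded: never loops past the budget
                }
                retries--;
            }
        }
    }

    /** Test stub that fails a fixed number of times, then succeeds. */
    public static class FlakyAllocator implements BlockAllocator {
        private int failuresLeft;
        public int calls = 0;

        public FlakyAllocator(int failures) {
            this.failuresLeft = failures;
        }

        @Override
        public String addBlock(String src) throws IOException {
            calls++;
            if (failuresLeft > 0) {
                failuresLeft--;
                throw new IOException("Not replicated yet: " + src);
            }
            return "block-for-" + src;
        }
    }

    public static void main(String[] args) throws IOException {
        FlakyAllocator stub = new FlakyAllocator(2);
        InjectableClientSketch client = new InjectableClientSketch(stub, 3);
        System.out.println(client.locateFollowingBlock("/tmp/file") + " after " + stub.calls + " calls");
    }
}
```

      A test can then assert both outcomes: success when the stub recovers within the retry budget, and a propagated exception after a fixed number of calls when it never does.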

      Attachments

        1. hdfs-167-4.patch (12 kB, Bill Zeller)
        2. hdfs-167-5.patch (35 kB, Bill Zeller)
        3. hdfs-167-6.patch (11 kB, Bill Zeller)
        4. hdfs-167-for-20-1.patch (11 kB, Bill Zeller)

            People

              Assignee: Bill Zeller (zeller)
              Reporter: Derek Wollenstein (dwollen)