Hadoop Common / HADOOP-3124

DFS data node should not use hard coded 10 minutes as write timeout.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.17.0
    • Fix Version/s: 0.17.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Makes DataNode socket write timeout configurable. User impact: none.
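
      A minimal sketch of how the now-configurable timeout might be set from code, assuming the configuration key introduced by this patch is named dfs.datanode.socket.write.timeout and that the default is 8 minutes; both are assumptions to verify against your Hadoop version:

        // Hedged sketch: overriding the DataNode socket write timeout through
        // the standard Hadoop Configuration API. The key name and the 8-minute
        // default below are assumptions, not confirmed by this issue page.
        import org.apache.hadoop.conf.Configuration;

        public class WriteTimeoutExample {
          public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Value is in milliseconds; here, lower the timeout to 2 minutes.
            conf.setInt("dfs.datanode.socket.write.timeout", 2 * 60 * 1000);
            int timeout = conf.getInt("dfs.datanode.socket.write.timeout",
                                      8 * 60 * 1000);
            System.out.println("DataNode write timeout = " + timeout + " ms");
          }
        }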

      Description

      This problem happens in 0.17 trunk.

      I saw reducers wait 10 minutes while writing data to DFS and then hit the timeout.
      The client retried and timed out after another 10 minutes.

      After looking into the code, it seems that the DFS DataNode uses 10 minutes as the timeout for writing data into the DataNode pipeline.
      I think we have three issues:

      1. The 10-minute timeout value is too big for writing a chunk of data (64K) through the DataNode pipeline.
      2. The timeout value should not be hard coded.
      3. Different DataNodes in a pipeline should use different timeout values for writing to the downstream node.
      A reasonable value might be (20 secs * numOfDataNodesInTheDownStreamPipe); see the sketch after this list.
      For example, if the replication factor is 3, the client uses 60 secs, the first DataNode uses 40 secs, and the second DataNode uses 20 secs.
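
      As an illustration of the staggered scheme in item 3, here is a minimal, hypothetical sketch (names and constants are illustrative only, not the committed patch) of deriving each stage's write timeout from the number of DataNodes downstream of it:

        // Hypothetical sketch of the staggered-timeout proposal above.
        public final class PipelineWriteTimeout {
          // Per-downstream-node allowance from the description: 20 seconds.
          private static final long PER_NODE_TIMEOUT_MS = 20L * 1000L;

          // nodesDownstream = number of DataNodes downstream of the writer:
          // with replication 3, the client has 3, the first DataNode has 2,
          // and the second DataNode has 1.
          static long writeTimeoutMs(int nodesDownstream) {
            return PER_NODE_TIMEOUT_MS * nodesDownstream;
          }

          public static void main(String[] args) {
            System.out.println("client: " + writeTimeoutMs(3) + " ms"); // 60000
            System.out.println("dn1:    " + writeTimeoutMs(2) + " ms"); // 40000
            System.out.println("dn2:    " + writeTimeoutMs(1) + " ms"); // 20000
          }
        }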

      Attachments

      1. HADOOP-3124.patch (7 kB) by Raghu Angadi
      2. HADOOP-3124.patch (9 kB) by Raghu Angadi

        Issue Links

          • This issue relates to HADOOP-3132
          • This issue incorporates HADOOP-3051

          Activity

          Runping Qi created issue.
          Runping Qi made changes:
            Link: This issue relates to HADOOP-3132
          Runping Qi made changes:
            Component/s: dfs
            Description: edited (text as shown in the Description above)
          Raghu Angadi made changes:
            Assignee: Raghu Angadi
          Raghu Angadi made changes:
            Attachment: HADOOP-3124.patch
          Raghu Angadi made changes:
            Affects Version/s: 0.17.0
          Raghu Angadi made changes:
            Link: This issue incorporates HADOOP-3051
          Raghu Angadi made changes:
            Fix Version/s: 0.18.0
          Raghu Angadi made changes:
            Attachment: HADOOP-3124.patch
          Raghu Angadi made changes:
            Status: Open -> Patch Available
          Raghu Angadi made changes:
            Fix Version/s: 0.18.0 -> 0.17.0
          Raghu Angadi made changes:
            Resolution: Fixed
            Hadoop Flags: [Reviewed]
            Status: Patch Available -> Resolved
          Raghu Angadi made changes:
            Environment: Makes DataNode socket write timeout configurable. User impact: none.
          Raghu Angadi made changes:
            Release Note: Makes DataNode socket write timeout configurable. User impact: none.
            Environment: (cleared; note moved to Release Note)
          Nigel Daley made changes:
            Status: Resolved -> Closed
          Owen O'Malley made changes:
            Component/s: dfs (removed)

            People

            • Assignee: Raghu Angadi
            • Reporter: Runping Qi
            • Votes: 0
            • Watchers: 0
