  1. Hadoop Map/Reduce
  2. MAPREDUCE-6728

Give fetchers hint when ShuffleHandler rejects a shuffling connection

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9.0, 3.0.0-alpha2
    • Component/s: mrv2
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      If the number of open shuffle connections to a node goes over the maximum, ShuffleHandler closes the new connection immediately without giving the fetcher any hint of the reason. The fetcher then fails with an exception such as

      java.net.SocketException: Unexpected end of file from server
      at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:772)
      at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
      at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:769)
      at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
      at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
      at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468)
      at org.apache.hadoop.mapreduce.task.reduce.Fetcher.verifyConnection(Fetcher.java:430)
      at org.apache.hadoop.mapreduce.task.reduce.Fetcher.setupConnectionsWithRetry(Fetcher.java:395)
      at org.apache.hadoop.mapreduce.task.reduce.Fetcher.openShuffleUrl(Fetcher.java:266)
      at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:323)
      at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193)

      OR

      java.net.SocketException: Connection reset
      at java.net.SocketInputStream.read(SocketInputStream.java:196)
      at java.net.SocketInputStream.read(SocketInputStream.java:122)
      at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
      at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
      at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
      at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
      at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
      at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:769)
      at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
      at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
      at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468)
      at org.apache.hadoop.mapreduce.task.reduce.Fetcher.verifyConnection(Fetcher.java:430)
      at org.apache.hadoop.mapreduce.task.reduce.Fetcher.setupConnectionsWithRetry(Fetcher.java:395)
      at org.apache.hadoop.mapreduce.task.reduce.Fetcher.openShuffleUrl(Fetcher.java:266)
      at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java

      Such failures are counted as fetcher failures, even though the connection was rejected only because the node was at its connection limit.
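
      The kind of hint asked for above could look like the following minimal sketch. It is not taken from any of the attached patches. It assumes the server answers a rejected connection with HTTP status 429 and an optional Retry-After header in seconds, so the fetcher can tell "try again later" apart from a real fetch failure. The class name, method names, the 1000 ms fallback delay, and the rule that a maximum of 0 or less means "no limit" are all hypothetical.

```java
import java.util.OptionalLong;

/** Illustrative sketch only; names and the 429/Retry-After convention are assumptions. */
final class ShuffleRejectionHint {
    // Assumed status code the server sends instead of silently closing the socket.
    static final int TOO_MANY_REQUESTS = 429;
    // Assumed fallback delay when the server gives no usable Retry-After value.
    static final long DEFAULT_RETRY_DELAY_MS = 1000L;

    private ShuffleRejectionHint() {
    }

    /** Server side: reject a new connection once the open count has reached the maximum. */
    static boolean shouldReject(int openConnections, int maxConnections) {
        // Assumption: a maximum of 0 or less means "no limit".
        return maxConnections > 0 && openConnections >= maxConnections;
    }

    /**
     * Fetcher side: returns the delay before retrying if the response is a
     * rejection hint, or empty if the response is anything else.
     */
    static OptionalLong retryDelayMillis(int statusCode, String retryAfterSeconds) {
        if (statusCode != TOO_MANY_REQUESTS) {
            return OptionalLong.empty();
        }
        if (retryAfterSeconds != null) {
            try {
                long seconds = Long.parseLong(retryAfterSeconds.trim());
                if (seconds >= 0) {
                    return OptionalLong.of(seconds * 1000L);
                }
            } catch (NumberFormatException e) {
                // Not a plain number of seconds; use the default delay below.
            }
        }
        return OptionalLong.of(DEFAULT_RETRY_DELAY_MS);
    }

    public static void main(String[] args) {
        // A node at its limit rejects the connection and the fetcher backs off.
        if (shouldReject(10, 10)) {
            OptionalLong delay = retryDelayMillis(TOO_MANY_REQUESTS, "2");
            System.out.println("retry in " + delay.getAsLong() + " ms");
        }
    }
}
```

      With a hint like this, a fetcher that sees a non-empty delay could put the host back in its queue and retry later instead of recording a fetcher failure.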

        Attachments

        1. MAPREDUCE-6728-branch-2.8.06.patch
          16 kB
          Haibo Chen
        2. mapreduce6728.prelim.patch
          10 kB
          Haibo Chen
        3. mapreduce6728.branch-2.8.patch
          16 kB
          Haibo Chen
        4. mapreduce6728.006.patch
          16 kB
          Haibo Chen
        5. mapreduce6728.005.patch
          16 kB
          Haibo Chen
        6. mapreduce6728.004.patch
          15 kB
          Haibo Chen
        7. mapreduce6728.003.patch
          15 kB
          Haibo Chen
        8. mapreduce6728.002.patch
          15 kB
          Haibo Chen
        9. mapreduce6728.001.patch
          10 kB
          Haibo Chen

            People

            • Assignee:
              haibochen Haibo Chen
              Reporter:
              haibochen Haibo Chen
            • Votes:
              0
              Watchers:
              12
