Hadoop Map/Reduce / MAPREDUCE-6728

Give fetchers hint when ShuffleHandler rejects a shuffling connection

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9.0, 3.0.0-alpha2
    • Component/s: mrv2
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      If the number of open shuffle connections to a node exceeds the maximum, ShuffleHandler closes the new connection immediately without giving the fetcher any hint of the reason. The fetcher then fails with one of the following exceptions:

      java.net.SocketException: Unexpected end of file from server
      at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:772)
      at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
      at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:769)
      at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
      at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
      at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468)
      at org.apache.hadoop.mapreduce.task.reduce.Fetcher.verifyConnection(Fetcher.java:430)
      at org.apache.hadoop.mapreduce.task.reduce.Fetcher.setupConnectionsWithRetry(Fetcher.java:395)
      at org.apache.hadoop.mapreduce.task.reduce.Fetcher.openShuffleUrl(Fetcher.java:266)
      at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:323)
      at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193)

      OR

      java.net.SocketException: Connection reset
      at java.net.SocketInputStream.read(SocketInputStream.java:196)
      at java.net.SocketInputStream.read(SocketInputStream.java:122)
      at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
      at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
      at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
      at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
      at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
      at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:769)
      at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
      at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
      at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468)
      at org.apache.hadoop.mapreduce.task.reduce.Fetcher.verifyConnection(Fetcher.java:430)
      at org.apache.hadoop.mapreduce.task.reduce.Fetcher.setupConnectionsWithRetry(Fetcher.java:395)
      at org.apache.hadoop.mapreduce.task.reduce.Fetcher.openShuffleUrl(Fetcher.java:266)
      at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java

      Such failures are counted as fetcher failures.
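      The description does not say how the hint should be delivered. As a minimal sketch, assuming the server answers an over-limit request with HTTP 429 (Too Many Requests, RFC 6585) plus a Retry-After header instead of closing the socket, the Java below models both sides without real networking: a connection limiter that returns the hint, and a fetcher-side classifier that treats it as "back off" rather than as a failure. All names here (ShuffleHintSketch, ConnectionLimiter, classify, the 1-second default delay) are hypothetical and are not taken from Hadoop's ShuffleHandler or Fetcher.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not Hadoop code: over-limit connections get a 429 hint
// instead of being closed, and the fetcher backs off instead of failing.
public class ShuffleHintSketch {
    static final int HTTP_OK = 200;
    static final int HTTP_TOO_MANY_REQUESTS = 429;      // RFC 6585 status code
    static final long DEFAULT_RETRY_AFTER_SECONDS = 1;  // assumed back-off hint

    /** Minimal stand-in for an HTTP response: status code plus optional Retry-After value. */
    record Response(int status, String retryAfter) {}

    /** Server side: admits at most maxConnections concurrent shuffle connections. */
    static final class ConnectionLimiter {
        private final int maxConnections;
        private final AtomicInteger open = new AtomicInteger();

        ConnectionLimiter(int maxConnections) {
            this.maxConnections = maxConnections;
        }

        /** Returns 200 if admitted; otherwise 429 with a Retry-After hint rather than a closed socket. */
        Response accept() {
            if (open.incrementAndGet() > maxConnections) {
                open.decrementAndGet(); // undo: this connection is not admitted
                return new Response(HTTP_TOO_MANY_REQUESTS, Long.toString(DEFAULT_RETRY_AFTER_SECONDS));
            }
            return new Response(HTTP_OK, null);
        }

        /** Called when an admitted shuffle connection finishes. */
        void release() {
            open.decrementAndGet();
        }
    }

    enum Kind { SUCCESS, RETRY_LATER, FAILURE }

    /** Fetcher-side decision; delayMillis is only meaningful for RETRY_LATER. */
    record Outcome(Kind kind, long delayMillis) {}

    /** Fetcher side: a 429 means "host busy, retry later", not a fetch failure. */
    static Outcome classify(Response r) {
        if (r.status() == HTTP_OK) {
            return new Outcome(Kind.SUCCESS, 0);
        }
        if (r.status() == HTTP_TOO_MANY_REQUESTS) {
            long seconds = DEFAULT_RETRY_AFTER_SECONDS;
            if (r.retryAfter() != null) {
                try {
                    seconds = Long.parseLong(r.retryAfter().trim());
                } catch (NumberFormatException e) {
                    // Retry-After may also be an HTTP-date; this sketch falls back to the default.
                }
            }
            return new Outcome(Kind.RETRY_LATER, seconds * 1000);
        }
        return new Outcome(Kind.FAILURE, 0);
    }

    public static void main(String[] args) {
        ConnectionLimiter limiter = new ConnectionLimiter(2);
        for (int i = 0; i < 3; i++) {
            Outcome o = classify(limiter.accept());
            System.out.println(o.kind() + " " + o.delayMillis());
        }
    }
}
```

      With a limit of 2, main prints "SUCCESS 0" twice and then "RETRY_LATER 1000": the third fetcher receives an explicit back-off hint instead of a reset connection, so it has no reason to record a fetcher failure against the host.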

      Attachments

        1. mapreduce6728.001.patch
          10 kB
          Haibo Chen
        2. mapreduce6728.002.patch
          15 kB
          Haibo Chen
        3. mapreduce6728.003.patch
          15 kB
          Haibo Chen
        4. mapreduce6728.004.patch
          15 kB
          Haibo Chen
        5. mapreduce6728.005.patch
          16 kB
          Haibo Chen
        6. mapreduce6728.006.patch
          16 kB
          Haibo Chen
        7. mapreduce6728.branch-2.8.patch
          16 kB
          Haibo Chen
        8. mapreduce6728.prelim.patch
          10 kB
          Haibo Chen
        9. MAPREDUCE-6728-branch-2.8.06.patch
          16 kB
          Haibo Chen

          People

            Assignee: Haibo Chen
            Reporter: Haibo Chen
            Votes: 0
            Watchers: 12
