Hive / HIVE-16692

LLAP: Keep-alive connection in shuffle handler should not be closed until the entire response is flushed out


Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0
    • Component/s: llap
    • Labels: None

    Description

      In corner cases with keep-alive enabled, the response headers may already have been written out by the server and read by the downstream client.

      However, constructing the mapOutput on the server side can take much longer (due to slow disks or other issues). In the meantime, the keep-alive timeout can fire and close the connection from the server side. In such cases, the downstream client may get a "connection reset". Ideally, the keep-alive timeout should take effect only after the entire response has been flushed downstream.

      Example error message on the client side:

      java.net.SocketException: Connection reset
              at java.net.SocketInputStream.read(SocketInputStream.java:209) ~[?:1.8.0_112]
              at java.net.SocketInputStream.read(SocketInputStream.java:141) ~[?:1.8.0_112]
              at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) ~[?:1.8.0_112]
              at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) ~[?:1.8.0_112]
              at java.io.BufferedInputStream.read(BufferedInputStream.java:345) ~[?:1.8.0_112]
              at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704) ~[?:1.8.0_112]
              at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647) ~[?:1.8.0_112]
              at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:675) ~[?:1.8.0_112]
              at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1569) ~[?:1.8.0_112]
              at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1474) ~[?:1.8.0_112]
              at org.apache.tez.http.HttpConnection.getInputStream(HttpConnection.java:260) ~[tez-runtime-library-0.8.4.2.6.1.0-11.jar:0.8.4.2.6.1.0-11]
              at org.apache.tez.runtime.library.common.shuffle.Fetcher.setupConnection(Fetcher.java:460) ~[tez-runtime-library-0.8.4.2.6.1.0-11.jar:0.8.4.2.6.1.0-11]
              at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:492) ~[tez-runtime-library-0.8.4.2.6.1.0-11.jar:0.8.4.2.6.1.0-11]
              at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:417) ~[tez-runtime-library-0.8.4.2.6.1.0-11.jar:0.8.4.2.6.1.0-11]
              at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:215) ~[tez-runtime-library-0.8.4.2.6.1.0-11.jar:0.8.4.2.6.1.0-11]
              at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:73) ~[tez-runtime-library-0.8.4.2.6.1.0-11.jar:0.8.4.2.6.1.0-11]
              at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) ~[tez-common-0.8.4.2.6.1.0-11.jar:0.8.4.2.6.1.0-11]
              at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_112]
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_112]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_112]
              at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112]
      

      The handling for this corner case was not carried over earlier from the corresponding fixes in the MapReduce shuffle handler.
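The intended behavior (the keep-alive timeout is honored only after the whole response has been flushed) can be illustrated with a minimal, single-threaded Java model. This is a hypothetical sketch, not the Hive or MapReduce shuffle handler code: `KeepAliveSketch`, its nested `Connection`, and the methods `beginResponse`, `writeChunk` and `onIdle` are illustrative names only, and no real networking is involved. The model suspends idle handling once a response starts and re-enables it only after the last chunk of map output is written.

```java
/**
 * Hypothetical, single-threaded model of a keep-alive idle handler that
 * ignores idle events while a response is in flight and honors them only
 * after the last chunk of the response has been flushed.
 */
public class KeepAliveSketch {

    /** Hypothetical stand-in for one server-side keep-alive connection. */
    public static final class Connection {
        private boolean timeoutEnabled = true; // may the idle timeout close us?
        private boolean open = true;
        private int bytesFlushed = 0;

        public boolean isOpen() { return open; }
        public int bytesFlushed() { return bytesFlushed; }

        /** Headers are being written: suspend the keep-alive timeout. */
        public void beginResponse() { timeoutEnabled = false; }

        /** Writes one chunk of map output; re-enables the timeout after the last one. */
        public void writeChunk(int bytes, boolean last) {
            if (!open) {
                throw new IllegalStateException("connection already closed");
            }
            bytesFlushed += bytes;
            if (last) {
                timeoutEnabled = true; // whole response flushed: keep-alive may apply again
            }
        }

        /** Invoked when the connection has been idle past the keep-alive timeout. */
        public void onIdle() {
            if (timeoutEnabled) {
                open = false; // only close when no response is in flight
            }
        }
    }

    public static void main(String[] args) {
        Connection c = new Connection();
        c.beginResponse();
        c.writeChunk(100, false);
        c.onIdle();              // slow mapOutput construction: idle fires mid-response, ignored
        c.writeChunk(50, true);  // final chunk flushed
        System.out.println(c.isOpen() + " " + c.bytesFlushed());
        c.onIdle();              // now the keep-alive timeout is allowed to close the connection
        System.out.println(c.isOpen());
    }
}
```

Running `main` prints `true 150` and then `false`: the idle event that fires during the slow write is ignored, while the one after the final flush closes the connection, so the client never sees a reset in the middle of a response.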

      Attachments

        1. HIVE-16692.addendum.patch
          1 kB
          Rajesh Balamohan
        2. HIVE-16692.1.patch
          3 kB
          Rajesh Balamohan
        3. HIVE-16692.02.patch
          1 kB
          Siddharth Seth


          People

            Assignee: Rajesh Balamohan (rajesh.balamohan)
            Reporter: Rajesh Balamohan (rajesh.balamohan)
            Votes: 0
            Watchers: 3
