Apache Knox / KNOX-755

Retry logic for replayBuffer limit errors is incorrect

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Abandoned
    • Affects Version/s: None
    • Fix Version/s: 1.6.0
    • Component/s: None
    • Labels: None

    Description

      Hive receives corrupted Thrift requests when a large query is sent through Knox and the replayBuffer is too small to hold it:

      org.apache.thrift.transport.TTransportException
      	at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
      	at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
      	at org.apache.thrift.protocol.TBinaryProtocol.readStringBody(TBinaryProtocol.java:354)
      	at org.apache.thrift.protocol.TBinaryProtocol.readString(TBinaryProtocol.java:347)
      	at org.apache.hive.service.cli.thrift.TExecuteStatementReq$TExecuteStatementReqStandardScheme.read(TExecuteStatementReq.java:618)
      ...
      

      The retry logic for this error appears to be incorrect. The gateway log shows the following sequence (table and column names replaced with generic ones):

      2016-10-05 15:25:51,104 DEBUG http.wire (Wire.java:wire(63)) - >> "[0x80][0x1][0x0][0x1][0x0][0x0][0x0][0x10]ExecuteStatement[0x0][0x0][0x0]...![0x88]SELECT 1 AS `number_of_records`,[\n]"
      ...
      2016-10-05 15:25:51,117 DEBUG http.wire (Wire.java:wire(77)) - >> "  `tablename`.`columnn"
      2016-10-05 15:25:51,118 DEBUG http.wire (Wire.java:wire(63)) - >> "[\r][\n]"
      ...
      2016-10-05 15:25:51,119 INFO  client.DefaultHttpClient (DefaultRequestDirector.java:tryExecute(726)) - I/O exception (java.io.IOException) caught when processing request: Hit replay buffer max limit
      2016-10-05 15:25:51,120 DEBUG client.DefaultHttpClient (DefaultRequestDirector.java:tryExecute(731)) - Hit replay buffer max limit
      java.io.IOException: Hit replay buffer max limit
      	at org.apache.hadoop.gateway.dispatch.CappedBufferHttpEntity$ReplayStream.read(CappedBufferHttpEntity.java:143)
      	at java.io.InputStream.read(InputStream.java:101)
      	at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1792)
      	at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1769)
      	at org.apache.commons.io.IOUtils.copy(IOUtils.java:1744)
      	at org.apache.hadoop.gateway.dispatch.CappedBufferHttpEntity.writeTo(CappedBufferHttpEntity.java:93)
      	at org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:98)
      

      However, the HTTP client then retries the request:

      2016-10-05 15:25:51,121 INFO  client.DefaultHttpClient (DefaultRequestDirector.java:tryExecute(733)) - Retrying request
      2016-10-05 15:25:51,121 DEBUG client.DefaultHttpClient (DefaultRequestDirector.java:tryExecute(703)) - Reopening the direct connection.
      

      After authentication (during which the same incorrect request shown below is sent but not parsed, because the response is a 401), the client sends the request again with the correct Authorization header:

      2016-10-05 15:25:51,166 DEBUG client.DefaultHttpClient (DefaultRequestDirector.java:tryExecute(713)) - Attempt 3 to execute request
      2016-10-05 15:25:51,166 DEBUG conn.DefaultClientConnection (DefaultClientConnection.java:sendRequestHeader(269)) - Sending request: POST /cliservice?doAs=... HTTP/1.1
      2016-10-05 15:25:51,167 DEBUG http.wire (Wire.java:wire(63)) - >> "POST /cliservice?doAs=... HTTP/1.1[\r][\n]"
      ...
      2016-10-05 15:25:51,169 DEBUG http.wire (Wire.java:wire(63)) - >> "Authorization: Negotiate ...
      2016-10-05 15:25:51,170 DEBUG http.wire (Wire.java:wire(63)) - >> "[\r][\n]"
      ...
      2016-10-05 15:25:51,172 DEBUG http.wire (Wire.java:wire(63)) - >> "1000[\r][\n]"
      2016-10-05 15:25:51,173 DEBUG http.wire (Wire.java:wire(63)) - >> "[0x80][0x1][0x0][0x1][0x0][0x0][0x0][0x10]ExecuteStatement[0x0] ... ![0x88]SELECT 1 AS `number_of_records`,[\n]"
      ...
      2016-10-05 15:25:51,186 DEBUG http.wire (Wire.java:wire(77)) - >> "  `tablename`.`columnn"
      2016-10-05 15:25:51,187 DEBUG http.wire (Wire.java:wire(63)) - >> "[\r][\n]"
      2016-10-05 15:25:51,187 DEBUG http.wire (Wire.java:wire(63)) - >> "1f3[\r][\n]"
      2016-10-05 15:25:51,187 DEBUG http.wire (Wire.java:wire(63)) - >> "ther` AS `anothercolumnnameother`,[\n]"
      ... rest of the query
      

      Note the gap after "columnn": the rest of "columnname" and the text that follows it are missing, and the body resumes at "ther` AS ...".

      This results in the TTransportException shown above when Hive reads the request, and an HTTP 500 error on the gateway side.

      I think the retry logic should be fixed to send the correct buffer, or removed for this type of error.
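
      As an illustration of the second option (not retrying this type of error), here is a minimal sketch of a capped replay buffer that refuses to replay once the body has outgrown it. The class CappedReplaySketch and its methods are hypothetical and are not Knox's CappedBufferHttpEntity; the isRepeatable() name only mirrors the method of the same name on HttpClient's HttpEntity interface. The sketch assumes that a retry is safe only while every byte read from the source is still held in the buffer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration only; not the Knox implementation. */
class CappedReplaySketch {
    private final InputStream source;   // one-shot request body
    private final byte[] buffer;        // holds at most buffer.length replayable bytes
    private int buffered = 0;           // bytes captured so far
    private boolean overflowed = false; // set once the body outgrows the buffer

    CappedReplaySketch(InputStream source, int limit) {
        this.source = source;
        this.buffer = new byte[limit];
    }

    /** A retry is safe only while every byte read so far is still buffered. */
    boolean isRepeatable() {
        return !overflowed;
    }

    /** Writes the body from the start; refuses to replay after an overflow. */
    void writeTo(OutputStream out) throws IOException {
        if (overflowed) {
            // Bytes past the limit were consumed from the source and are gone,
            // so a replay would send a body with a gap in it.
            throw new IOException("Body exceeded replay buffer; cannot retry");
        }
        out.write(buffer, 0, buffered); // replay what was captured earlier
        int b;
        while ((b = source.read()) != -1) {
            if (buffered == buffer.length) {
                overflowed = true;
                throw new IOException("Hit replay buffer max limit");
            }
            buffer[buffered++] = (byte) b;
            out.write(b);
        }
    }

    public static void main(String[] args) {
        byte[] body = "SELECT 1 AS `number_of_records`".getBytes(StandardCharsets.UTF_8);
        CappedReplaySketch entity =
                new CappedReplaySketch(new ByteArrayInputStream(body), 8);
        try {
            entity.writeTo(new ByteArrayOutputStream());
        } catch (IOException e) {
            System.out.println("first attempt failed: " + e.getMessage());
        }
        // A caller would check this before deciding whether to retry.
        System.out.println("repeatable: " + entity.isRepeatable());
    }
}
```

      With a guard like this, the caller gets a clear failure instead of forwarding a truncated body to Hive. The other option in the report, sending the correct buffer on retry, would require keeping the full body available, which is what the buffer limit prevents.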

            People

              Assignee: Unassigned
              Reporter: Sergey Shelukhin (sershe)