IMPALA-4401

test_client_ssl may hang during cancellation phase

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Cannot Reproduce
    • Affects Version/s: Impala 2.8.0
    • Fix Version/s: Product Backlog
    • Component/s: Security
    • Labels:
      None

      Description

      I've seen this occasionally, and can reproduce it if I run test_client_ssl in a loop for hours (it took about 12 hours to trigger last time).

      test_client_ssl does this (amongst other things):

      LOG.info("Cancelling query")
      num_tries = 0
      # In practice, sending SIGINT to the shell process doesn't always seem to get caught
      # (and a search shows up some bugs in Python where SIGINT might be ignored). So retry
      # for 30s until one signal takes.
      while impalad.get_num_in_flight_queries() == 1:
        time.sleep(1)
        LOG.info("Sending signal...")
        os.kill(p.pid(), signal.SIGINT)
        num_tries += 1
        assert num_tries < 30, "SIGINT was not caught by shell within 30s"

      p.send_cmd("profile")
      result = p.get_result()
      

      The logs show that it hangs somewhere after calling get_num_in_flight_queries():

      MainThread: Cancelling query
      MainThread: Getting num_in_flight_queries from hnr-optiplex:25000
      MainThread: Sending signal...
      MainThread: Getting num_in_flight_queries from hnr-optiplex:25000
      MainThread: Sending signal...
      MainThread: Getting num_in_flight_queries from hnr-optiplex:25000
      MainThread: Sending signal...
      MainThread: Getting num_in_flight_queries from hnr-optiplex:25000
      MainThread: Sending signal...
      MainThread: Getting num_in_flight_queries from hnr-optiplex:25000
      MainThread: Sending signal...
      MainThread: Getting num_in_flight_queries from hnr-optiplex:25000
      MainThread: Sending signal...
      MainThread: Getting num_in_flight_queries from hnr-optiplex:25000
      <EOF>
      

      Finally, on the Impala side, the SSL connection seems to be waiting on the client:

      Thread 175 (Thread 0x7fcf8d2bd700 (LWP 17007)):
      #0  0x00007fcfc01683bd in read () at ../sysdeps/unix/syscall-template.S:81
      #1  0x00007fcfc272a14b in ?? () from /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
      #2  0x00007fcfc272816b in BIO_read () from /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
      #3  0x00007fcfc2a4d75b in ?? () from /lib/x86_64-linux-gnu/libssl.so.1.0.0
      #4  0x00007fcfc2a4bfe2 in ?? () from /lib/x86_64-linux-gnu/libssl.so.1.0.0
      #5  0x00007fcfc2a4c604 in ?? () from /lib/x86_64-linux-gnu/libssl.so.1.0.0
      #6  0x000000000279c5c5 in apache::thrift::transport::TSSLSocket::checkHandshake() ()
      #7  0x000000000279c7ac in apache::thrift::transport::TSSLSocket::read(unsigned char*, unsigned int) ()
      #8  0x000000000279fae7 in apache::thrift::transport::TBufferedTransport::readSlow(unsigned char*, unsigned int) ()
      #9  0x0000000001164543 in apache::thrift::transport::TBufferBase::read (this=0xa58a050, buf=0x7fcf8d2bc640 "`\306+\215\317\177", len=4) at /data/henry/src/cloudera/impala-toolchain/thrift-0.9.0-p8/include/thrift/transport/TBufferTransports.h:69
      #10 0x0000000001168cc4 in apache::thrift::transport::readAll<apache::thrift::transport::TBufferBase> (trans=..., buf=0x7fcf8d2bc640 "`\306+\215\317\177", len=4) at /data/henry/src/cloudera/impala-toolchain/thrift-0.9.0-p8/include/thrift/transport/TTransport.h:39
      #11 0x00000000011645c3 in apache::thrift::transport::TBufferBase::readAll (this=0xa58a050, buf=0x7fcf8d2bc640 "`\306+\215\317\177", len=4) at /data/henry/src/cloudera/impala-toolchain/thrift-0.9.0-p8/include/thrift/transport/TBufferTransports.h:82
      #12 0x00000000012e6023 in apache::thrift::transport::TBufferedTransport::readAll (this=0xa58a050, buf=0x7fcf8d2bc640 "`\306+\215\317\177", len=4) at /data/henry/src/cloudera/impala-toolchain/thrift-0.9.0-p8/include/thrift/transport/TBufferTransports.h:279
      #13 0x00000000012f46a1 in apache::thrift::transport::TVirtualTransport<apache::thrift::transport::TBufferedTransport, apache::thrift::transport::TBufferBase>::readAll_virt (this=0xa58a050, buf=0x7fcf8d2bc640 "`\306+\215\317\177", len=4)
          at /data/henry/src/cloudera/impala-toolchain/thrift-0.9.0-p8/include/thrift/transport/TVirtualTransport.h:99
      #14 0x00000000012fcfc5 in apache::thrift::transport::TTransport::readAll (this=0xa58a050, buf=0x7fcf8d2bc640 "`\306+\215\317\177", len=4) at /data/henry/src/cloudera/impala-toolchain/thrift-0.9.0-p8/include/thrift/transport/TTransport.h:126
      #15 0x0000000001304682 in apache::thrift::protocol::TBinaryProtocolT<apache::thrift::transport::TTransport>::readI32 (this=0xaffe090, i32=@0x7fcf8d2bc68c: 32719) at /data/henry/src/cloudera/impala-toolchain/thrift-0.9.0-p8/include/thrift/protocol/TBinaryProtocol.tcc:375
      #16 0x0000000001303e71 in apache::thrift::protocol::TBinaryProtocolT<apache::thrift::transport::TTransport>::readMessageBegin (this=0xaffe090, name=..., messageType=@0x7fcf8d2bc76c: 0, seqid=@0x7fcf8d2bc768: 18155099)
          at /data/henry/src/cloudera/impala-toolchain/thrift-0.9.0-p8/include/thrift/protocol/TBinaryProtocol.tcc:206
      #17 0x00000000013006fc in apache::thrift::protocol::TVirtualProtocol<apache::thrift::protocol::TBinaryProtocolT<apache::thrift::transport::TTransport>, apache::thrift::protocol::TProtocolDefaults>::readMessageBegin_virt (this=0xaffe090, name=...,
          messageType=@0x7fcf8d2bc76c: 0, seqid=@0x7fcf8d2bc768: 18155099) at /data/henry/src/cloudera/impala-toolchain/thrift-0.9.0-p8/include/thrift/protocol/TVirtualProtocol.h:432
      #18 0x00000000011508dc in apache::thrift::protocol::TProtocol::readMessageBegin (this=0xaffe090, name=..., messageType=@0x7fcf8d2bc76c: 0, seqid=@0x7fcf8d2bc768: 18155099)
          at /data/henry/src/cloudera/impala-toolchain/thrift-0.9.0-p8/include/thrift/protocol/TProtocol.h:529
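
      The backtrace shows the server thread blocked in read() underneath TSSLSocket::checkHandshake(), asking for 4 bytes that never arrive. As a rough illustration of that state (plain Python sockets only, not Thrift or OpenSSL; read_with_deadline is a hypothetical helper, not part of Impala), a read from a silent peer only returns if a receive timeout is set:

```python
import socket


def read_with_deadline(sock, nbytes, timeout_s):
    """Try to read up to nbytes from sock; return None if the peer stays silent.

    Hypothetical helper for illustration. Without a timeout, recv() would
    block indefinitely, which is the state the server thread appears to be in.
    """
    previous = sock.gettimeout()
    sock.settimeout(timeout_s)
    try:
        return sock.recv(nbytes)
    except socket.timeout:
        # The peer sent nothing before the deadline expired.
        return None
    finally:
        # Restore whatever timeout the caller had configured.
        sock.settimeout(previous)


if __name__ == "__main__":
    server_side, client_side = socket.socketpair()
    # The "client" never writes, so the read gives up after the deadline.
    print(read_with_deadline(server_side, 4, 0.2))
    server_side.close()
    client_side.close()
```

      Whether a receive timeout on the Impala side would be the right fix here is a separate question; the sketch only shows why the thread never wakes up on its own.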
      

      So my best guess is that the SIGINT the test sends to the shell process arrives at a bad moment, causing the SSL handshake to abort. Impala itself is functioning just fine, and thinks there are two open Beeswax connections.
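
      Until the root cause is understood, one way to make the test fail loudly instead of hanging would be to put a deadline around the blocking calls. A minimal sketch, assuming the blocking step can be wrapped in a zero-argument callable (call_with_timeout and CallTimedOut are hypothetical names, not part of the Impala test framework):

```python
import threading


class CallTimedOut(Exception):
    """Raised when the wrapped call does not finish before the deadline."""


def call_with_timeout(fn, timeout_s):
    """Run fn() in a daemon thread and wait at most timeout_s seconds for it.

    Returns fn's result, re-raises fn's exception, or raises CallTimedOut.
    The worker is a daemon thread so a stuck call cannot keep the process alive.
    """
    outcome = {}

    def runner():
        try:
            outcome["value"] = fn()
        except BaseException as exc:  # hand any failure back to the caller
            outcome["error"] = exc

    worker = threading.Thread(target=runner, daemon=True)
    worker.start()
    worker.join(timeout_s)
    if worker.is_alive():
        raise CallTimedOut("call did not finish within %.1fs" % timeout_s)
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]
```

      In the test this could look like call_with_timeout(p.get_result, 60), with the 60 second limit being an arbitrary choice. Note that the stuck call is abandoned rather than cancelled, so this only turns a hang into a failure; it does not fix the underlying problem.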

              People

              • Assignee:
                sailesh Sailesh Mukil
                Reporter:
                henryr Henry Robinson
              • Votes:
                0
                Watchers:
                3
