Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-5079

Flaky tests: Kudu EE tests need longer HS2 connection timeouts

    Details

    • Epic Color:
      ghx-label-2

      Description

      The following test started failing randomly.

      MainThread: Created database "test_kudu_col_null_changed_bc507455" for test ID "query_test/test_kudu.py::TestKuduOperations::()::test_kudu_col_null_changed"
      11:38:31 ----------------------------- Captured stderr call -----------------------------
      11:38:31 MainThread: Failed to open transport (tries_left=3)
      11:38:31 Traceback (most recent call last):
      11:38:31   File "/data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/infra/python/env/lib/python2.6/site-packages/impala/hiveserver2.py", line 940, in _execute
      11:38:31     return func(request)
      11:38:31   File "/data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/infra/python/env/lib/python2.6/site-packages/impala/_thrift_gen/TCLIService/TCLIService.py", line 265, in ExecuteStatement
      11:38:31     return self.recv_ExecuteStatement()
      11:38:31   File "/data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/infra/python/env/lib/python2.6/site-packages/impala/_thrift_gen/TCLIService/TCLIService.py", line 276, in recv_ExecuteStatement
      11:38:31     (fname, mtype, rseqid) = self._iprot.readMessageBegin()
      11:38:31   File "/data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/infra/python/env/lib/python2.6/site-packages/thrift/protocol/TBinaryProtocol.py", line 126, in readMessageBegin
      11:38:31     sz = self.readI32()
      11:38:31   File "/data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/infra/python/env/lib/python2.6/site-packages/thrift/protocol/TBinaryProtocol.py", line 206, in readI32
      11:38:31     buff = self.trans.readAll(4)
      11:38:31   File "/data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/infra/python/env/lib/python2.6/site-packages/thrift/transport/TTransport.py", line 58, in readAll
      11:38:31     chunk = self.read(sz - have)
      11:38:31   File "/data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/infra/python/env/lib/python2.6/site-packages/thrift/transport/TTransport.py", line 159, in read
      11:38:31     self.__rbuf = StringIO(self.__trans.read(max(sz, self.__rbuf_size)))
      11:38:31   File "/data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/infra/python/env/lib/python2.6/site-packages/thrift/transport/TSocket.py", line 103, in read
      11:38:31     buff = self.handle.recv(sz)
      11:38:31 timeout: timed out
      

      Not clear what is going on from the error message or the logs. MJ do you mind taking a quick look? Feel free to reassign as you see fit.

        Activity

        Hide
        mjacobs Matthew Jacobs added a comment -

        probably related https://issues.apache.org/jira/browse/IMPALA-4454
        saw a similar timeout there, and we bumped the timeouts from 45 to several minutes. We should probably do the same thing for now, and look more closely at the pytest fixtures being used as they do look suspicious. the 'conn' fixture is per-class so maybe bad state between test cases. It's also pretty weird in that it takes several parameters special functions that can be defined on the test class.

        Show
        mjacobs Matthew Jacobs added a comment - probably related https://issues.apache.org/jira/browse/IMPALA-4454 saw a similar timeout there, and we bumped the timeouts from 45 to several minutes. We should probably do the same thing for now, and look more closely at the pytest fixtures being used as they do look suspicious. the 'conn' fixture is per-class so maybe bad state between test cases. It's also pretty weird in that it takes several parameters special functions that can be defined on the test class.
        Hide
        mjacobs Matthew Jacobs added a comment -

        commit 7e25ec8ec03e0594a630910b244686ad12d264f8
        Author: Matthew Jacobs <mj@cloudera.com>
        Date: Fri Mar 17 18:45:32 2017 -0700

        IMPALA-5079: Bump timeout for TestKuduOperations

        Tests seem to be timing out similarly to IMPALA-4454. Bump
        the timeout to unblock tests for now.

        TODO: untangle the use of all the fancy pytest fixtures, they
        may be leaking state between test cases

        Change-Id: I5fd8e5eb5dbd977c386e6557941a2c47de44f344
        Reviewed-on: http://gerrit.cloudera.org:8080/6453
        Reviewed-by: Matthew Jacobs <mj@cloudera.com>
        Tested-by: Impala Public Jenkins

        Show
        mjacobs Matthew Jacobs added a comment - commit 7e25ec8ec03e0594a630910b244686ad12d264f8 Author: Matthew Jacobs <mj@cloudera.com> Date: Fri Mar 17 18:45:32 2017 -0700 IMPALA-5079 : Bump timeout for TestKuduOperations Tests seem to be timing out similarly to IMPALA-4454 . Bump the timeout to unblock tests for now. TODO: untangle the use of all the fancy pytest fixtures, they may be leaking state between test cases Change-Id: I5fd8e5eb5dbd977c386e6557941a2c47de44f344 Reviewed-on: http://gerrit.cloudera.org:8080/6453 Reviewed-by: Matthew Jacobs <mj@cloudera.com> Tested-by: Impala Public Jenkins
        Hide
        mjacobs Matthew Jacobs added a comment -

        re-opening because I'm seeing the same issue in other tests:

        20:18:43 =================================== FAILURES ===================================
        20:18:43 ______________ TestCreateExternalTable.test_implicit_table_props _______________
        20:18:43 [gw2] linux2 -- Python 2.6.6 /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/bin/../infra/python/env/bin/python
        20:18:43 query_test/test_kudu.py:357: in test_implicit_table_props
        20:18:43     assert ["", "storage_handler", "com.cloudera.kudu.hive.KuduStorageHandler"] \
        20:18:43 /usr/lib64/python2.6/contextlib.py:34: in __exit__
        20:18:43     self.gen.throw(type, value, traceback)
        20:18:43 common/kudu_test_suite.py:145: in drop_impala_table_after_context
        20:18:43     cursor.execute("DROP TABLE %s" % table_name)
        20:18:43 ../infra/python/env/lib/python2.6/site-packages/impala/hiveserver2.py:302: in execute
        20:18:43     configuration=configuration)
        20:18:43 ../infra/python/env/lib/python2.6/site-packages/impala/hiveserver2.py:343: in execute_async
        20:18:43     self._execute_async(op)
        20:18:43 ../infra/python/env/lib/python2.6/site-packages/impala/hiveserver2.py:362: in _execute_async
        20:18:43     operation_fn()
        20:18:43 ../infra/python/env/lib/python2.6/site-packages/impala/hiveserver2.py:340: in op
        20:18:43     async=True)
        20:18:43 ../infra/python/env/lib/python2.6/site-packages/impala/hiveserver2.py:1027: in execute
        20:18:43     return self._operation('ExecuteStatement', req)
        20:18:43 ../infra/python/env/lib/python2.6/site-packages/impala/hiveserver2.py:957: in _operation
        20:18:43     resp = self._rpc(kind, request)
        20:18:43 ../infra/python/env/lib/python2.6/site-packages/impala/hiveserver2.py:925: in _rpc
        20:18:43     err_if_rpc_not_ok(response)
        20:18:43 ../infra/python/env/lib/python2.6/site-packages/impala/hiveserver2.py:704: in err_if_rpc_not_ok
        20:18:43     raise HiveServer2Error(resp.status.errorMessage)
        20:18:43 E   HiveServer2Error: Invalid session id
        20:18:43 ---------------------------- Captured stderr setup -----------------------------
        20:18:43 -- connecting to: localhost:21000
        20:18:43 MainThread: Closing active operation
        20:18:43 MainThread: Using database unh2pu as default
        20:18:43 ----------------------------- Captured stderr call -----------------------------
        20:18:43 MainThread: Failed to open transport (tries_left=3)
        20:18:43 Traceback (most recent call last):
        20:18:43   File "/data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/infra/python/env/lib/python2.6/site-packages/impala/hiveserver2.py", line 940, in _execute
        20:18:43     return func(request)
        20:18:43   File "/data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/infra/python/env/lib/python2.6/site-packages/impala/_thrift_gen/TCLIService/TCLIService.py", line 265, in ExecuteStatement
        20:18:43     return self.recv_ExecuteStatement()
        20:18:43   File "/data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/infra/python/env/lib/python2.6/site-packages/impala/_thrift_gen/TCLIService/TCLIService.py", line 276, in recv_ExecuteStatement
        20:18:43     (fname, mtype, rseqid) = self._iprot.readMessageBegin()
        20:18:43   File "/data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/infra/python/env/lib/python2.6/site-packages/thrift/protocol/TBinaryProtocol.py", line 126, in readMessageBegin
        20:18:43     sz = self.readI32()
        20:18:43   File "/data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/infra/python/env/lib/python2.6/site-packages/thrift/protocol/TBinaryProtocol.py", line 206, in readI32
        20:18:43     buff = self.trans.readAll(4)
        20:18:43   File "/data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/infra/python/env/lib/python2.6/site-packages/thrift/transport/TTransport.py", line 58, in readAll
        20:18:43     chunk = self.read(sz - have)
        20:18:43   File "/data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/infra/python/env/lib/python2.6/site-packages/thrift/transport/TTransport.py", line 159, in read
        20:18:43     self.__rbuf = StringIO(self.__trans.read(max(sz, self.__rbuf_size)))
        20:18:43   File "/data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/infra/python/env/lib/python2.6/site-packages/thrift/transport/TSocket.py", line 103, in read
        20:18:43     buff = self.handle.recv(sz)
        20:18:43 timeout: timed out
        20:18:43 ======= 1 failed, 1389 passed, 35 skipped, 42 xfailed in 4102.17 seconds =======
        
        Show
        mjacobs Matthew Jacobs added a comment - re-opening because I'm seeing the same issue in other tests: 20:18:43 =================================== FAILURES =================================== 20:18:43 ______________ TestCreateExternalTable.test_implicit_table_props _______________ 20:18:43 [gw2] linux2 -- Python 2.6.6 /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/bin/../infra/python/env/bin/python 20:18:43 query_test/test_kudu.py:357: in test_implicit_table_props 20:18:43 assert [ "", " storage_handler ", " com.cloudera.kudu.hive.KuduStorageHandler"] \ 20:18:43 /usr/lib64/python2.6/contextlib.py:34: in __exit__ 20:18:43 self.gen. throw (type, value, traceback) 20:18:43 common/kudu_test_suite.py:145: in drop_impala_table_after_context 20:18:43 cursor.execute( "DROP TABLE %s" % table_name) 20:18:43 ../infra/python/env/lib/python2.6/site-packages/impala/hiveserver2.py:302: in execute 20:18:43 configuration=configuration) 20:18:43 ../infra/python/env/lib/python2.6/site-packages/impala/hiveserver2.py:343: in execute_async 20:18:43 self._execute_async(op) 20:18:43 ../infra/python/env/lib/python2.6/site-packages/impala/hiveserver2.py:362: in _execute_async 20:18:43 operation_fn() 20:18:43 ../infra/python/env/lib/python2.6/site-packages/impala/hiveserver2.py:340: in op 20:18:43 async=True) 20:18:43 ../infra/python/env/lib/python2.6/site-packages/impala/hiveserver2.py:1027: in execute 20:18:43 return self._operation('ExecuteStatement', req) 20:18:43 ../infra/python/env/lib/python2.6/site-packages/impala/hiveserver2.py:957: in _operation 20:18:43 resp = self._rpc(kind, request) 20:18:43 ../infra/python/env/lib/python2.6/site-packages/impala/hiveserver2.py:925: in _rpc 20:18:43 err_if_rpc_not_ok(response) 20:18:43 ../infra/python/env/lib/python2.6/site-packages/impala/hiveserver2.py:704: in err_if_rpc_not_ok 20:18:43 raise HiveServer2Error(resp.status.errorMessage) 20:18:43 E HiveServer2Error: Invalid session id 20:18:43 ---------------------------- Captured stderr setup ----------------------------- 20:18:43 -- connecting to: localhost:21000 20:18:43 MainThread: Closing active operation 20:18:43 MainThread: Using database unh2pu as default 20:18:43 ----------------------------- Captured stderr call ----------------------------- 20:18:43 MainThread: Failed to open transport (tries_left=3) 20:18:43 Traceback (most recent call last): 20:18:43 File "/data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/infra/python/env/lib/python2.6/site-packages/impala/hiveserver2.py" , line 940, in _execute 20:18:43 return func(request) 20:18:43 File "/data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/infra/python/env/lib/python2.6/site-packages/impala/_thrift_gen/TCLIService/TCLIService.py" , line 265, in ExecuteStatement 20:18:43 return self.recv_ExecuteStatement() 20:18:43 File "/data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/infra/python/env/lib/python2.6/site-packages/impala/_thrift_gen/TCLIService/TCLIService.py" , line 276, in recv_ExecuteStatement 20:18:43 (fname, mtype, rseqid) = self._iprot.readMessageBegin() 20:18:43 File "/data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/infra/python/env/lib/python2.6/site-packages/thrift/protocol/TBinaryProtocol.py" , line 126, in readMessageBegin 20:18:43 sz = self.readI32() 20:18:43 File "/data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/infra/python/env/lib/python2.6/site-packages/thrift/protocol/TBinaryProtocol.py" , line 206, in readI32 20:18:43 buff = self.trans.readAll(4) 20:18:43 File "/data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/infra/python/env/lib/python2.6/site-packages/thrift/transport/TTransport.py" , line 58, in readAll 20:18:43 chunk = self.read(sz - have) 20:18:43 File "/data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/infra/python/env/lib/python2.6/site-packages/thrift/transport/TTransport.py" , line 159, in read 20:18:43 self.__rbuf = StringIO(self.__trans.read(max(sz, self.__rbuf_size))) 20:18:43 File "/data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/infra/python/env/lib/python2.6/site-packages/thrift/transport/TSocket.py" , line 103, in read 20:18:43 buff = self.handle.recv(sz) 20:18:43 timeout: timed out 20:18:43 ======= 1 failed, 1389 passed, 35 skipped, 42 xfailed in 4102.17 seconds =======
        Hide
        mjacobs Matthew Jacobs added a comment -

        commit 532c5f26052070cccf90eac83de9ba0052879729
        Author: Matthew Jacobs <mj@cloudera.com>
        Date: Wed Apr 12 16:42:33 2017 -0700

        IMPALA-5079: Flaky Kudu tests; fix HS2 connection timeouts

        Fixes the HS2 timeouts for all Kudu EE tests. Previously
        only 2 classes had the timeout set, but all the Kudu tests
        appear to be susceptible to this issue.

        Change-Id: Ibc48b4b7ae65ddf4bba087d079d4e4032f4d5f0f
        Reviewed-on: http://gerrit.cloudera.org:8080/6616
        Reviewed-by: Michael Brown <mikeb@cloudera.com>
        Reviewed-by: Alex Behm <alex.behm@cloudera.com>
        Tested-by: Impala Public Jenkins

        Show
        mjacobs Matthew Jacobs added a comment - commit 532c5f26052070cccf90eac83de9ba0052879729 Author: Matthew Jacobs <mj@cloudera.com> Date: Wed Apr 12 16:42:33 2017 -0700 IMPALA-5079 : Flaky Kudu tests; fix HS2 connection timeouts Fixes the HS2 timeouts for all Kudu EE tests. Previously only 2 classes had the timeout set, but all the Kudu tests appear to be susceptible to this issue. Change-Id: Ibc48b4b7ae65ddf4bba087d079d4e4032f4d5f0f Reviewed-on: http://gerrit.cloudera.org:8080/6616 Reviewed-by: Michael Brown <mikeb@cloudera.com> Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins

          People

          • Assignee:
            mjacobs Matthew Jacobs
            Reporter:
            dtsirogiannis Dimitris Tsirogiannis
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development