Details

      Description

      Saw this failure on a Jenkins job for both test_failure_in_prepare and test_failure_in_prepare_multi_fragment.

      Tim - assigning this to you as it may be related to IMPALA-4087

      11:09:56 =================================== FAILURES ===================================
      11:09:56 ________________ TestFragmentLifecycle.test_failure_in_prepare _________________
      11:09:56 query_test/test_lifecycle.py:45: in test_failure_in_prepare
      11:09:56     v.wait_for_metric(self.IN_FLIGHT_FRAGMENTS, 0)
      11:09:56 verifiers/metric_verifier.py:59: in wait_for_metric
      11:09:56     self.impalad_service.wait_for_metric_value(metric_name, expected_value, timeout)
      11:09:56 common/impala_service.py:85: in wait_for_metric_value
      11:09:56     (metric_name, expected_value, timeout)
      11:09:56 E   AssertionError: Metric value impala-server.num-fragments-in-flight did not reach value 0 in 60s
      11:09:56 ---------------------------- Captured stderr setup -----------------------------
      11:09:56 -- connecting to: localhost:21000
      11:09:56 ----------------------------- Captured stderr call -----------------------------
      11:09:56 MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
      11:09:56 -- executing against localhost:21000
      11:09:56 SET DEBUG_ACTION='-1:0:PREPARE:FAIL';
      11:09:56 
      11:09:56 -- executing against localhost:21000
      11:09:56 SELECT COUNT(*) FROM functional.alltypes;
      11:09:56 
      11:09:56 MainThread: Getting metric: impala-server.num-fragments-in-flight from impala-boost-static-burst-slave-14d9.vpc.cloudera.com:25000
      11:09:56 MainThread: Waiting for metric value 'impala-server.num-fragments-in-flight'=0. Current value: 3
      11:09:56 MainThread: Sleeping 1s before next retry.
      11:09:56 MainThread: Getting metric: impala-server.num-fragments-in-flight from impala-boost-static-burst-slave-14d9.vpc.cloudera.com:25000
      11:09:56 MainThread: Waiting for metric value 'impala-server.num-fragments-in-flight'=0. Current value: 3
      11:09:56 MainThread: Sleeping 1s before next retry.
      11:09:56 MainThread: Getting metric: impala-server.num-fragments-in-flight from impala-boost-static-burst-slave-14d9.vpc.cloudera.com:25000
      11:09:56 MainThread: Waiting for metric value 'impala-server.num-fragments-in-flight'=0. Current value: 1
      11:09:56 MainThread: Sleeping 1s before next retry.
      11:09:56 MainThread: Getting metric: impala-server.num-fragments-in-flight from impala-boost-static-burst-slave-14d9.vpc.cloudera.com:25000
      11:09:56 MainThread: Waiting for metric value 'impala-server.num-fragments-in-flight'=0. Current value: 1
      11:09:56 MainThread: Sleeping 1s before next retry.
      11:09:56 MainThread: Getting metric: impala-server.num-fragments-in-flight from impala-boost-static-burst-slave-14d9.vpc.cloudera.com:25000
      11:09:56 MainThread: Waiting for metric value 'impala-server.num-fragments-in-flight'=0. Current value: 1
      11:09:56 MainThread: Sleeping 1s before next retry.
      11:09:56 MainThread: Getting metric: impala-server.num-fragments-in-flight from impala-boost-static-burst-slave-14d9.vpc.cloudera.com:25000
      11:09:56 MainThread: Waiting for metric value 'impala-server.num-fragments-in-flight'=0. Current value: 1
      11:09:56 MainThread: Sleeping 1s before next retry.
      11:09:56 MainThread: Getting metric: impala-server.num-fragments-in-flight from impala-boost-static-burst-slave-14d9.vpc.cloudera.com:25000
      11:09:56 MainThread: Waiting for metric value 'impala-server.num-fragments-in-flight'=0. Current value: 1
      11:09:56 MainThread: Sleeping 1s before next retry.
      11:09:56 MainThread: Getting metric: impala-server.num-fragments-in-flight from impala-boost-static-burst-slave-14d9.vpc.cloudera.com:25000
      11:09:56 MainThread: Waiting for metric value 'impala-server.num-fragments-in-flight'=0. Current value: 1
      11:09:56 MainThread: Sleeping 1s before next retry.
      11:09:56 MainThread: Getting metric: impala-server.num-fragments-in-flight from impala-boost-static-burst-slave-14d9.vpc.cloudera.com:25000
      11:09:56 MainThread: Waiting for metric value 'impala-server.num-fragments-in-flight'=0. Current value: 1
      11:09:56 MainThread: Sleeping 1s before next retry.
      11:09:56 MainThread: Getting metric: impala-server.num-fragments-in-flight from impala-boost-static-burst-slave-14d9.vpc.cloudera.com:25000
      11:09:56 MainThread: Waiting for metric value 'impala-server.num-fragments-in-flight'=0. Current value: 1
      11:09:56 MainThread: Sleeping 1s before next retry.
      11:09:56 MainThread: Getting metric: impala-server.num-fragments-in-flight from impala-boost-static-burst-slave-14d9.vpc.cloudera.com:25000
      11:09:56 MainThread: Waiting for metric value 'impala-server.num-fragments-in-flight'=0. Current value: 1
      11:09:56 MainThread: Sleeping 1s before next retry.
      11:09:56 MainThread: Getting metric: impala-server.num-fragments-in-flight from impala-boost-static-burst-slave-14d9.vpc.cloudera.com:25000
      11:09:56 MainThread: Waiting for metric value 'impala-server.num-fragments-in-flight'=0. Current value: 1
      11:09:56 MainThread: Sleeping 1s before next retry.
      11:09:56 MainThread: Getting metric: impala-server.num-fragments-in-flight from impala-boost-static-burst-slave-14d9.vpc.cloudera.com:25000
      11:09:56 MainThread: Waiting for metric value 'impala-server.num-fragments-in-flight'=0. Current value: 1
      11:09:56 MainThread: Sleeping 1s before next retry.
      11:09:56 MainThread: Getting metric: impala-server.num-fragments-in-flight from impala-boost-static-burst-slave-14d9.vpc.cloudera.com:25000
      11:09:56 MainThread: Waiting for metric value 'impala-server.num-fragments-in-flight'=0. Current value: 1
      11:09:56 MainThread: Sleeping 1s before next retry.
      11:09:56 MainThread: Getting metric: impala-server.num-fragments-in-flight from impala-boost-static-burst-slave-14d9.vpc.cloudera.com:25000
      11:09:56 MainThread: Waiting for metric value 'impala-server.num-fragments-in-flight'=0. Current value: 1
      11:09:56 MainThread: Sleeping 1s before next retry.
      11:09:56 MainThread: Getting metric: impala-server.num-fragments-in-flight from impala-boost-static-burst-slave-14d9.vpc.cloudera.com:25000
      11:09:56 MainThread: Waiting for metric value 'impala-server.num-fragments-in-flight'=0. Current value: 1
      11:09:56 MainThread: Sleeping 1s before next retry.
      11:09:56 MainThread: Getting metric: impala-server.num-fragments-in-flight from impala-boost-static-burst-slave-14d9.vpc.cloudera.com:25000
      11:09:56 MainThread: Waiting for metric value 'impala-server.num-fragments-in-flight'=0. Current value: 1
      11:09:56 MainThread: Sleeping 1s before next retry.
      11:09:56 MainThread: Getting metric: impala-server.num-fragments-in-flight from impala-boost-static-burst-slave-14d9.vpc.cloudera.com:25000
      11:09:56 MainThread: Waiting for metric value 'impala-server.num-fragments-in-flight'=0. Current value: 1
      11:09:56 MainThread: Sleeping 1s before next retry.
      11:09:56 MainThread: Getting metric: impala-server.num-fragments-in-flight from impala-boost-static-burst-slave-14d9.vpc.cloudera.com:25000
      11:09:56 MainThread: Waiting for metric value 'impala-server.num-fragments-in-flight'=0. Current value: 1
      11:09:56 MainThread: Sleeping 1s before next retry.
      11:09:56 MainThread: Getting metric: impala-server.num-fragments-in-flight from impala-boost-static-burst-slave-14d9.vpc.cloudera.com:25000
      11:09:56 MainThread: Waiting for metric value 'impala-server.num-fragments-in-flight'=0. Current value: 1
      11:09:56 MainThread: Sleeping 1s before next retry.
      11:09:56 MainThread: Getting metric: impala-server.num-fragments-in-flight from impala-boost-static-burst-slave-14d9.vpc.cloudera.com:25000
      11:09:56 MainThread: Waiting for metric value 'impala-server.num-fragments-in-flight'=0. Current value: 1
      11:09:56 MainThread: Sleeping 1s before next retry.
      11:09:56 MainThread: Getting metric: impala-server.num-fragments-in-flight from impala-boost-static-burst-slave-14d9.vpc.cloudera.com:25000
      11:09:56 MainThread: Waiting for metric value 'impala-server.num-fragments-in-flight'=0. Current value: 1
      11:09:56 MainThread: Sleeping 1s before next retry.
      11:09:56 MainThread: Getting metric: impala-server.num-fragments-in-flight from impala-boost-static-burst-slave-14d9.vpc.cloudera.com:25000
      11:09:56 MainThread: Waiting for metric value 'impala-server.num-fragments-in-flight'=0. Current value: 1
      11:09:56 MainThread: Sleeping 1s before next retry.
      11:09:56 MainThread: Getting metric: impala-server.num-fragments-in-flight from impala-boost-static-burst-slave-14d9.vpc.cloudera.com:25000
      11:09:56 MainThread: Waiting for metric value 'impala-server.num-fragments-in-flight'=0. Current value: 1
      11:09:56 MainThread: Sleeping 1s before next retry.
      11:09:56 MainThread: Getting metric: impala-server.num-fragments-in-flight from impala-boost-static-burst-slave-14d9.vpc.cloudera.com:25000
      11:09:56 MainThread: Waiting for metric value 'impala-server.num-fragments-in-flight'=0. Current value: 1
      11:09:56 MainThread: Sleeping 1s before next retry.
      11:09:56 MainThread: Getting metric: impala-server.num-fragments-in-flight from impala-boost-static-burst-slave-14d9.vpc.cloudera.com:25000
      11:09:56 MainThread: Waiting for metric value 'impala-server.num-fragments-in-flight'=0. Current value: 1
      11:09:56 MainThread: Sleeping 1s before next retry.
      11:09:56 MainThread: Getting metric: impala-server.num-fragments-in-flight from impala-boost-static-burst-slave-14d9.vpc.cloudera.com:25000
      11:09:56 MainThread: Waiting for metric value 'impala-server.num-fragments-in-flight'=0. Current value: 1
      11:09:56 MainThread: Sleeping 1s before next retry.
      11:09:56 MainThread: Getting metric: impala-server.num-fragments-in-flight from impala-boost-static-burst-slave-14d9.vpc.cloudera.com:25000
      11:09:56 MainThread: Waiting for metric value 'impala-server.num-fragments-in-flight'=0. Current value: 1
      11:09:56 MainThread: Sleeping 1s before next retry.
      11:09:56 MainThread: Getting metric: impala-server.num-fragments-in-flight from impala-boost-static-burst-slave-14d9.vpc.cloudera.com:25000
      11:09:56 MainThread: Waiting for metric value 'impala-server.num-fragments-in-flight'=0. Current value: 1
      11:09:56 MainThread: Sleeping 1s before next retry.
      11:09:56 MainThread: Getting metric: impala-server.num-fragments-in-flight from impala-boost-static-burst-slave-14d9.vpc.cloudera.com:25000
      11:09:56 MainThread: Waiting for metric value 'impala-server.num-fragments-in-flight'=0. Current value: 1
      11:09:56 MainThread: Sleeping 1s before next retry.
      11:09:56 MainThread: Getting metric: impala-server.num-fragments-in-flight from impala-boost-static-burst-slave-14d9.vpc.cloudera.com:25000
      11:09:56 MainThread: Waiting for metric value 'impala-server.num-fragments-in-flight'=0. Current value: 1
      11:09:56 MainThread: Sleeping 1s before next retry.
      11:09:56 MainThread: Getting metric: impala-server.num-fragments-in-flight from impala-boost-static-burst-slave-14d9.vpc.cloudera.com:25000
      11:09:56 MainThread: Waiting for metric value 'impala-server.num-fragments-in-flight'=0. Current value: 1
      11:09:56 MainThread: Sleeping 1s before next retry.
      11:09:56 MainThread: Getting metric: impala-server.num-fragments-in-flight from impala-boost-static-burst-slave-14d9.vpc.cloudera.com:25000
      11:09:56 MainThread: Waiting for metric value 'impala-server.num-fragments-in-flight'=0. Current value: 1
      11:09:56 MainThread: Sleeping 1s before next retry.
      11:09:56 MainThread: Getting metric: impala-server.num-fragments-in-flight from impala-boost-static-burst-slave-14d9.vpc.cloudera.com:25000
      11:09:56 MainThread: Waiting for metric value 'impala-server.num-fragments-in-flight'=0. Current value: 1
      11:09:56 MainThread: Sleeping 1s before next retry.
      11:09:56 MainThread: Getting metric: impala-server.num-fragments-in-flight from impala-boost-static-burst-slave-14d9.vpc.cloudera.com:25000
      11:09:56 MainThread: Waiting for metric value 'impala-server.num-fragments-in-flight'=0. Current value: 1
      11:09:56 MainThread: Sleeping 1s before next retry.
      11:09:56 MainThread: Getting metric: impala-server.num-fragments-in-flight from impala-boost-static-burst-slave-14d9.vpc.cloudera.com:25000
      11:09:56 MainThread: Waiting for metric value 'impala-server.num-fragments-in-flight'=0. Current value: 1
      11:09:56 MainThread: Sleeping 1s before next retry.
      11:09:56 MainThread: Getting metric: impala-server.num-fragments-in-flight from impala-boost-static-burst-slave-14d9.vpc.cloudera.com:25000
      11:09:56 MainThread: Waiting for metric value 'impala-server.num-fragments-in-flight'=0. Current value: 1
      11:09:56 MainThread: Sleeping 1s before next retry.
      11:09:56 MainThread: Getting metric: impala-server.num-fragments-in-flight from impala-boost-static-burst-slave-14d9.vpc.cloudera.com:25000
      11:09:56 MainThread: Waiting for metric value 'impala-server.num-fragments-in-flight'=0. Current value: 1
      11:09:56 MainThread: Sleeping 1s before next retry.
      11:09:56 MainThread: Getting metric: impala-server.num-fragments-in-flight from impala-boost-static-burst-slave-14d9.vpc.cloudera.com:25000
      11:09:56 MainThread: Waiting for metric value 'impala-server.num-fragments-in-flight'=0. Current value: 1
      11:09:56 MainThread: Sleeping 1s before next retry.
      11:09:56 MainThread: Getting metric: impala-server.num-fragments-in-flight from impala-boost-static-burst-slave-14d9.vpc.cloudera.com:25000
      11:09:56 MainThread: Waiting for metric value 'impala-server.num-fragments-in-flight'=0. Current value: 1
      11:09:56 MainThread: Sleeping 1s before next retry.
      11:09:56 MainThread: Getting metric: impala-server.num-fragments-in-flight from impala-boost-static-burst-slave-14d9.vpc.cloudera.com:25000
      11:09:56 MainThread: Waiting for metric value 'impala-server.num-fragments-in-flight'=0. Current value: 1
      11:09:56 MainThread: Sleeping 1s before next retry.
      11:09:56 MainThread: Getting metric: impala-server.num-fragments-in-flight from impala-boost-static-burst-slave-14d9.vpc.cloudera.com:25000
      11:09:56 MainThread: Waiting for metric value 'impala-server.num-fragments-in-flight'=0. Current value: 1
      11:09:56 MainThread: Sleeping 1s before next retry.
      11:09:56 MainThread: Getting metric: impala-server.num-fragments-in-flight from impala-boost-static-burst-slave-14d9.vpc.cloudera.com:25000
      11:09:56 MainThread: Waiting for metric value 'impala-server.num-fragments-in-flight'=0. Current value: 1
      11:09:56 MainThread: Sleeping 1s before next retry.
      11:09:56 MainThread: Getting metric: impala-server.num-fragments-in-flight from impala-boost-static-burst-slave-14d9.vpc.cloudera.com:25000
      11:09:56 MainThread: Waiting for metric value 'impala-server.num-fragments-in-flight'=0. Current value: 1
      11:09:56 MainThread: Sleeping 1s before next retry.
      11:09:56 MainThread: Getting metric: impala-server.num-fragments-in-flight from impala-boost-static-burst-slave-14d9.vpc.cloudera.com:25000
      11:09:56 MainThread: Waiting for metric value 'impala-server.num-fragments-in-flight'=0. Current value: 1
      ...

        Activity

        Hide
        tarmstrong Tim Armstrong added a comment -

        What was the last commit on the branch?

        Show
        tarmstrong Tim Armstrong added a comment - What was the last commit on the branch?
        Hide
        twmarshall Thomas Tauber-Marshall added a comment -

        0ed8efa4653a592aba52359d7fb1f26010a7ab57
        which does include the fix for IMPALA-4087, so it may be unrelated.

        Show
        twmarshall Thomas Tauber-Marshall added a comment - 0ed8efa4653a592aba52359d7fb1f26010a7ab57 which does include the fix for IMPALA-4087 , so it may be unrelated.
        Hide
        tarmstrong Tim Armstrong added a comment -

        I don't think this is related to IMPALA-4087, which was a test bug - im IMPALA 4087 it was waiting for num-fragments to reach 1 (which didn't make sense). Here it's waiting for the correct value but it's stuck at 1 - it's probably some kind of query lifecycle bug.

        Show
        tarmstrong Tim Armstrong added a comment - I don't think this is related to IMPALA-4087 , which was a test bug - im IMPALA 4087 it was waiting for num-fragments to reach 1 (which didn't make sense). Here it's waiting for the correct value but it's stuck at 1 - it's probably some kind of query lifecycle bug.
        Hide
        tarmstrong Tim Armstrong added a comment -

        Henry Robinson do you have any ideas about what might cause this?

        Show
        tarmstrong Tim Armstrong added a comment - Henry Robinson do you have any ideas about what might cause this?
        Hide
        henryr Henry Robinson added a comment -

        I've seen this happen when a fragment is blocked in TransmitData(). Because the timeout for sending data is larger than the time the test waits for the number of fragments to hit 0, the test can spuriously fail.

        https://issues.cloudera.org/browse/IMPALA-3977 will fix this once and for all. For now, we could increase the verification timeout to 2.5 minutes.

        (There might be something different going on here - this is just a common cause of this problem).

        Show
        henryr Henry Robinson added a comment - I've seen this happen when a fragment is blocked in TransmitData() . Because the timeout for sending data is larger than the time the test waits for the number of fragments to hit 0, the test can spuriously fail. https://issues.cloudera.org/browse/IMPALA-3977 will fix this once and for all. For now, we could increase the verification timeout to 2.5 minutes. (There might be something different going on here - this is just a common cause of this problem).
        Hide
        tarmstrong Tim Armstrong added a comment -

        The below patch should hopefully avoid the flakiness. We can reopen if it happens again.

        IMPALA-4519: increase timeout in TestFragmentLifecycle

        Increase the timeout to over 120s to match datastream_sender_timeout_ms.
        This should avoid spurious test failures if we are unlucky and a sender
        gets stuck waiting for a receiver fragment that will never start.

        Testing:
        Ran the test in a loop for a while to flush out any flakiness.

        Change-Id: I9fe6e6c74538d0747e3eeb578cf0518494ff10c8
        Reviewed-on: http://gerrit.cloudera.org:8080/5244
        Tested-by: Impala Public Jenkins
        Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>

        Show
        tarmstrong Tim Armstrong added a comment - The below patch should hopefully avoid the flakiness. We can reopen if it happens again. IMPALA-4519 : increase timeout in TestFragmentLifecycle Increase the timeout to over 120s to match datastream_sender_timeout_ms. This should avoid spurious test failures if we are unlucky and a sender gets stuck waiting for a receiver fragment that will never start. Testing: Ran the test in a loop for a while to flush out any flakiness. Change-Id: I9fe6e6c74538d0747e3eeb578cf0518494ff10c8 Reviewed-on: http://gerrit.cloudera.org:8080/5244 Tested-by: Impala Public Jenkins Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
        Hide
        mjacobs Matthew Jacobs added a comment -

        Tim Armstrong I've seen test_failure_in_prepare failing in gvo's today – it looks like the commit increases the timeout of test_failure_in_prepare_multi_fragment but not for test_failure_in_prepare which is still failing. Should that be bumped as well?

        Show
        mjacobs Matthew Jacobs added a comment - Tim Armstrong I've seen test_failure_in_prepare failing in gvo's today – it looks like the commit increases the timeout of test_failure_in_prepare_multi_fragment but not for test_failure_in_prepare which is still failing. Should that be bumped as well?
        Hide
        mjacobs Matthew Jacobs added a comment -

        Re-opening because I've seen this again today.

        Show
        mjacobs Matthew Jacobs added a comment - Re-opening because I've seen this again today.
        Hide
        mjacobs Matthew Jacobs added a comment -

        At first I thought this might be the same issue again, but I suspect it's a different issue with this test so I've opened a different JIRA: IMPALA-4642

        Show
        mjacobs Matthew Jacobs added a comment - At first I thought this might be the same issue again, but I suspect it's a different issue with this test so I've opened a different JIRA: IMPALA-4642
        Hide
        tarmstrong Tim Armstrong added a comment -

        We should really add a check at the top of the tests that there are no fragments running.

        Show
        tarmstrong Tim Armstrong added a comment - We should really add a check at the top of the tests that there are no fragments running.
        Hide
        mjacobs Matthew Jacobs added a comment -

        Yeah, I agree - or a check after the completion of a test/test class (which may be easier to tie then to a faulty test).

        Show
        mjacobs Matthew Jacobs added a comment - Yeah, I agree - or a check after the completion of a test/test class (which may be easier to tie then to a faulty test).
        Hide
        jbapple Jim Apple added a comment -

        This is a bulk comment on all issues with Fix Version 2.8.0 that were resolved on or after 2016-12-09.

        2.8.0 was branched on December 9, with only two changes to master cherry-picked to the 2.8.0 release branch after that:

        https://github.com/apache/incubator-impala/commits/2.8.0

        Issues fixed after December 9 might not be fixed in 2.8.0. If you are the one who marked this issue Resolved, can you check to see if the patch is in 2.8.0 by using the link above? If the patch is not in 2.8.0, can you change the Fix Version to 2.9.0?

        Thank you!

        Show
        jbapple Jim Apple added a comment - This is a bulk comment on all issues with Fix Version 2.8.0 that were resolved on or after 2016-12-09. 2.8.0 was branched on December 9, with only two changes to master cherry-picked to the 2.8.0 release branch after that: https://github.com/apache/incubator-impala/commits/2.8.0 Issues fixed after December 9 might not be fixed in 2.8.0. If you are the one who marked this issue Resolved, can you check to see if the patch is in 2.8.0 by using the link above? If the patch is not in 2.8.0, can you change the Fix Version to 2.9.0? Thank you!

          People

          • Assignee:
            tarmstrong Tim Armstrong
            Reporter:
            twmarshall Thomas Tauber-Marshall
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development