IMPALA-4914: TestSpillStress makes flawed assumptions about running concurrently

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 2.9.0
    • Fix Version/s: Impala 2.9.0
    • Component/s: Infrastructure
    • Labels: None

      Description

      I took a look at TestSpillStress and found a bunch of problems with it.

      1. It's not being run, because its workload isn't run in exhaustive mode.
      2. It can't run, because it doesn't set up a client properly.
      3. It looks like it was intended to run in parallel, but custom cluster tests don't run in parallel.

      The evidence for #3 is that it uses the agg_stress workload, which says:

      # This query forces many joins and aggregations with spilling
      # and can expose race conditions in the spilling code if run in parallel
      

      Problems 1 and 2 were quick to fix and are addressed in https://gerrit.cloudera.org/#/c/6002/

      Problem 3 is different and requires some thought. We don't have any mechanism for running custom cluster tests in parallel. All custom cluster tests are serial, even though they lack the serial mark, because most of them restart impalad in every method, and methods can't run in parallel if they restart impalad on top of each other. As such, custom cluster tests are invoked differently from other e2e tests, and the invocation does not include -n, the option that makes tests run in parallel.
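
To illustrate problem 3: because the custom cluster invocation has no -n, any concurrency would have to come from inside a single test method. The following is only a minimal sketch of that idea, not the test's actual code. It assumes a hypothetical `run_query` callable standing in for whatever client call a test would use (it is not an Impala API), and it uses a thread pool from the Python 3 standard library:

```python
from concurrent.futures import ThreadPoolExecutor


def run_concurrently(run_query, query, num_clients):
    """Run the same query from num_clients threads and return all results.

    run_query is a hypothetical callable (query string -> result); in a real
    test it would wrap a separate client connection per thread.
    """
    with ThreadPoolExecutor(max_workers=num_clients) as pool:
        futures = [pool.submit(run_query, query) for _ in range(num_clients)]
        # result() re-raises any exception from the worker thread, so a
        # failure in one concurrent query fails the whole call.
        return [f.result() for f in futures]
```

With this shape, a single serial test method can still exercise the spilling code under concurrent load without relying on the test runner to parallelize anything.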

        Activity

        mikesbrown Michael Brown added a comment -

        https://gerrit.cloudera.org/#/c/6002/ will fix problems 1 and 2 above, but not problem 3.

        mikesbrown Michael Brown added a comment -

        The commit here:

        commit 9414d53a891448be13228f5fc63e089522e143aa
        Author: Michael Brown <mikeb@cloudera.com>
        Date:   Fri Feb 10 09:35:16 2017 -0800
        
            IMPALA-4904,IMPALA-4914: add targeted-stress to exhaustive tests
        
            This patch allows any test suites whose workload is "targeted-stress" to
            be run in so-called "exhaustive" mode. Before this patch, only suites
            whose workload was "functional-query" would be run exhaustively. More on
            this flaw is in IMPALA-3947.
        
            The net effects are:
        
            1. We fix IMPALA-4904, which allows test_ddl_stress to start running
               again.
            2. We also improve the situation in IMPALA-4914 by getting
               TestSpillStress to run, though we don't fix its
               not-running-concurrently problem.
        
            The other mini-cluster stress tests were disabled in this commit:
              IMPALA-2605: Omit the sort and mini stress tests
            so they are not directly affected here.
        
            I also tried to clarify what "exhaustive" means in some of our shell
            scripts, via help text and comments.
        
            An exhaustive build+test run showed test_ddl_stress and TestSpillStress
            now get run and passed. This adds roughly 12 minutes to a build that's
            on the order of 13-14 hours.
        
            Change-Id: Ie6bd5bbd380e636d680368e908519b042d79dfec
            Reviewed-on: http://gerrit.cloudera.org:8080/6002
            Tested-by: Impala Public Jenkins
            Reviewed-by: Jim Apple <jbapple-impala@apache.org>
        

        That commit fixes problems 1 and 2. I am reducing the priority and changing the summary to cover problem 3.
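
The gating described in the commit message can be pictured roughly as follows. This is a sketch under stated assumptions, not the Impala test framework's code: the function name, the set name, and the fallback to a "core" strategy are all hypothetical, chosen only to show how a suite's workload can silently keep it out of an exhaustive run.

```python
# Hypothetical names throughout; this is not Impala's test framework code.
# Before the patch, only "functional-query" would have been in this set.
EXHAUSTIVE_WORKLOADS = {"functional-query", "targeted-stress"}


def exploration_strategy_for(workload, requested_strategy):
    """Return the strategy a suite actually runs with.

    Assumption: suites whose workload is not in EXHAUSTIVE_WORKLOADS fall
    back to "core" even when an exhaustive run was requested.
    """
    if requested_strategy == "exhaustive" and workload not in EXHAUSTIVE_WORKLOADS:
        return "core"
    return requested_strategy
```

Under this model, a test that is only enabled for exhaustive runs never executes if its workload is missing from the set, which matches the symptom in problem 1.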

        tarmstrong Tim Armstrong added a comment -

        IMPALA-4914,IMPALA-4999: remove flaky TestSpillStress

        The test does not work as intended and would likely not provide
        very good coverage of the different spilling paths, because it
        only runs a single simple query. The stress test
        (tests/stress/concurrent_select.py) provides much better coverage.
        test_mem_usage_scaling.py probably also provides better coverage
        than TestSpillStress in its current form.

        Change-Id: Ie792d64dc88f682066c13e559918d72d76b31b71
        Reviewed-on: http://gerrit.cloudera.org:8080/6437
        Reviewed-by: Michael Brown <mikeb@cloudera.com>
        Tested-by: Impala Public Jenkins


          People

          • Assignee:
            tarmstrong Tim Armstrong
            Reporter:
            mikesbrown Michael Brown
          • Votes:
            0
            Watchers:
            2
