Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-9202

Fix flakiness in TestExecutorGroups

Agile BoardAttach filesAttach ScreenshotVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    Description

      test_executor_groups.TestExecutorGroups.test_admission_slots failed on an assertion recently with the following stacktrace

      custom_cluster/test_executor_groups.py:215: in test_admission_slots
          assert ("Initial admission queue reason: Not enough admission control slots "
      E   assert 'Initial admission queue reason: Not enough admission control slots available on host' in 'Query (id=0445eb75d4842dce:219df01e00000000):\n  DEBUG MODE WARNING: Query profile created while running a DEBUG buil...0)\n     - NumRowsFetchedFromCache: 0 (0)\n     - RowMaterializationRate: 0\n     - RowMaterializationTimer: 0.000ns\n
      

      On investigating the logs, it seems like the query did in fact get queued with the expected reason. The only reason I can think of that it failed to appear on profile is that the profile was fetched before the admission reason could be added to the profile. This happened in an ASAN build so I am assuming the slowness in execution contributed to widening the window in which this can happen.

      from the logs:

      I1104 18:18:34.144309 113361 impala-server.cc:1046] 0445eb75d4842dce:219df01e00000000] Registered query query_id=0445eb75d4842dce:219df01e00000000 session_id=da467385483f4fb3:16683a81d25fe79e
      I1104 18:18:34.144951 113361 Frontend.java:1256] 0445eb75d4842dce:219df01e00000000] Analyzing query: select * from functional_parquet.alltypestiny              where month < 3 and id + random() < sleep(500); db: default
      I1104 18:18:34.149049 113361 BaseAuthorizationChecker.java:96] 0445eb75d4842dce:219df01e00000000] Authorization check took 4 ms
      I1104 18:18:34.149219 113361 Frontend.java:1297] 0445eb75d4842dce:219df01e00000000] Analysis and authorization finished.
      I1104 18:18:34.163229 113885 scheduler.cc:548] 0445eb75d4842dce:219df01e00000000] Exec at coord is false
      I1104 18:18:34.163945 113885 admission-controller.cc:1295] 0445eb75d4842dce:219df01e00000000] Trying to admit id=0445eb75d4842dce:219df01e00000000 in pool_name=default-pool executor_group_name=default-pool-group1 per_host_mem_estimate=176.02 MB dedicated_coord_mem_estimate=100.02 MB max_requests=-1 (configured statically) max_queued=200 (configured statically) max_mem=-1.00 B (configured statically)
      I1104 18:18:34.164203 113885 admission-controller.cc:1307] 0445eb75d4842dce:219df01e00000000] Stats: agg_num_running=1, agg_num_queued=0, agg_mem_reserved=8.00 KB,  local_host(local_mem_admitted=452.05 MB, num_admitted_running=1, num_queued=0, backend_mem_reserved=8.00 KB)
      I1104 18:18:34.164383 113885 admission-controller.cc:902] 0445eb75d4842dce:219df01e00000000] Queuing, query id=0445eb75d4842dce:219df01e00000000 reason: Not enough admission control slots available on host impala-ec2-centos74-r4-4xlarge-ondemand-1f88.vpc.cloudera.com:22002. Needed 1 slots but 1/1 are already in use.
      

      Attachments

        Issue Links

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            bikramjeet.vig Bikramjeet Vig
            bikramjeet.vig Bikramjeet Vig
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment