IMPALA-4981

COMPUTE STATS with MT_DOP=1 and tight memory limit produces spilling error

Details

    Description

      After IMPALA-4467 and IMPALA-4882, the stress test's "training" phase tries to find the minimum memory limit needed to run each COMPUTE STATS statement under a variety of MT_DOP settings.

      Training is failing during this phase with the following error:

      Spilling has been disabled for plans that do not have stats and are not hinted to prevent potentially bad plans from using too many cluster resources. Please run COMPUTE STATS on these tables, hint the plan or disable this behavior via the DISABLE_UNSAFE_SPILLS query option.
      

      In this case, the failure was from this sequence:

      USE tpcds_300_decimal_parquet;
      SET ABORT_ON_ERROR=1;
      SET MT_DOP=1;
      SET MEM_LIMIT=93M;
      COMPUTE STATS catalog_returns;
      

      This was near the end of the MT_DOP=1 COMPUTE STATS catalog_returns training, in which concurrent_select.py binary-searches over MEM_LIMIT to find the minimum memory limit needed to run COMPUTE STATS for the given MT_DOP. Logs show the following memory limits applied, in order:

      SET MEM_LIMIT=77308M
      SET MEM_LIMIT=38654M
      SET MEM_LIMIT=19327M
      SET MEM_LIMIT=9663M
      SET MEM_LIMIT=4831M
      SET MEM_LIMIT=2415M
      SET MEM_LIMIT=1207M
      SET MEM_LIMIT=603M
      SET MEM_LIMIT=301M
      SET MEM_LIMIT=150M <------ all successful completions through here
      SET MEM_LIMIT=75M <------ memory limit exceeded, which is fine
      SET MEM_LIMIT=112M <------ successful completion
      SET MEM_LIMIT=93M <------ error for this bug as described above
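
      The probe sequence in the log is consistent with a plain halving search between a known-good upper bound and a known-bad lower bound. The sketch below is illustrative only and is not the code in concurrent_select.py: the function name find_min_mem_limit, the runs_ok callback, and the tolerance_mb parameter are all assumptions made for this example.

```python
def find_min_mem_limit(runs_ok, upper_mb, tolerance_mb=1):
    """Binary-search the smallest MEM_LIMIT (in MB) for which runs_ok is True.

    runs_ok is a hypothetical callback that runs the statement under the
    given limit and returns True on success, False on failure.
    Returns (smallest passing limit found, list of limits probed).
    """
    probes = [upper_mb]
    if not runs_ok(upper_mb):
        raise ValueError("statement fails even at the starting limit")
    lower_mb = 0  # largest limit known (or assumed) to fail
    while upper_mb - lower_mb > tolerance_mb:
        mid_mb = (lower_mb + upper_mb) // 2
        probes.append(mid_mb)
        if runs_ok(mid_mb):
            upper_mb = mid_mb  # success: try a tighter limit
        else:
            lower_mb = mid_mb  # failure: back off to a looser limit
    return upper_mb, probes


if __name__ == "__main__":
    # Stand-in predicate: pretend the statement needs at least 100 MB.
    limit, probes = find_min_mem_limit(lambda mb: mb >= 100, 77308)
    print(limit, probes)
```

      A search like this assumes every failure means "limit too low". The spilling error reported here does not fit that assumption, which is why it surfaces as a training failure rather than as one more step of the search.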
      

      Without MT_DOP, but with the same limit in place, I get the expected "memory limit exceeded" error. Once I set MT_DOP=1, I hit the error described in this bug instead.

      USE tpcds_300_decimal_parquet;
      SET MEM_LIMIT=93M;
      COMPUTE STATS catalog_returns;
      WARNINGS:
      Memory limit exceeded
      Cannot perform aggregation at node with id 1. Failed to initialize hash table in preaggregation. The memory limit is too low to execute the query.
      
      
      SET MT_DOP=1;
      COMPUTE STATS catalog_returns;
      WARNINGS:
      Spilling has been disabled for plans that do not have stats and are not hinted to prevent potentially bad plans from using too many cluster resources. Please run COMPUTE STATS on these tables, hint the plan or disable this behavior via the DISABLE_UNSAFE_SPILLS query option.
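
      To make the distinction between the two outcomes explicit, a harness could classify the warning text before deciding whether a failure is an acceptable step in the search. This is a minimal sketch under assumptions: classify_outcome and its return labels are hypothetical, and only the two message prefixes are taken from the output above.

```python
# Message prefixes copied from the warnings shown in this report.
MEM_LIMIT_EXCEEDED = "Memory limit exceeded"
SPILLING_DISABLED = "Spilling has been disabled for plans that do not have stats"


def classify_outcome(warnings):
    """Map a statement's warning text to a coarse outcome label.

    The labels are hypothetical names chosen for this example.
    """
    if not warnings:
        return "success"
    if SPILLING_DISABLED in warnings:
        # The failure described in this bug: not a plain memory-limit error.
        return "unexpected_spilling_error"
    if MEM_LIMIT_EXCEEDED in warnings:
        # The expected failure when the limit is simply too low.
        return "mem_limit_exceeded"
    return "other_error"


if __name__ == "__main__":
    print(classify_outcome("Memory limit exceeded\nCannot perform aggregation at node with id 1."))
```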
      

      This doesn't happen unconditionally with MT_DOP, or even with MT_DOP=1. It happened after all the training had completed for:

      tables: (call_center, catalog_page) X mt_dop: (1,2,4,8,16)
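
      The cross product above can be enumerated mechanically. The sketch below builds the statement sequence for each (table, mt_dop) pair in the same order as the failing sequence shown earlier; training_statements is a hypothetical helper written for this example, not a function from the stress test.

```python
from itertools import product


def training_statements(db, tables, mt_dops, mem_limit_mb):
    """Yield one list of statements per (table, mt_dop) combination."""
    for table, mt_dop in product(tables, mt_dops):
        yield [
            f"USE {db};",
            "SET ABORT_ON_ERROR=1;",
            f"SET MT_DOP={mt_dop};",
            f"SET MEM_LIMIT={mem_limit_mb}M;",
            f"COMPUTE STATS {table};",
        ]


if __name__ == "__main__":
    combos = training_statements(
        "tpcds_300_decimal_parquet",
        ["call_center", "catalog_page"],
        [1, 2, 4, 8, 16],
        93,
    )
    for statements in combos:
        print("\n".join(statements), end="\n\n")
```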
      

      Unfortunately, which table this happens on seems somewhat non-deterministic. An earlier training attempt for MT_DOP=1 COMPUTE STATS catalog_returns succeeded. I checked the logs, and the exact same memory limits were applied. In that run's 93M attempt, the error returned was the typical "memory limit exceeded".

      However, in that earlier attempt COMPUTE STATS on a different table failed. In that case, it was:

      USE tpcds_300_decimal_parquet;
      SET MT_DOP=1;
      SET ABORT_ON_ERROR=1;
      SET MEM_LIMIT=75M;
      COMPUTE STATS store_returns;
      

            People

              alex.behm Alexander Behm
              mikeb Michael Brown