Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-4981

COMPUTE STATS with MT_DOP=1 and tight memory limit produces spilling error

    Details

      Description

      After IMPALA-4467 and IMPALA-4882, stress test "training" tries to find the minimum memory limit needed to perform COMPUTE STATS statements, each with a variety of MT_DOP settings.

      The stress test "training" is failing during this phase, with the following:

      Spilling has been disabled for plans that do not have stats and are not hinted to prevent potentially bad plans from using too many cluster resources. Please run COMPUTE STATS on these tables, hint the plan or disable this behavior via the DISABLE_UNSAFE_SPILLS query option.
      

      In this case, the failure was from this sequence:

      USE tpcds_300_decimal_parquet;
      SET ABORT_ON_ERROR=1;
      SET MT_DOP=1;
      SET MEM_LIMIT=93M;
      COMPUTE STATS catalog_returns;
      

      This was near the end of the MT_DOP=1 COMPUTE STATS catalog_returns training, in which concurrent_select.py performs a MEM_LIMIT-wise binary search to find the minimum memory limit needed to run COMPUTE STATS (for the given MT_DOP). Logs show the following memory limits applied first:

      SET MEM_LIMIT=77308M
      SET MEM_LIMIT=38654M
      SET MEM_LIMIT=19327M
      SET MEM_LIMIT=9663M
      SET MEM_LIMIT=4831M
      SET MEM_LIMIT=2415M
      SET MEM_LIMIT=1207M
      SET MEM_LIMIT=603M
      SET MEM_LIMIT=301M
      SET MEM_LIMIT=150M <------ all successful completions through here
      SET MEM_LIMIT=75M <------ memory limit exceeded, which is fine
      SET MEM_LIMIT=112M <------ successful completion
      SET MEM_LIMIT=93M <------- error for this bug as described above
      

      Without MT_DOP, but with the limit in place, I get the error I'd expect, but then I apply MT_DOP, and I hit the error in this bug.

      USE tpcds_300_decimal_parquet;
      SET MEM_LIMIT=93M;
      COMPUTE STATS catalog_returns;
      WARNINGS:
      Memory limit exceeded
      Cannot perform aggregation at node with id 1. Failed to initialize hash table in preaggregation. The memory limit is too low to execute the query.
      
      
      SET MT_DOP=1;
      COMPUTE STATS catalog_returns;
      WARNINGS:
      Spilling has been disabled for plans that do not have stats and are not hinted to prevent potentially bad plans from using too many cluster resources. Please run COMPUTE STATS on these tables, hint the plan or disable this behavior via the DISABLE_UNSAFE_SPILLS query option.
      

      This doesn't happen unconditionally with MT_DOP or even MT_DOP=1. This happened after all the training completed for:

      tables: (call_center, catalog_page) X mt_dop: (1,2,4,8,16)
      

      Unfortunately this seems somewhat non-deterministic as to which table this could happen on: An earlier training attempt for MT_DOP=1 COMPUTE STATS catalog_returns succeeded. I checked the logs, and the exact same memory limits were applied. In the 93M attempt, the error returns was the typical "memory limit exceeded".

      However, a different COMPUTE STATS on a table failed, in that case, it was:

      USE tpcds_300_decimal_parquet;
      SET MT_DOP=1;
      SET ABORT_ON_ERROR=1;
      SET MEM_LIMIT=75M;
      COMPUTE STATS store_returns;
      

        Issue Links

          Activity

          Hide
          mikesbrown Michael Brown added a comment -

          A note on the priority: I decided to set this to P1, because it blocks the stress test from running and is thus a broken build.

          We could demote this out of P1 if we decide we don't care about COMPUTE STATS for the stress test.

          Show
          mikesbrown Michael Brown added a comment - A note on the priority: I decided to set this to P1, because it blocks the stress test from running and is thus a broken build. We could demote this out of P1 if we decide we don't care about COMPUTE STATS for the stress test.
          Hide
          mikesbrown Michael Brown added a comment -

          We could also demote this out of P1 if someone makes a valid case that the stress test should treat the "Spilling has been disabled for plans..." message just like the "Memory limit exceeded..." message when it's doing the binary search to find a minimum memory limit.

          Show
          mikesbrown Michael Brown added a comment - We could also demote this out of P1 if someone makes a valid case that the stress test should treat the "Spilling has been disabled for plans..." message just like the "Memory limit exceeded..." message when it's doing the binary search to find a minimum memory limit.
          Hide
          alex.behm Alexander Behm added a comment -

          Thanks for filing, Michael. Let me investigate.

          Show
          alex.behm Alexander Behm added a comment - Thanks for filing, Michael. Let me investigate.
          Hide
          alex.behm Alexander Behm added a comment -

          Michael Brown, turns out that spilling is disabled when MT_DOP is set. See Frontend#createExecRequest():

              // Optionally disable spilling in the backend. Allow spilling if there are plan hints
              // or if all tables have stats.
              boolean disableSpilling =
                  queryCtx.client_request.query_options.isDisable_unsafe_spills()
                    && !queryCtx.tables_missing_stats.isEmpty()
                    && !analysisResult.getAnalyzer().hasPlanHints();
              // for now, always disable spilling for multi-threaded execution
              if (isMtExec || disableSpilling) queryCtx.setDisable_spilling(true);
          

          Disabling spilling with MT_DOP doesn't really make sense anymore because we only execute simple queries with MT_DOP. The fix is to remove that one line.

          Show
          alex.behm Alexander Behm added a comment - Michael Brown , turns out that spilling is disabled when MT_DOP is set. See Frontend#createExecRequest(): // Optionally disable spilling in the backend. Allow spilling if there are plan hints // or if all tables have stats. boolean disableSpilling = queryCtx.client_request.query_options.isDisable_unsafe_spills() && !queryCtx.tables_missing_stats.isEmpty() && !analysisResult.getAnalyzer().hasPlanHints(); // for now, always disable spilling for multi-threaded execution if (isMtExec || disableSpilling) queryCtx.setDisable_spilling( true ); Disabling spilling with MT_DOP doesn't really make sense anymore because we only execute simple queries with MT_DOP. The fix is to remove that one line.
          Hide
          alex.behm Alexander Behm added a comment -

          commit 909be1dd47fb155f04d700fa9e7429c0f98fdc41
          Author: Alex Behm <alex.behm@cloudera.com>
          Date: Thu Feb 23 15:57:37 2017 -0800

          IMPALA-4981: Re-enable spilling with MT_DOP.

          The initial changes for MT_DOP disabled spilling because
          spilling is not yet designed/implemented for multi-threaded
          joins. However, we have since disallowed running queries
          that have non-local joins with MT_DOP.
          This patch re-enables spilling with MT_DOP.

          Change-Id: I86465896c6583be256e17c88a713da7dde25b540
          Reviewed-on: http://gerrit.cloudera.org:8080/6131
          Reviewed-by: Dan Hecht <dhecht@cloudera.com>
          Reviewed-by: Alex Behm <alex.behm@cloudera.com>
          Tested-by: Impala Public Jenkins

          Show
          alex.behm Alexander Behm added a comment - commit 909be1dd47fb155f04d700fa9e7429c0f98fdc41 Author: Alex Behm <alex.behm@cloudera.com> Date: Thu Feb 23 15:57:37 2017 -0800 IMPALA-4981 : Re-enable spilling with MT_DOP. The initial changes for MT_DOP disabled spilling because spilling is not yet designed/implemented for multi-threaded joins. However, we have since disallowed running queries that have non-local joins with MT_DOP. This patch re-enables spilling with MT_DOP. Change-Id: I86465896c6583be256e17c88a713da7dde25b540 Reviewed-on: http://gerrit.cloudera.org:8080/6131 Reviewed-by: Dan Hecht <dhecht@cloudera.com> Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Impala Public Jenkins
          Hide
          tarmstrong Tim Armstrong added a comment -

          The flakiness should be addressed when we add memory reservations

          Show
          tarmstrong Tim Armstrong added a comment - The flakiness should be addressed when we add memory reservations

            People

            • Assignee:
              alex.behm Alexander Behm
              Reporter:
              mikesbrown Michael Brown
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development