Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: Impala 2.8.0, Impala 2.9.0, Impala 2.10.0, Impala 2.11.0, Impala 3.0, Impala 2.12.0
None
None
Description
The stress test's binary search for compute stats statements recently took 9 hours.
The stress test cannot find a starting point for mem_limit for compute stats statements, because explain is not supported for them.
[localhost:21000] > explain compute stats tpch.lineitem;
Query: explain compute stats tpch.lineitem
ERROR: AnalysisException: Syntax error in line 1:
explain compute stats tpch.lineitem
        ^
Encountered: COMPUTE
Expected: CREATE, DELETE, INSERT, SELECT, UPDATE, UPSERT, VALUES, WITH

CAUSED BY: Exception: Syntax error
[localhost:21000] >
The stress test has skipped the estimate for compute stats statements ever since it began supporting them:
def estimate_query_mem_mb_usage(query, query_runner):
  """Runs an explain plan then extracts and returns the estimated memory needed to run
  the query.
  """
  with query_runner.impalad_conn.cursor() as cursor:
    LOG.debug("Using %s database", query.db_name)
    if query.db_name:
      cursor.execute('USE ' + query.db_name)
    if query.query_type == QueryType.COMPUTE_STATS:
      # Running "explain" on compute stats is not supported by Impala.
      return
This means that, for these statements, the stress test starts its binary search from the full memory limit of impalad.
2018-03-17 08:00:38,684 12313 MainThread INFO:concurrent_select[1164]:Collecting runtime info for query compute_stats_call_center_mt_dop_1: COMPUTE STATS call_center
2018-03-17 08:00:38,925 12313 MainThread DEBUG:concurrent_select[1375]:Using tpcds_300_decimal_parquet database
2018-03-17 08:00:38,925 12313 MainThread DEBUG:db_connection[203]:IMPALA: USE tpcds_300_decimal_parquet
2018-03-17 08:00:39,007 12313 MainThread INFO:hiveserver2[265]:Closing active operation
2018-03-17 08:00:39,123 12313 MainThread INFO:concurrent_select[1247]:Finding a starting point for binary search
2018-03-17 08:00:39,148 12313 MainThread DEBUG:concurrent_select[866]:Using tpcds_300_decimal_parquet database
2018-03-17 08:00:39,148 12313 MainThread DEBUG:db_connection[203]:IMPALA: USE tpcds_300_decimal_parquet
2018-03-17 08:00:39,206 12313 MainThread DEBUG:db_connection[203]:IMPALA: SET MT_DOP=1
2018-03-17 08:00:39,333 12313 MainThread DEBUG:db_connection[203]:IMPALA: SET ABORT_ON_ERROR=1
2018-03-17 08:00:39,416 12313 MainThread DEBUG:concurrent_select[878]:Setting mem limit to 77308 MB
2018-03-17 08:00:39,416 12313 MainThread DEBUG:db_connection[203]:IMPALA: SET MEM_LIMIT=77308M
2018-03-17 08:00:39,503 12313 MainThread DEBUG:concurrent_select[882]:Running query with 77308 MB mem limit at vc0718.halxg.cloudera.com with timeout secs 9223372036854775807: COMPUTE STATS call_center
2018-03-17 08:00:39,741 12313 MainThread DEBUG:concurrent_select[890]:Query id is 3b4213033bf2359c:d44b29c500000000
2018-03-17 08:00:41,084 12313 MainThread INFO:hiveserver2[265]:Closing active operation
2018-03-17 08:00:41,202 12313 MainThread DEBUG:concurrent_select[1209]:Spilled: False
2018-03-17 08:00:41,202 12313 MainThread INFO:concurrent_select[1267]:Finding minimum memory required to avoid spilling
2018-03-17 08:00:41,227 12313 MainThread DEBUG:concurrent_select[866]:Using tpcds_300_decimal_parquet database
2018-03-17 08:00:41,227 12313 MainThread DEBUG:db_connection[203]:IMPALA: USE tpcds_300_decimal_parquet
2018-03-17 08:00:41,286 12313 MainThread DEBUG:db_connection[203]:IMPALA: SET MT_DOP=1
2018-03-17 08:00:41,367 12313 MainThread DEBUG:db_connection[203]:IMPALA: SET ABORT_ON_ERROR=1
2018-03-17 08:00:41,449 12313 MainThread DEBUG:concurrent_select[878]:Setting mem limit to 38654 MB
2018-03-17 08:00:41,449 12313 MainThread DEBUG:db_connection[203]:IMPALA: SET MEM_LIMIT=38654M
2018-03-17 08:00:41,530 12313 MainThread DEBUG:concurrent_select[882]:Running query with 38654 MB mem limit at vc0718.halxg.cloudera.com with timeout secs 9223372036854775807: COMPUTE STATS call_center
2018-03-17 08:00:41,589 12313 MainThread DEBUG:concurrent_select[890]:Query id is 74db40c3f221cf3:d67997c00000000
2018-03-17 08:00:42,184 12313 MainThread INFO:hiveserver2[265]:Closing active operation
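To illustrate why the starting point matters, here is a minimal sketch of a halving search over mem_limit. It is not the stress test's actual code: `find_min_mem_limit_mb`, the `runs_ok` callback, the 100 MB tolerance, and the 150 MB and 512 MB figures are all hypothetical, chosen only to show how the number of query runs grows with the upper bound. The first halving step from 77308 MB does land on 38654 MB, matching the log.

```python
def find_min_mem_limit_mb(runs_ok, upper_mb, tolerance_mb=100):
    """Binary-search the smallest mem limit (in MB) for which runs_ok is True.

    runs_ok is a hypothetical callback standing in for "run the statement with
    this MEM_LIMIT and report whether it succeeded without spilling".
    Returns a tuple (limit_mb, number_of_query_runs).
    """
    runs = 1
    if not runs_ok(upper_mb):
        raise ValueError("statement does not run even at the upper bound")
    lower_mb, best_mb = 0, upper_mb
    # Invariant: runs_ok(best_mb) is True; lower_mb is 0 or known to fail.
    while best_mb - lower_mb > tolerance_mb:
        mid_mb = (lower_mb + best_mb) // 2
        runs += 1  # every probe is a full execution of the statement
        if runs_ok(mid_mb):
            best_mb = mid_mb
        else:
            lower_mb = mid_mb
    return best_mb, runs


def needs_150_mb(limit_mb):
    # Hypothetical statement that needs 150 MB to run without spilling.
    return limit_mb >= 150


# Starting from the full impalad limit seen in the log.
from_full_limit = find_min_mem_limit_mb(needs_150_mb, upper_mb=77308)
# Starting from a made-up estimate, as an explain-based estimate might provide.
from_estimate = find_min_mem_limit_mb(needs_150_mb, upper_mb=512)
```

Each extra probe is a complete run of the compute stats statement, so a search that starts from the full limit pays for several additional executions per statement compared with one that starts from a tighter estimate.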
This has always been the case, but no one really looked into it until now.
It's important to get this fixed soon as we expand where our stress tests run. Before, this was a very infrequent cost, but at least in my downstream environment, that is rapidly changing.