Details
- Type: Bug
- Status: Resolved
- Priority: Critical
- Resolution: Not A Bug
- Affects Version/s: Impala 2.2
- Fix Version/s: None
- Component/s: None
- Environment: CDH 5.4.0, CentOS 6.6
Description
Even a simple select count(*) on a large table (without table stats) runs forever. According to the query plan, Impala scans the table with only one node and one thread, leaving the whole cluster almost idle. compute stats is affected in the same way, so it cannot be used to fix the problem. This makes it impossible to work with large tables.
The problem appeared after upgrading to CDH 5.4.0 (Impala 2.2). Here is the log from the coordinator node:
I0506 14:39:33.550662 174296 impala-beeswax-server.cc:171] query(): query=select count(*) from MY_TABLE_NAME
I0506 14:39:33.550693 174296 impala-beeswax-server.cc:514] query: Query { 01: query (string) = "select count(*) from MY_TABLE_NAME", 03: configuration (list) = list<string>[0] { }, 04: hadoop_user (string) = "hdfs", }
I0506 14:39:33.550734 174296 impala-beeswax-server.cc:536] TClientRequest.queryOptions: TQueryOptions { 01: abort_on_error (bool) = false, 02: max_errors (i32) = 0, 03: disable_codegen (bool) = false, 04: batch_size (i32) = 0, 05: num_nodes (i32) = 0, 06: max_scan_range_length (i64) = 0, 07: num_scanner_threads (i32) = 0, 08: max_io_buffers (i32) = 0, 09: allow_unsupported_formats (bool) = false, 10: default_order_by_limit (i64) = -1, 11: debug_action (string) = "", 12: mem_limit (i64) = 0, 13: abort_on_default_limit_exceeded (bool) = false, 15: hbase_caching (i32) = 0, 16: hbase_cache_blocks (bool) = false, 17: parquet_file_size (i64) = 0, 18: explain_level (i32) = 1, 19: sync_ddl (bool) = false, 23: disable_cached_reads (bool) = false, 24: disable_outermost_topn (bool) = false, 25: rm_initial_mem (i64) = 0, 26: query_timeout_s (i32) = 0, 28: appx_count_distinct (bool) = false, 29: disable_unsafe_spills (bool) = false, 31: exec_single_node_rows_threshold (i32) = 100, }
I0506 14:39:33.552515 174296 Frontend.java:766] analyze query select count(*) from MY_TABLE_NAME
I0506 14:39:33.553094 174296 Frontend.java:715] Requesting prioritized load of table(s): MY_DB_NAME.MY_TABLE_NAME
I0506 14:40:00.284970 174296 AggregateInfo.java:158] agg info: AggregateInfo{grouping_exprs=, aggregate_exprs=FunctionCallExpr{name=count, isStar=true, isDistinct=false, }, intermediate_tuple=TupleDescriptor{id=1, name=agg-tuple-intermed, tbl=null, byte_size=0, is_materialized=true, slots=[SlotDescriptor{id=0, col=null, type=BIGINT, materialized=false, byteSize=0, byteOffset=-1, nullIndicatorByte=0, nullIndicatorBit=0, slotIdx=0, stats=ColumnStats{avgSerializedSize_=8.0, maxSize_=8, numDistinct_=1, numNulls_=-1}}]}, output_tuple=TupleDescriptor{id=1, name=agg-tuple-intermed, tbl=null, byte_size=0, is_materialized=true, slots=[SlotDescriptor{id=0, col=null, type=BIGINT, materialized=false, byteSize=0, byteOffset=-1, nullIndicatorByte=0, nullIndicatorBit=0, slotIdx=0, stats=ColumnStats{avgSerializedSize_=8.0, maxSize_=8, numDistinct_=1, numNulls_=-1}}]}}AggregateInfo{phase=FIRST, intermediate_smap=smap(count(*):count(*) (FunctionCallExpr{name=count, isStar=true, isDistinct=false, }:SlotRef{tblName=null, type=BIGINT, col=null, id=0})), output_smap=smap(count(*):count(*) (FunctionCallExpr{name=count, isStar=true, isDistinct=false, }:SlotRef{tblName=null, type=BIGINT, col=null, id=0}))} mergeAggInfo: AggregateInfo{grouping_exprs=, aggregate_exprs=FunctionCallExpr{name=count, isStar=false, isDistinct=false, SlotRef{tblName=null, type=BIGINT, col=null, id=0}}, intermediate_tuple=TupleDescriptor{id=1, name=agg-tuple-intermed, tbl=null, byte_size=0, is_materialized=true, slots=[SlotDescriptor{id=0, col=null, type=BIGINT, materialized=false, byteSize=0, byteOffset=-1, nullIndicatorByte=0, nullIndicatorBit=0, slotIdx=0, stats=ColumnStats{avgSerializedSize_=8.0, maxSize_=8, numDistinct_=1, numNulls_=-1}}]}, output_tuple=TupleDescriptor{id=1, name=agg-tuple-intermed, tbl=null, byte_size=0, is_materialized=true, slots=[SlotDescriptor{id=0, col=null, type=BIGINT, materialized=false, byteSize=0, byteOffset=-1, nullIndicatorByte=0, nullIndicatorBit=0, slotIdx=0, stats=ColumnStats{avgSerializedSize_=8.0, maxSize_=8, numDistinct_=1, numNulls_=-1}}]}}AggregateInfo{phase=FIRST_MERGE, intermediate_smap=smap(count(*):count(*) (FunctionCallExpr{name=count, isStar=true, isDistinct=false, }:SlotRef{tblName=null, type=BIGINT, col=null, id=0})), output_smap=smap(count(*):count(*) (FunctionCallExpr{name=count, isStar=true, isDistinct=false, }:SlotRef{tblName=null, type=BIGINT, col=null, id=0}))}
I0506 14:40:00.286416 174296 SelectStmt.java:262] post-analysis AggregateInfo{grouping_exprs=, aggregate_exprs=FunctionCallExpr{name=count, isStar=true, isDistinct=false, }, intermediate_tuple=TupleDescriptor{id=1, name=agg-tuple-intermed, tbl=null, byte_size=0, is_materialized=true, slots=[SlotDescriptor{id=0, col=null, type=BIGINT, materialized=false, byteSize=0, byteOffset=-1, nullIndicatorByte=0, nullIndicatorBit=0, slotIdx=0, stats=ColumnStats{avgSerializedSize_=8.0, maxSize_=8, numDistinct_=1, numNulls_=-1}}]}, output_tuple=TupleDescriptor{id=1, name=agg-tuple-intermed, tbl=null, byte_size=0, is_materialized=true, slots=[SlotDescriptor{id=0, col=null, type=BIGINT, materialized=false, byteSize=0, byteOffset=-1, nullIndicatorByte=0, nullIndicatorBit=0, slotIdx=0, stats=ColumnStats{avgSerializedSize_=8.0, maxSize_=8, numDistinct_=1, numNulls_=-1}}]}}AggregateInfo{phase=FIRST, intermediate_smap=smap(count(*):count(*) (FunctionCallExpr{name=count, isStar=true, isDistinct=false, }:SlotRef{tblName=null, type=BIGINT, col=null, id=0})), output_smap=smap(count(*):count(*) (FunctionCallExpr{name=count, isStar=true, isDistinct=false, }:SlotRef{tblName=null, type=BIGINT, col=null, id=0}))} mergeAggInfo: AggregateInfo{grouping_exprs=, aggregate_exprs=FunctionCallExpr{name=count, isStar=false, isDistinct=false, SlotRef{tblName=null, type=BIGINT, col=null, id=0}}, intermediate_tuple=TupleDescriptor{id=1, name=agg-tuple-intermed, tbl=null, byte_size=0, is_materialized=true, slots=[SlotDescriptor{id=0, col=null, type=BIGINT, materialized=false, byteSize=0, byteOffset=-1, nullIndicatorByte=0, nullIndicatorBit=0, slotIdx=0, stats=ColumnStats{avgSerializedSize_=8.0, maxSize_=8, numDistinct_=1, numNulls_=-1}}]}, output_tuple=TupleDescriptor{id=1, name=agg-tuple-intermed, tbl=null, byte_size=0, is_materialized=true, slots=[SlotDescriptor{id=0, col=null, type=BIGINT, materialized=false, byteSize=0, byteOffset=-1, nullIndicatorByte=0, nullIndicatorBit=0, slotIdx=0, stats=ColumnStats{avgSerializedSize_=8.0, maxSize_=8, numDistinct_=1, numNulls_=-1}}]}}AggregateInfo{phase=FIRST_MERGE, intermediate_smap=smap(count(*):count(*) (FunctionCallExpr{name=count, isStar=true, isDistinct=false, }:SlotRef{tblName=null, type=BIGINT, col=null, id=0})), output_smap=smap(count(*):count(*) (FunctionCallExpr{name=count, isStar=true, isDistinct=false, }:SlotRef{tblName=null, type=BIGINT, col=null, id=0}))}
I0506 14:40:00.286613 174296 Frontend.java:840] create plan
I0506 14:40:00.287134 174296 HdfsScanNode.java:579] collecting partitions for table MY_TABLE_NAME
I0506 14:40:00.287358 174296 HdfsScanNode.java:616] computeStats HdfsScan: cardinality_=0
I0506 14:40:00.287448 174296 HdfsScanNode.java:622] computeStats HdfsScan: #nodes=1
I0506 14:40:00.308394 174296 Planner.java:104] desctbl: tuples: TupleDescriptor{id=0, name=basetbl, tbl=MY_DB_NAME.MY_TABLE_NAME, byte_size=0, is_materialized=true, slots=[]} TupleDescriptor{id=1, name=agg-tuple-intermed, tbl=null, byte_size=8, is_materialized=true, slots=[SlotDescriptor{id=0, col=null, type=BIGINT, materialized=true, byteSize=8, byteOffset=0, nullIndicatorByte=0, nullIndicatorBit=-1, slotIdx=0, stats=ColumnStats{avgSerializedSize_=8.0, maxSize_=8, numDistinct_=1, numNulls_=-1}}]}
I0506 14:40:00.308629 174296 Planner.java:105] resultexprs: SlotRef{tblName=null, type=BIGINT, col=null, id=0}
I0506 14:40:00.308743 174296 Planner.java:106] finalize plan fragments
I0506 14:40:00.310184 174296 Frontend.java:866] get scan range locations
I0506 14:40:00.310396 174296 Planner.java:249] Estimated per-host peak memory requirement: 0
I0506 14:40:00.310509 174296 Planner.java:250] Estimated per-host virtual cores requirement: 0
I0506 14:40:00.313704 174296 Frontend.java:935] create result set metadata
I0506 14:40:00.313846 174296 JniFrontend.java:147] Estimated Per-Host Requirements: Memory=0B VCores=0
F00:PLAN FRAGMENT [UNPARTITIONED]
01:AGGREGATE [FINALIZE]
|  output: count(*)
|  hosts=1 per-host-mem=unavailable
|  tuple-ids=1 row-size=8B cardinality=0
|
00:SCAN HDFS [MY_DB_NAME.MY_TABLE_NAME]
   partitions=1/1 files=11999 size=3.44TB
   table stats: 0 rows total
   column stats: all
   hosts=1 per-host-mem=unavailable
   tuple-ids=0 row-size=0B cardinality=0
I0506 14:40:00.606078 174296 coordinator.cc:309] Exec() query_id=4742edd0ce404022:6f85c7ff2ec352ac
I0506 14:40:00.615736 174296 plan-fragment-executor.cc:82] Prepare(): query_id=4742edd0ce404022:6f85c7ff2ec352ac instance_id=4742edd0ce404022:6f85c7ff2ec352ad
I0506 14:40:00.615943 174296 plan-fragment-executor.cc:190] descriptor table for fragment=4742edd0ce404022:6f85c7ff2ec352ad tuples: Tuple(id=0 size=0 slots=[]) Tuple(id=1 size=8 slots=[Slot(id=0 type=BIGINT col_path=[] offset=0 null=(offset=0 mask=0) slot_idx=0 field_idx=-1)])
I0506 14:40:00.912346 174296 coordinator.cc:385] starting 0 backends for query 4742edd0ce404022:6f85c7ff2ec352ac
I0506 14:40:00.918092 176093 plan-fragment-executor.cc:300] Open(): instance_id=4742edd0ce404022:6f85c7ff2ec352ad
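The plan above points at the cause: the table metadata reports 0 rows (table stats: 0 rows total) even though the scan covers 11999 files / 3.44TB, and with an estimated cardinality of 0, below the exec_single_node_rows_threshold of 100 shown in the query options, the planner keeps the whole query on one node (#nodes=1, starting 0 backends). Assuming that is the trigger, a minimal workaround sketch is to disable the small-query optimization for the session, or to overwrite the bogus row count by hand (the numRows value below is a placeholder, not a measured figure):

-- Force a distributed plan regardless of the corrupt row estimate:
SET EXEC_SINGLE_NODE_ROWS_THRESHOLD=0;
SELECT COUNT(*) FROM MY_DB_NAME.MY_TABLE_NAME;

-- Or patch the row count so subsequent plans distribute
-- (placeholder value; COMPUTE STATS would normally maintain this):
ALTER TABLE MY_DB_NAME.MY_TABLE_NAME SET TBLPROPERTIES('numRows'='1000000');

With the threshold set to 0, compute stats should also run distributed again, after which the manual property is no longer needed.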
Issue Links
- is related to: IMPALA-1983 Impala should handle corrupted table stats better (Resolved)