Details
-
Bug
-
Status: Resolved
-
Blocker
-
Resolution: Duplicate
-
Impala 2.6.0
-
None
-
Operating system: CentOS Linux release 7.2.1511 (Core)
Server version: impalad version 2.6.0-cdh5.8.3 RELEASE (build c644f476b774db9db87a619628f7a6ecc5f843e0)
Description
This is definition of my table:
CREATE TABLE event ( user_id BIGINT, event_id INT, event_type_id INT, ts_bigint BIGINT, processed_ts_bigint BIGINT, event_map MAP<STRING, STRING>, server_id INT ) PARTITIONED BY (date_id INT) STORED AS PARQUET;
I want to run this query:
WITH battles AS ( SELECT * FROM event e, (SELECT MAX(CASE WHEN key='won' THEN value END) AS won FROM e.event_map) m WHERE e.date_id = 2420 AND e.event_id = 10012 ) SELECT won, count(*) FROM battles GROUP BY won;
Map element with key = 'won' has only 'true' and 'false' values. If battles subquery has more than 5 million rows the query returns strange results, and when it has less rows query returns correct data.
This is the query plan:
+------------------------------------------------------------------------------------+ | Explain String | +------------------------------------------------------------------------------------+ | Estimated Per-Host Requirements: Memory=882.00MB VCores=2 | | WARNING: The following tables are missing relevant table and/or column statistics. | | dejanp.event | | | | 09:EXCHANGE [UNPARTITIONED] | | | | | 08:AGGREGATE [FINALIZE] | | | output: count:merge(*) | | | group by: won | | | | | 07:EXCHANGE [HASH(won)] | | | | | 06:AGGREGATE [STREAMING] | | | output: count(*) | | | group by: max(CASE WHEN key = 'won' THEN value END) | | | | | 01:SUBPLAN | | | | | |--05:NESTED LOOP JOIN [CROSS JOIN] | | | | | | | |--02:SINGULAR ROW SRC | | | | | | | 04:AGGREGATE [FINALIZE] | | | | output: max(CASE WHEN key = 'won' THEN value END) | | | | | | | 03:UNNEST [e.event_map] | | | | | 00:SCAN HDFS [dejanp.event e] | | partitions=1/1 files=64 size=18.76GB | | predicates: e.event_id = 10012 | +------------------------------------------------------------------------------------+
Attachments
Attachments
Issue Links
- duplicates
-
IMPALA-4539 Parquet scanner memory bug: I/O buffer is attached to output batch while scratch batch rows still reference it
-
- Resolved
-