Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-4715

Aggregate functions on nested type sometimes returns wrong results

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Blocker
    • Resolution: Duplicate
    • Impala 2.6.0
    • None
    • Backend
    • Operating system: CentOS Linux release 7.2.1511 (Core)
      Server version: impalad version 2.6.0-cdh5.8.3 RELEASE (build c644f476b774db9db87a619628f7a6ecc5f843e0)

    Description

      This is definition of my table:

      CREATE TABLE event (
        user_id BIGINT,
        event_id INT,
        event_type_id INT,
        ts_bigint BIGINT,
        processed_ts_bigint BIGINT,
        event_map MAP<STRING, STRING>,
        server_id INT
      )
      PARTITIONED BY (date_id INT)
      STORED AS PARQUET;
      

      I want to run this query:

      WITH
      battles AS (
      SELECT
        *
      FROM event e,
        (SELECT
          MAX(CASE WHEN key='won' THEN value END) AS won
        FROM e.event_map) m
      WHERE e.date_id = 2420 AND e.event_id = 10012
      )
      SELECT won, count(*) FROM battles
      GROUP BY won;
      
      

      Map element with key = 'won' has only 'true' and 'false' values. If battles subquery has more than 5 million rows the query returns strange results, and when it has less rows query returns correct data.

      This is the query plan:

      +------------------------------------------------------------------------------------+
      | Explain String                                                                     |
      +------------------------------------------------------------------------------------+
      | Estimated Per-Host Requirements: Memory=882.00MB VCores=2                          |
      | WARNING: The following tables are missing relevant table and/or column statistics. |
      | dejanp.event                                                                       |
      |                                                                                    |
      | 09:EXCHANGE [UNPARTITIONED]                                                        |
      | |                                                                                  |
      | 08:AGGREGATE [FINALIZE]                                                            |
      | |  output: count:merge(*)                                                          |
      | |  group by: won                                                                   |
      | |                                                                                  |
      | 07:EXCHANGE [HASH(won)]                                                            |
      | |                                                                                  |
      | 06:AGGREGATE [STREAMING]                                                           |
      | |  output: count(*)                                                                |
      | |  group by: max(CASE WHEN key = 'won' THEN value END)                             |
      | |                                                                                  |
      | 01:SUBPLAN                                                                         |
      | |                                                                                  |
      | |--05:NESTED LOOP JOIN [CROSS JOIN]                                                |
      | |  |                                                                               |
      | |  |--02:SINGULAR ROW SRC                                                          |
      | |  |                                                                               |
      | |  04:AGGREGATE [FINALIZE]                                                         |
      | |  |  output: max(CASE WHEN key = 'won' THEN value END)                            |
      | |  |                                                                               |
      | |  03:UNNEST [e.event_map]                                                         |
      | |                                                                                  |
      | 00:SCAN HDFS [dejanp.event e]                                                      |
      |    partitions=1/1 files=64 size=18.76GB                                            |
      |    predicates: e.event_id = 10012                                                  |
      +------------------------------------------------------------------------------------+
      

      Attachments

        1. incorrect_version.txt
          49 kB
          Dejan Prokić
        2. correct_version.txt
          44 kB
          Dejan Prokić

        Issue Links

          Activity

            People

              alex.behm Alexander Behm
              Prokić_impala_1760 Dejan Prokić
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: