[IMPALA-20] Aggregate of a subquery result set returns wrong results if the subquery contains a 'limit' and data is distributed across multiple nodes - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: Impala 0.3
Fix Version/s: Impala 0.7
Component/s: None
Labels: None

Description

An aggregate over a subquery's result set returns wrong results if the subquery contains a 'limit' clause and the data is distributed across multiple nodes. From the query plan, it looks like the limit is applied in each node's scan, each node computes its own count, and we are just summing the results from each slave.

Example, with the data spread across 3 nodes (the expected result is 10, but the query returns 30):

> select count(*) from (select * from tpch.lineitem limit 10) p
Query finished, fetching results ...
30
Returned 1 row(s) in 0.08s

Plan

 
  UNPARTITIONED
  AGGREGATE
  OUTPUT: SUM(<slot 32>)
  GROUP BY:
  TUPLE IDS: 2
    EXCHANGE (2)
      TUPLE IDS: 2

Plan Fragment 1
  RANDOM
  STREAM DATA SINK
    EXCHANGE ID: 2
    UNPARTITIONED

  AGGREGATE
  OUTPUT: COUNT(*)
  GROUP BY:
  TUPLE IDS: 2
    SCAN HDFS table=tpch.lineitem #partitions=1 size=718.94MB (0)
      LIMIT: 10
      TUPLE IDS: 0
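
The arithmetic behind the wrong result can be reproduced with a small simulation. This is only an illustrative sketch, not Impala code: the function names `buggy_plan` and `merged_limit_plan` and the in-memory row lists are hypothetical, and the second function shows one way a correct plan could behave (re-applying the limit after the exchange), not necessarily how the issue was actually fixed.

```python
from typing import List

LIMIT = 10  # the 'limit 10' from the subquery in the example


def buggy_plan(node_rows: List[List[int]], limit: int = LIMIT) -> int:
    """Mimic the plan above: each node runs SCAN (LIMIT) -> COUNT(*),
    and the coordinator SUMs the partial counts."""
    partial_counts = [len(rows[:limit]) for rows in node_rows]
    return sum(partial_counts)


def merged_limit_plan(node_rows: List[List[int]], limit: int = LIMIT) -> int:
    """Hypothetical alternative: nodes may still apply the limit locally,
    but the coordinator applies it again after the exchange, then counts."""
    exchanged = [row for rows in node_rows for row in rows[:limit]]
    return len(exchanged[:limit])


# Assumption for illustration: 3 nodes, each holding 1000 rows of the table.
nodes = [list(range(i * 1000, (i + 1) * 1000)) for i in range(3)]

print(buggy_plan(nodes))         # 30, matching the wrong result reported above
print(merged_limit_plan(nodes))  # 10, the expected result
```

With a single node both functions agree, which is consistent with the bug only showing up when the data is distributed across multiple nodes.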

People

Assignee: Marcel Kinard

Reporter: Lenni Kuff

Votes: 0

Watchers: 0

Dates

Created: 17/Jan/13 00:08

Updated: 20/Dec/15 00:04

Resolved: 10/Mar/13 01:29