[ASTERIXDB-1186] Index Join Hint on an index with composite keys causes query to return incorrect results - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: None
Fix Version/s: None
Component/s: *DB - AsterixDB, COMP - Compiler
Labels:
- soon
Environment:
AsterixDB 0.8.7-SNAPSHOT

Description

If one of the attributes in the join predicate is a composite PK (consists of more than one attribute), if the Index Join hint is used the query plan includes wrong exchange (partitioning) which results in wrong results (only a subset of the correct total results will be returned).
The root cause of it is that the records from the PK side (which is composite) are hash-partitioned on a "combination" of hash values of all attributes in the PK. As a result the records of the non-PK side need to be sent to "all" partitions, but the plan contains the wrong exchange and the runtime wont send them to all partitions.
Below is an example:

> Here is DDL:
create type LineItemType as {
l_orderkey: int64,
l_partkey: int64,
l_suppkey: int64,
l_linenumber: int32,
l_quantity: int32,
l_extendedprice: double,
l_discount: double,
l_tax: double,
l_returnflag: string,
l_linestatus: string,
l_shipdate: string,
l_commitdate: string,
l_receiptdate: string,
l_shipinstruct: string,
l_shipmode: string,
l_comment: string
}

create type OrderType as {
o_orderkey: int64,
o_custkey: int64,
o_orderstatus: string,
o_totalprice: double,
o_orderdate: string,
o_orderpriority: string,
o_clerk: string,
o_shippriority: int32,
o_comment: string
}

create dataset LineItem(LineItemType) primary key l_orderkey, l_linenumber;
create dataset Orders(OrderType) primary key o_orderkey;

> The following query returns wrong (a subset) of results:

for $o in dataset('Orders')
for $l in dataset('LineItem')
where
$l.l_orderkey /*+ indexnl */ = $o.o_orderkey
return{
"o_orderkey": $o.o_orderkey,
"l_orderkey": $l.l_orderkey
}

Here is the plan - As you can see the Orders record are One-on-One exchanged (rather than broadcast). The issue is that the l_orderkey is not the partitioning attribute of LineItem, and matching Orders for a specific order_key can be in any partition (depending on their l_linenumber value - which is the 2nd component of PK in LineItem)

{o_orderkey}

, %0->$$16, AString:

{l_orderkey}

{LineItem}, AInt32: {0}, AString: {dummy}, AString: {LineItem}

, ABoolean:

{true}, ABoolean: {false}, ABoolean: {true}

, AInt32:

{1}, %0->$$16, AInt32: {1}

Attachments

Activity

People

Assignee:: Taewoo Kim

Reporter:: Pouria Pirzadeh

Votes:: 0 Vote for this issue

Watchers:: 5 Start watching this issue

Dates

Created:: 19/Nov/15 19:33

Updated:: 17/Jun/17 00:00

Resolved:: 25/Aug/16 20:01