Details
-
Bug
-
Status: Closed
-
Critical
-
Resolution: Fixed
-
None
-
None
-
AsterixDB 0.8.7-SNAPSHOT
Description
If one of the attributes in the join predicate is a composite PK (consists of more than one attribute), if the Index Join hint is used the query plan includes wrong exchange (partitioning) which results in wrong results (only a subset of the correct total results will be returned).
The root cause of it is that the records from the PK side (which is composite) are hash-partitioned on a "combination" of hash values of all attributes in the PK. As a result the records of the non-PK side need to be sent to "all" partitions, but the plan contains the wrong exchange and the runtime wont send them to all partitions.
Below is an example:
> Here is DDL:
create type LineItemType as {
l_orderkey: int64,
l_partkey: int64,
l_suppkey: int64,
l_linenumber: int32,
l_quantity: int32,
l_extendedprice: double,
l_discount: double,
l_tax: double,
l_returnflag: string,
l_linestatus: string,
l_shipdate: string,
l_commitdate: string,
l_receiptdate: string,
l_shipinstruct: string,
l_shipmode: string,
l_comment: string
}
create type OrderType as {
o_orderkey: int64,
o_custkey: int64,
o_orderstatus: string,
o_totalprice: double,
o_orderdate: string,
o_orderpriority: string,
o_clerk: string,
o_shippriority: int32,
o_comment: string
}
create dataset LineItem(LineItemType) primary key l_orderkey, l_linenumber;
create dataset Orders(OrderType) primary key o_orderkey;
> The following query returns wrong (a subset) of results:
for $o in dataset('Orders')
for $l in dataset('LineItem')
where
$l.l_orderkey /*+ indexnl */ = $o.o_orderkey
return{
"o_orderkey": $o.o_orderkey,
"l_orderkey": $l.l_orderkey
}
Here is the plan - As you can see the Orders record are One-on-One exchanged (rather than broadcast). The issue is that the l_orderkey is not the partitioning attribute of LineItem, and matching Orders for a specific order_key can be in any partition (depending on their l_linenumber value - which is the 2nd component of PK in LineItem)
distribute result [%0->$$13]
– DISTRIBUTE_RESULT |UNPARTITIONED|
exchange
– ONE_TO_ONE_EXCHANGE |UNPARTITIONED|
aggregate [$$13] <- [function-call: asterix:agg-sum, Args:[%0->$$19]]
– AGGREGATE |UNPARTITIONED|
exchange
– RANDOM_MERGE_EXCHANGE |PARTITIONED|
aggregate [$$19] <- [function-call: asterix:agg-count, Args:[%0->$$9]]
– AGGREGATE |PARTITIONED|
project ([$$9])
– STREAM_PROJECT |PARTITIONED|
assign [$$9] <- [function-call: asterix:closed-record-constructor, Args:[AString:
, %0->$$16, AString:
{l_orderkey}, %0->$$17]]
– ASSIGN |PARTITIONED|
project ([$$17, $$16])
– STREAM_PROJECT |PARTITIONED|
exchange
– ONE_TO_ONE_EXCHANGE |PARTITIONED|
unnest-map [$$17, $$18, $$1] <- function-call: asterix:index-search, Args:[AString:
, ABoolean:
{true}, ABoolean: {false}, ABoolean: {true}, AInt32:
{1}, %0->$$16, AInt32: {1}, %0->$$16, TRUE, TRUE, TRUE]
– BTREE_SEARCH |PARTITIONED|
exchange
– ONE_TO_ONE_EXCHANGE |PARTITIONED|
project ([$$16])
– STREAM_PROJECT |PARTITIONED|
exchange
– ONE_TO_ONE_EXCHANGE |PARTITIONED|
data-scan []<[$$16, $$0] < dummy:Orders
– DATASOURCE_SCAN |PARTITIONED|
exchange
– ONE_TO_ONE_EXCHANGE |PARTITIONED|
empty-tuple-source
– EMPTY_TUPLE_SOURCE |PARTITIONED|