Uploaded image for project: 'Apache AsterixDB'
  1. Apache AsterixDB
  2. ASTERIXDB-1186

Index Join Hint on an index with composite keys causes query to return incorrect results

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Critical
    • Resolution: Fixed
    • None
    • None
    • AsterixDB 0.8.7-SNAPSHOT

    Description

      If one of the attributes in the join predicate is a composite PK (consists of more than one attribute), if the Index Join hint is used the query plan includes wrong exchange (partitioning) which results in wrong results (only a subset of the correct total results will be returned).
      The root cause of it is that the records from the PK side (which is composite) are hash-partitioned on a "combination" of hash values of all attributes in the PK. As a result the records of the non-PK side need to be sent to "all" partitions, but the plan contains the wrong exchange and the runtime wont send them to all partitions.
      Below is an example:

      > Here is DDL:
      create type LineItemType as {
      l_orderkey: int64,
      l_partkey: int64,
      l_suppkey: int64,
      l_linenumber: int32,
      l_quantity: int32,
      l_extendedprice: double,
      l_discount: double,
      l_tax: double,
      l_returnflag: string,
      l_linestatus: string,
      l_shipdate: string,
      l_commitdate: string,
      l_receiptdate: string,
      l_shipinstruct: string,
      l_shipmode: string,
      l_comment: string
      }

      create type OrderType as {
      o_orderkey: int64,
      o_custkey: int64,
      o_orderstatus: string,
      o_totalprice: double,
      o_orderdate: string,
      o_orderpriority: string,
      o_clerk: string,
      o_shippriority: int32,
      o_comment: string
      }

      create dataset LineItem(LineItemType) primary key l_orderkey, l_linenumber;
      create dataset Orders(OrderType) primary key o_orderkey;

      > The following query returns wrong (a subset) of results:

      for $o in dataset('Orders')
      for $l in dataset('LineItem')
      where
      $l.l_orderkey /*+ indexnl */ = $o.o_orderkey
      return{
      "o_orderkey": $o.o_orderkey,
      "l_orderkey": $l.l_orderkey
      }

      Here is the plan - As you can see the Orders record are One-on-One exchanged (rather than broadcast). The issue is that the l_orderkey is not the partitioning attribute of LineItem, and matching Orders for a specific order_key can be in any partition (depending on their l_linenumber value - which is the 2nd component of PK in LineItem)

      distribute result [%0->$$13]
      – DISTRIBUTE_RESULT |UNPARTITIONED|
      exchange
      – ONE_TO_ONE_EXCHANGE |UNPARTITIONED|
      aggregate [$$13] <- [function-call: asterix:agg-sum, Args:[%0->$$19]]
      – AGGREGATE |UNPARTITIONED|
      exchange
      – RANDOM_MERGE_EXCHANGE |PARTITIONED|
      aggregate [$$19] <- [function-call: asterix:agg-count, Args:[%0->$$9]]
      – AGGREGATE |PARTITIONED|
      project ([$$9])
      – STREAM_PROJECT |PARTITIONED|
      assign [$$9] <- [function-call: asterix:closed-record-constructor, Args:[AString:

      {o_orderkey}

      , %0->$$16, AString:

      {l_orderkey}

      , %0->$$17]]
      – ASSIGN |PARTITIONED|
      project ([$$17, $$16])
      – STREAM_PROJECT |PARTITIONED|
      exchange
      – ONE_TO_ONE_EXCHANGE |PARTITIONED|
      unnest-map [$$17, $$18, $$1] <- function-call: asterix:index-search, Args:[AString:

      {LineItem}, AInt32: {0}, AString: {dummy}, AString: {LineItem}

      , ABoolean:

      {true}, ABoolean: {false}, ABoolean: {true}

      , AInt32:

      {1}, %0->$$16, AInt32: {1}

      , %0->$$16, TRUE, TRUE, TRUE]
      – BTREE_SEARCH |PARTITIONED|
      exchange
      – ONE_TO_ONE_EXCHANGE |PARTITIONED|
      project ([$$16])
      – STREAM_PROJECT |PARTITIONED|
      exchange
      – ONE_TO_ONE_EXCHANGE |PARTITIONED|
      data-scan []<[$$16, $$0] < dummy:Orders
      – DATASOURCE_SCAN |PARTITIONED|
      exchange
      – ONE_TO_ONE_EXCHANGE |PARTITIONED|
      empty-tuple-source
      – EMPTY_TUPLE_SOURCE |PARTITIONED|

      Attachments

        Activity

          People

            wangsaeu Taewoo Kim
            pouria Pouria Pirzadeh
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: