[ASTERIXDB-3083] Indexnl query rewrite results in gives different query semantics than default HHJ - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 0.9.3
Fix Version/s: 0.9.7
Component/s: IDX - Indexes
Labels:
None

Description

Currently, in order to use index-nested-loop-join algorithm via "indexnl" hint, it is required that the outer collection be cast to the type that is compatible with the index on the inner collection. This leads to a query that is semantically different from the hash-join algorithm and that (rightly) produces a different result. However, from a user perspective, using a hash-join or nested-loop join should give the same result. Here is an example:

bucket b1 has:
{"f": 1,"o": "f=1","id": 1}
{"f": 8,"o": "f=8","id": 2}
{"f": "1","o": "f=\"1\"","id": 3}

bucket b2 has:
{"f": 1,"o": "f=1","id": 1}
{"f": "1","o": "f=\"1\"","id": 2}
{"f": 9,"o": "f=9","id": 3}

create dataset ds1 on b1;
create dataset ds2 on b2;
create index idx_ds2 on ds2(f: int);
// hash-join
from ds1, ds2 where ds1.f = ds2.f select ds1, ds2;

[
  {
    "ds1": {
      "f": "1",
      "o": "f=\"1\"",
      "id": 3
    },
    "ds2": {
      "f": "1",
      "o": "f=\"1\"",
      "id": 2
    }
  },
  {
    "ds1": {
      "f": 1,
      "o": "f=1",
      "id": 1
    },
    "ds2": {
      "f": 1,
      "o": "f=1",
      "id": 1
    }
  }
]

// indexnl join where ds1 is the outer collection and ds2 is the inner
from ds1, ds2 where to_bigint(ds1.f) /*+ indexnl */ = ds2.f select ds1, ds2;

[
  {
    "ds1": {
      "f": 1,
      "o": "f=1",
      "id": 1
    },
    "ds2": {
      "f": 1,
      "o": "f=1",
      "id": 1
    }
  },
  {
    "ds1": {
      "f": "1",
      "o": "f=\"1\"",
      "id": 3
    },
    "ds2": {
      "f": 1,
      "o": "f=1",
      "id": 1
    }
  }
]

// the plan for the indexnl join
 distribute result [$$28]
-- DISTRIBUTE_RESULT  |PARTITIONED|
  exchange
  -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
    project ([$$28])
    -- STREAM_PROJECT  |PARTITIONED|
      assign [$$28] <- [{"ds1": $$ds1, "ds2": $$ds2}]
      -- ASSIGN  |PARTITIONED|
        project ([$$ds1, $$ds2])
        -- STREAM_PROJECT  |PARTITIONED|
          select (eq($$33, $$ds2.getField("f")))
          -- STREAM_SELECT  |PARTITIONED|
            project ([$$ds1, $$33, $$ds2])
            -- STREAM_PROJECT  |PARTITIONED|
              exchange
              -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                unnest-map [$$31, $$ds2, $$32] <- index-search("ds2", 0, "Default", "ds2", true, false, 1, $$37, 1, $$37, true, true, true)
                -- BTREE_SEARCH  |PARTITIONED|
                  exchange
                  -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                    order (ASC, $$37)
                    -- STABLE_SORT [$$37(ASC)]  |PARTITIONED|
                      exchange
                      -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                        project ([$$ds1, $$33, $$37])
                        -- STREAM_PROJECT  |PARTITIONED|
                          exchange
                          -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                            unnest-map [$$36, $$37] <- index-search("idx_ds2", 0, "Default", "ds2", true, true, 1, $$33, 1, $$33, true, true, true)
                            -- BTREE_SEARCH  |PARTITIONED|
                              exchange
                              -- BROADCAST_EXCHANGE  |PARTITIONED|
                                assign [$$33] <- [to-bigint($$ds1.getField("f"))]
                                -- ASSIGN  |PARTITIONED|
                                  project ([$$ds1])
                                  -- STREAM_PROJECT  |PARTITIONED|
                                    exchange
                                    -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                                      data-scan []<-[$$29, $$ds1, $$30] <- Default.ds1
                                      -- DATASOURCE_SCAN  |PARTITIONED|
                                        exchange
                                        -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                                          empty-tuple-source
                                          -- EMPTY_TUPLE_SOURCE  |PARTITIONED|

Attachments

Issue Links

relates to

ASTERIXDB-3082 Make *_default_null functions public

Open

Indexnl query rewrite results in gives different query semantics than default HHJ

Details

Description

Attachments

Issue Links

Activity

People

Dates