Uploaded image for project: 'Apache AsterixDB'
  1. Apache AsterixDB
  2. ASTERIXDB-3083

Indexnl query rewrite results in gives different query semantics than default HHJ

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 0.9.3
    • 0.9.7
    • IDX - Indexes
    • None

    Description

      Currently, in order to use index-nested-loop-join algorithm via "indexnl" hint, it is required that the outer collection be cast to the type that is compatible with the index on the inner collection. This leads to a query that is semantically different from the hash-join algorithm and that (rightly) produces a different result. However, from a user perspective, using a hash-join or nested-loop join should give the same result. Here is an example:

      bucket b1 has:
      {"f": 1,"o": "f=1","id": 1}
      {"f": 8,"o": "f=8","id": 2}
      {"f": "1","o": "f=\"1\"","id": 3}
      
      bucket b2 has:
      {"f": 1,"o": "f=1","id": 1}
      {"f": "1","o": "f=\"1\"","id": 2}
      {"f": 9,"o": "f=9","id": 3}
      
      create dataset ds1 on b1;
      create dataset ds2 on b2;
      create index idx_ds2 on ds2(f: int);
      // hash-join
      from ds1, ds2 where ds1.f = ds2.f select ds1, ds2;
      
      [
        {
          "ds1": {
            "f": "1",
            "o": "f=\"1\"",
            "id": 3
          },
          "ds2": {
            "f": "1",
            "o": "f=\"1\"",
            "id": 2
          }
        },
        {
          "ds1": {
            "f": 1,
            "o": "f=1",
            "id": 1
          },
          "ds2": {
            "f": 1,
            "o": "f=1",
            "id": 1
          }
        }
      ]
      
      // indexnl join where ds1 is the outer collection and ds2 is the inner
      from ds1, ds2 where to_bigint(ds1.f) /*+ indexnl */ = ds2.f select ds1, ds2;
      
      [
        {
          "ds1": {
            "f": 1,
            "o": "f=1",
            "id": 1
          },
          "ds2": {
            "f": 1,
            "o": "f=1",
            "id": 1
          }
        },
        {
          "ds1": {
            "f": "1",
            "o": "f=\"1\"",
            "id": 3
          },
          "ds2": {
            "f": 1,
            "o": "f=1",
            "id": 1
          }
        }
      ]
      
      // the plan for the indexnl join
       distribute result [$$28]
      -- DISTRIBUTE_RESULT  |PARTITIONED|
        exchange
        -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
          project ([$$28])
          -- STREAM_PROJECT  |PARTITIONED|
            assign [$$28] <- [{"ds1": $$ds1, "ds2": $$ds2}]
            -- ASSIGN  |PARTITIONED|
              project ([$$ds1, $$ds2])
              -- STREAM_PROJECT  |PARTITIONED|
                select (eq($$33, $$ds2.getField("f")))
                -- STREAM_SELECT  |PARTITIONED|
                  project ([$$ds1, $$33, $$ds2])
                  -- STREAM_PROJECT  |PARTITIONED|
                    exchange
                    -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                      unnest-map [$$31, $$ds2, $$32] <- index-search("ds2", 0, "Default", "ds2", true, false, 1, $$37, 1, $$37, true, true, true)
                      -- BTREE_SEARCH  |PARTITIONED|
                        exchange
                        -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                          order (ASC, $$37)
                          -- STABLE_SORT [$$37(ASC)]  |PARTITIONED|
                            exchange
                            -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                              project ([$$ds1, $$33, $$37])
                              -- STREAM_PROJECT  |PARTITIONED|
                                exchange
                                -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                                  unnest-map [$$36, $$37] <- index-search("idx_ds2", 0, "Default", "ds2", true, true, 1, $$33, 1, $$33, true, true, true)
                                  -- BTREE_SEARCH  |PARTITIONED|
                                    exchange
                                    -- BROADCAST_EXCHANGE  |PARTITIONED|
                                      assign [$$33] <- [to-bigint($$ds1.getField("f"))]
                                      -- ASSIGN  |PARTITIONED|
                                        project ([$$ds1])
                                        -- STREAM_PROJECT  |PARTITIONED|
                                          exchange
                                          -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                                            data-scan []<-[$$29, $$ds1, $$30] <- Default.ds1
                                            -- DATASOURCE_SCAN  |PARTITIONED|
                                              exchange
                                              -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                                                empty-tuple-source
                                                -- EMPTY_TUPLE_SOURCE  |PARTITIONED|

      Attachments

        Issue Links

          Activity

            People

              alsuliman Ali Alsuliman
              alsuliman Ali Alsuliman
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated: