Apache Drill / DRILL-801

merge joins fail with ArrayIndexOutOfBoundsException en masse

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.4.0
    • Component/s: Execution - Flow
    • Labels:
      None

      Description

      Datasources: TPCH (10MB), three-way split parquet files
      git.commit.id.abbrev=5d7e3d3
      git.commit.id=5d7e3d3ab548eb2b23607df46ea843a9c1532b72

      All of the join queries in the smoke test suite with merge-join fail with ArrayIndexOutOfBoundsException. An example follows:

      0: jdbc:drill:schema=dfs.TpcHMulti> alter session set `planner.enable_hashjoin` = false;
      +-------+----------------------------------+
      |  ok   |             summary              |
      +-------+----------------------------------+
      | true  | planner.enable_hashjoin updated. |
      +-------+----------------------------------+
      1 row selected (0.024 seconds)
      0: jdbc:drill:schema=dfs.TpcHMulti> select o.O_TOTALPRICE, c.C_NAME
      . . . . . . . . . . . . . . . . . > from orders o, customer c
      . . . . . . . . . . . . . . . . . > where o.C_CUSTKEY = c.C_CUSTKEY and o.O_TOTALPRICE > 400000.00
      . . . . . . . . . . . . . . . . . > order by o.O_TOTALPRICE;
      Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while running query.[error_id: "3914508b-6c56-4598-a5aa-5d3f51885ded"
      endpoint { address: "perfnode104.perf.lab" user_port: 31010 control_port: 31011 data_port: 31012 }
      error_type: 0
      message: "Failure while running fragment. < ArrayIndexOutOfBoundsException:[ 16666 ]"
      ]
      Error: exception while executing query (state=,code=0)

      Physical plan:

      0: jdbc:drill:schema=dfs.TpcHMulti> explain plan for select o.O_TOTALPRICE, c.C_NAME
      . . . . . . . . . . . . . . . . . > from orders o, customer c
      . . . . . . . . . . . . . . . . . > where o.C_CUSTKEY = c.C_CUSTKEY and o.O_TOTALPRICE > 400000.00
      . . . . . . . . . . . . . . . . . > order by o.O_TOTALPRICE ;
      +------+------+
      | text | json |
      +------+------+
      ScreenPrel
        SingleMergeExchangePrel(sort0=[0 ASC])
          SelectionVectorRemoverPrel
            SortPrel(sort0=[$0], dir0=[ASC])
              HashToRandomExchangePrel(dist0=[[$0]])
                ProjectPrel(O_TOTALPRICE=[$2], C_NAME=[$5])
                  MergeJoinPrel(condition=[=($1, $4)], joinType=[inner])
                    SelectionVectorRemoverPrel
                      SortPrel(sort0=[$1], dir0=[ASC])
                        HashToRandomExchangePrel(dist0=[[$1]])
                          FilterPrel(condition=[>($2, 400000.00)])
                            ScanPrel(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/tpch-multi/orders]], selectionRoot=/drill/testdata/tpch-multi/orders, columns=[SchemaPath [`C_CUSTKEY`], SchemaPath [`O_TOTALPRICE`]]]])
                    SelectionVectorRemoverPrel
                      SortPrel(sort0=[$1], dir0=[ASC])
                        HashToRandomExchangePrel(dist0=[[$1]])
                          ScanPrel(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/tpch-multi/customer]], selectionRoot=/drill/testdata/tpch-multi/customer, columns=[SchemaPath [`C_CUSTKEY`], SchemaPath [`C_NAME`]]]])
      {
      "head" :
      Unknown macro: { "version" }

      ,
      "graph" : [ {
      "pop" : "parquet-scan",
      "@id" : 1,
      "entries" : [

      { "path" : "maprfs:/drill/testdata/tpch-multi/customer" }

      ],
      "storage" : {
      "type" : "file",
      "connection" : "maprfs:///",
      "workspaces" :

      Unknown macro: { "root" }

      ,
      "formats" :

      Unknown macro: { "psv" }


      },
      "format" :

      { "type" : "parquet" }

      ,
      "columns" : [ "`C_CUSTKEY`", "`C_NAME`" ],
      "selectionRoot" : "/drill/testdata/tpch-multi/customer"
      },

      { "pop" : "hash-to-random-exchange", "@id" : 2, "child" : 1, "expr" : "hash(`C_CUSTKEY`) ", "initialAllocation" : 1000000, "maxAllocation" : 10000000000 }

      ,

      Unknown macro: { "pop" }

      ,

      { "pop" : "selection-vector-remover", "@id" : 4, "child" : 3, "initialAllocation" : 1000000, "maxAllocation" : 10000000000 }

      ,

      Unknown macro: { "pop" }

      , {
      "pop" : "parquet-scan",
      "@id" : 6,
      "entries" : [

      { "path" : "maprfs:/drill/testdata/tpch-multi/orders" }

      ],
      "storage" : {
      "type" : "file",
      "connection" : "maprfs:///",
      "workspaces" :

      Unknown macro: { "root" }

      ,
      "formats" :

      Unknown macro: { "psv" }


      },
      "format" :

      { "type" : "parquet" }

      ,
      "columns" : [ "`C_CUSTKEY`", "`O_TOTALPRICE`" ],
      "selectionRoot" : "/drill/testdata/tpch-multi/orders"
      },

      { "pop" : "filter", "@id" : 7, "child" : 6, "expr" : "greater_than(`O_TOTALPRICE`, 400000.0) ", "initialAllocation" : 1000000, "maxAllocation" : 10000000000 }

      ,

      { "pop" : "hash-to-random-exchange", "@id" : 8, "child" : 7, "expr" : "hash(`C_CUSTKEY`) ", "initialAllocation" : 1000000, "maxAllocation" : 10000000000 }

      ,

      Unknown macro: { "pop" }

      ,

      { "pop" : "selection-vector-remover", "@id" : 10, "child" : 9, "initialAllocation" : 1000000, "maxAllocation" : 10000000000 }

      ,

      Unknown macro: { "pop" }

      ,

      Unknown macro: { "pop" }

      ,

      { "pop" : "hash-to-random-exchange", "@id" : 13, "child" : 12, "expr" : "hash(`O_TOTALPRICE`) ", "initialAllocation" : 1000000, "maxAllocation" : 10000000000 }

      ,

      Unknown macro: { "pop" }

      ,

      { "pop" : "selection-vector-remover", "@id" : 15, "child" : 14, "initialAllocation" : 1000000, "maxAllocation" : 10000000000 }

      ,

      Unknown macro: { "pop" }

      ,

      { "pop" : "screen", "@id" : 17, "child" : 16, "initialAllocation" : 1000000, "maxAllocation" : 10000000000 }

      ]
      }

      +------+------+
      1 row selected (0.151 seconds)

        Activity

        Aman Sinha added a comment -

        This issue is not related to merge join as such. The IndexOutOfBoundsException (IOBE) occurs during a sort operation and can be reproduced with the simplified query below. Because merge join expects sorted input, the join query hits the same failure.

        select n.n_regionkey from nation n order by n.n_regionkey;

        message: "Failure while running fragment. < IndexOutOfBoundsException:[ index: 24, length: 8 (expected: range(0, 8)) ]"

        Here's the stack trace:
        java.lang.IndexOutOfBoundsException: index: 24, length: 8 (expected: range(0, 8))
        io.netty.buffer.AbstractByteBuf.checkIndex(AbstractByteBuf.java:1130) ~[netty-buffer-4.0.7.Final.jar:na]
        io.netty.buffer.AbstractByteBuf.getLong(AbstractByteBuf.java:391) ~[netty-buffer-4.0.7.Final.jar:na]
        org.apache.drill.exec.vector.BigIntVector$Accessor.get(BigIntVector.java:269) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
        org.apache.drill.exec.test.generated.MSorterGen84.doEval(MSortTemplate.java:53) ~[na:na]
        org.apache.drill.exec.test.generated.MSorterGen84.compare(MSortTemplate.java:137) ~[na:na]
        org.apache.drill.exec.test.generated.MSorterGen84.merge(MSortTemplate.java:76) ~[na:na]
        org.apache.drill.exec.test.generated.MSorterGen84.sort(MSortTemplate.java:111) ~[na:na]
        org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.next(ExternalSortBatch.java:268) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
        org.apache.drill.exec.record.AbstractSingleRecordBatch.next(AbstractSingleRecordBatch.java:45) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
        org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.next(RemovingRecordBatch.java:94) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
        org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.next(SingleSenderCreator.java:74) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
        org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:104) ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_45]
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_45]
        java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
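
A note on reading the exception above: "index: 24, length: 8 (expected: range(0, 8))" is consistent with an 8-byte read at byte offset 24 from a buffer that holds only 8 bytes, that is, record index 3 requested from a vector with room for a single BIGINT value. The sketch below reproduces that arithmetic under the assumption that a fixed-width accessor computes the byte offset as index * 8. The class and method names are hypothetical stand-ins, and java.nio.ByteBuffer is used in place of Netty's ByteBuf, so this illustrates the failure mode rather than Drill's actual code.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for a fixed-width BIGINT value vector (not Drill's class).
final class FixedWidthLongVector {
    private static final int WIDTH = Long.BYTES; // 8 bytes per value
    private final ByteBuffer data;

    FixedWidthLongVector(int valueCount) {
        // Backing buffer sized for exactly valueCount values.
        this.data = ByteBuffer.allocate(valueCount * WIDTH);
    }

    void set(int index, long value) {
        data.putLong(index * WIDTH, value);
    }

    long get(int index) {
        // Assumed offset rule: record index -> byte offset index * 8.
        return data.getLong(index * WIDTH);
    }

    public static void main(String[] args) {
        FixedWidthLongVector v = new FixedWidthLongVector(1); // 8-byte buffer
        v.set(0, 42L);
        System.out.println(v.get(0)); // in range: byte offset 0
        try {
            v.get(3); // byte offset 24 in an 8-byte buffer
        } catch (IndexOutOfBoundsException e) {
            System.out.println("out of bounds at byte offset " + 3 * WIDTH);
        }
    }
}
```

If the assumption holds, the sorter was comparing a record index that lies beyond the end of the batch it was reading from.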

        Zhiyong Liu added a comment -

        Despite your observations, hash joins with the same queries do NOT show the same symptom. In addition, the exceptions differ slightly: the join query reports ArrayIndexOutOfBoundsException, while your query reports IndexOutOfBoundsException.

        Ramana Inukonda Nagaraj added a comment -

        The hash join passes because it has no requirement for sorted input. If you check the physical plan for the merge join case, there is a sort before the join.
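
The difference between the two join strategies can be sketched as follows. This is a minimal illustration over plain int keys, not Drill's operators, and the class and method names are hypothetical. The hash join builds a lookup table and accepts its inputs in any order. The merge join advances two cursors and is correct only when both inputs are already sorted, which is why the planner places a sort under each side of the merge join.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: inner equi-join on int keys, returning one key per matching pair.
final class JoinSketch {

    // Hash join: input order does not matter.
    static List<Integer> hashJoin(int[] left, int[] right) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int key : right) {
            counts.merge(key, 1, Integer::sum); // build side
        }
        List<Integer> out = new ArrayList<>();
        for (int key : left) {                  // probe side
            int matches = counts.getOrDefault(key, 0);
            for (int n = 0; n < matches; n++) {
                out.add(key);
            }
        }
        return out;
    }

    // Merge join: both inputs MUST be sorted ascending.
    static List<Integer> mergeJoin(int[] left, int[] right) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.length && j < right.length) {
            if (left[i] < right[j]) {
                i++;
            } else if (left[i] > right[j]) {
                j++;
            } else {
                int key = left[i];
                int iEnd = i;
                while (iEnd < left.length && left[iEnd] == key) iEnd++;
                int jEnd = j;
                while (jEnd < right.length && right[jEnd] == key) jEnd++;
                // Emit the cross product of the two runs of equal keys.
                for (int n = 0; n < (iEnd - i) * (jEnd - j); n++) {
                    out.add(key);
                }
                i = iEnd;
                j = jEnd;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] right = {2, 3, 4};
        System.out.println(hashJoin(new int[] {3, 1, 2}, right));  // [3, 2]
        System.out.println(mergeJoin(new int[] {1, 2, 3}, right)); // [2, 3]
        System.out.println(mergeJoin(new int[] {3, 1, 2}, right)); // [3], the match on 2 is lost
    }
}
```

The last call shows what happens when the sorted-input precondition is violated: the merge join silently drops a match. A failure in the sort step therefore affects merge join plans while leaving hash join plans untouched.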

        Steven Phillips added a comment -

        I am unable to reproduce this issue. Could you try again on the latest build? If it still reproduces, let me know and I will take a look.

        Zhiyong Liu added a comment -

        It does appear to have gone away, apparently as a result of some fixes. Ran on the following build:

        git.commit.id.abbrev=5b8f8d8
        git.commit.id=5b8f8d8c76091817b2c2598b61b93374e9721668

        Aman's query

        select n.n_regionkey from nation n order by n.n_regionkey;

        also executed without the exception.

        Mehant Baid added a comment -

        Please close this bug as fixed.

        Jacques Nadeau added a comment -

        fixed


          People

          • Assignee:
            Jacques Nadeau
            Reporter:
            Zhiyong Liu
          • Votes:
            0
            Watchers:
            6
