Hive / HIVE-26350

IndexOutOfBoundsException when generating splits for external JDBC table with partition columns


    Description

      Create the following table in a database reachable over JDBC (e.g., PostgreSQL).

      CREATE TABLE country
      (
          id   int,
          name varchar(20)
      );
      

      Create the following tables in Hive, ensuring that the external JDBC table has the hive.sql.partitionColumn table property set.

      CREATE TABLE city (id int);
      
      CREATE EXTERNAL TABLE country
      (
          id int,
          name varchar(20)
      )
      STORED BY
      'org.apache.hive.storage.jdbc.JdbcStorageHandler'
      TBLPROPERTIES (
          "hive.sql.database.type" = "POSTGRES",
          "hive.sql.jdbc.driver" = "org.postgresql.Driver",
          "hive.sql.jdbc.url" = "jdbc:postgresql://localhost:5432/qtestDB",
          "hive.sql.dbcp.username" = "qtestuser",
          "hive.sql.dbcp.password" = "qtestpassword",
          "hive.sql.table" = "country",
          "hive.sql.partitionColumn" = "name",
          "hive.sql.numPartitions" = "2"
      );
      

      The query below fails with an IndexOutOfBoundsException when the mapper scanning the JDBC table tries to generate its splits using the partitioning column.

      SELECT country.id FROM country CROSS JOIN city;
      

      The full stack trace is given below.

      java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
              at java.util.ArrayList.rangeCheck(ArrayList.java:659) ~[?:1.8.0_261]
              at java.util.ArrayList.get(ArrayList.java:435) ~[?:1.8.0_261]
              at org.apache.hive.storage.jdbc.JdbcInputFormat.getSplits(JdbcInputFormat.java:102) [hive-jdbc-handler-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
              at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:564) [hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
              at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:858) [hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
              at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:263) [hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
              at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:281) [tez-dag-0.10.1.jar:0.10.1]
              at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:272) [tez-dag-0.10.1.jar:0.10.1]
              at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_261]
              at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_261]
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682) [hadoop-common-3.1.0.jar:?]
              at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:272) [tez-dag-0.10.1.jar:0.10.1]
              at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:256) [tez-dag-0.10.1.jar:0.10.1]
              at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108) [guava-19.0.jar:?]
              at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41) [guava-19.0.jar:?]
              at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77) [guava-19.0.jar:?]
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_261]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_261]
              at java.lang.Thread.run(Thread.java:748) [?:1.8.0_261]
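
      The message "Index: 1, Size: 1" means that a one-element list was read at position 1. The sketch below shows one way such a mismatch could arise. It is an assumption for illustration, not the code in JdbcInputFormat: it supposes that the position of the partition column is computed from the full column list of the remote table (id, name), and is then used to index a type list that holds only the projected column (country.id). The class and method names (PrunedColumnLookup, typeOf, safeTypeOf) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical illustration only; none of these names come from Hive.
class PrunedColumnLookup {

    // Unchecked lookup: finds the column's position in the full column list,
    // then reads the same position from a (possibly shorter) type list.
    static String typeOf(List<String> allColumns, List<String> types, String column) {
        return types.get(allColumns.indexOf(column));
    }

    // Guarded lookup: returns empty instead of throwing when the position
    // is missing from either list.
    static Optional<String> safeTypeOf(List<String> allColumns, List<String> types, String column) {
        int idx = allColumns.indexOf(column);
        if (idx < 0 || idx >= types.size()) {
            return Optional.empty();
        }
        return Optional.of(types.get(idx));
    }

    // True when the unchecked lookup throws for the scenario in this report.
    static boolean uncheckedLookupFails() {
        // Columns of the remote table, in declaration order.
        List<String> allColumns = Arrays.asList("id", "name");
        // Assumption: only country.id is projected, so one type is present.
        List<String> types = new ArrayList<>(Arrays.asList("int"));
        try {
            typeOf(allColumns, types, "name");
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("unchecked lookup fails: " + uncheckedLookupFails());
    }
}
```

      If the assumption holds, resolving the partition column against the same column list that the types were derived from, or checking the bounds as in safeTypeOf, would avoid the exception. Whether this matches the actual root cause has to be confirmed against JdbcInputFormat.java:102.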
      

      Attachments

        1. jdbc_join_with_partition_table.q
          0.7 kB
          Stamatis Zampetakis
        2. explain_plan.txt
          3 kB
          Stamatis Zampetakis
        3. cbo_plan.txt
          0.4 kB
          Stamatis Zampetakis

            People

              zabetak Stamatis Zampetakis

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 1h 20m