Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-14004

Minor compaction produces ArrayIndexOutOfBoundsException: 7 in SchemaEvolution.getFileType

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 1.3.0, 2.1.0
    • 1.3.0, 2.1.1, 2.2.0
    • Transactions
    • None

    Description

      Easiest way to repro is to add TestTxnCommands2

        @Test
        public void testCompactWithDelete() throws Exception {
          int[][] tableData = {{1,2},{3,4}};
          runStatementOnDriver("insert into " + Table.ACIDTBL + "(a,b) " + makeValuesClause(tableData));
          runStatementOnDriver("alter table "+ Table.ACIDTBL + " compact 'MAJOR'");
          Worker t = new Worker();
          t.setThreadId((int) t.getId());
          t.setHiveConf(hiveConf);
          AtomicBoolean stop = new AtomicBoolean();
          AtomicBoolean looped = new AtomicBoolean();
          stop.set(true);
          t.init(stop, looped);
          t.run();
          runStatementOnDriver("delete from " + Table.ACIDTBL + " where b = 4");
          runStatementOnDriver("update " + Table.ACIDTBL + " set b = -2 where b = 2");
          runStatementOnDriver("alter table "+ Table.ACIDTBL + " compact 'MINOR'");
          t.run();
        }
      

      to TestTxnCommands2 and run it.
      Test won't fail but if you look
      in target/tmp/log/hive.log for the following exception (from Minor compaction).

      2016-06-09T18:36:39,071 WARN  [Thread-190[]]: mapred.LocalJobRunner (LocalJobRunner.java:run(560)) - job_local1233973168_0005
      java.lang.Exception: java.lang.ArrayIndexOutOfBoundsException: 7
              at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) ~[hadoop-mapreduce-client-common-2.6.1.jar:?]
              at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522) [hadoop-mapreduce-client-common-2.6.1.jar:?]
      Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
              at org.apache.orc.impl.SchemaEvolution.getFileType(SchemaEvolution.java:67) ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
              at org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2031) ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
              at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.<init>(TreeReaderFactory.java:1716) ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
              at org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2077) ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
              at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.<init>(TreeReaderFactory.java:1716) ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
              at org.apache.orc.impl.TreeReaderFactory.createTreeReader(TreeReaderFactory.java:2077) ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
              at org.apache.orc.impl.RecordReaderImpl.<init>(RecordReaderImpl.java:208) ~[hive-orc-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
              at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.<init>(RecordReaderImpl.java:63) ~[classes/:?]
              at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:365) ~[classes/:?]
              at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger$ReaderPair.<init>(OrcRawRecordMerger.java:207) ~[classes/:?]
              at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:508) ~[classes/:?]
              at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRawReader(OrcInputFormat.java:1977) ~[classes/:?]
              at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:630) ~[classes/:?]
              at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorMap.map(CompactorMR.java:609) ~[classes/:?]
              at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) ~[hadoop-mapreduce-client-core-2.6.1.jar:?]
              at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) ~[hadoop-mapreduce-client-core-2.6.1.jar:?]
              at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) ~[hadoop-mapreduce-client-core-2.6.1.jar:?]
              at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243) ~[hadoop-mapreduce-client-common-2.6.1.jar:?]
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[?:1.7.0_71]
              at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[?:1.7.0_71]
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[?:1.7.0_71]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[?:1.7.0_71]
              at java.lang.Thread.run(Thread.java:745) ~[?:1.7.0_71]
      

      I observed the same on a real cluster.

      Based on my observations, running Major compaction instead of minor, works fine.
      Replacing the DELETE operation with update, makes both Major/Minor run fine.

      The issue itself should be addressed by HIVE-13974 but need to make sure to add the test.

      Attachments

        1. HIVE-14004.patch
          4 kB
          Owen O'Malley
        2. HIVE-14004.03.patch
          50 kB
          Matt McCline
        3. HIVE-14004.02.patch
          51 kB
          Matt McCline
        4. HIVE-14004.01.patch
          54 kB
          Matt McCline

        Issue Links

          Activity

            People

              ekoifman Eugene Koifman
              ekoifman Eugene Koifman
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: