Hive / HIVE-17448

ArrayIndexOutOfBoundsException on ORC tables after adding a struct field

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.1.1
    • Fix Version/s: None
    • Component/s: ORC
    • Labels:
      None
    • Environment:

      Reproduced on Dataproc 1.1, 1.2 (Hive 2.1).

      Description

      When ORC files were written with an older schema whose struct had fewer fields, and the table schema is later changed to one with more struct fields, and there are sibling fields following the struct itself (such as column b below), reading the table throws ArrayIndexOutOfBoundsException. Steps to reproduce:

      create external table test_broken_struct(a struct<f1:int, f2:int>, b int) stored as orc;
      insert into table test_broken_struct 
          select named_struct("f1", 1, "f2", 2), 3;
      drop table test_broken_struct;
      create external table test_broken_struct(a struct<f1:int, f2:int, f3:int>, b int) stored as orc;
      select * from test_broken_struct;
      

      The same scenario does not crash on Hive 0.14.

      Debug log and stack trace:

      2017-09-07T00:21:40,266  INFO [main] orc.OrcInputFormat: Using schema evolution configuration variables schema.evolution.columns [a, b] / schema.evolution.columns.types [struct<f1:int,f2:int,f3:int>, int] (isAcidRead false)
      2017-09-07T00:21:40,267 DEBUG [main] orc.OrcInputFormat: No ORC pushdown predicate
      2017-09-07T00:21:40,267  INFO [main] orc.ReaderImpl: Reading ORC rows from hdfs://cluster-7199-m/user/hive/warehouse/test_broken_struct/000000_0 with {include: [true, true, true, true, true], offset: 3, length: 159, schema: struct<a:struct<f1:int,f2:int,f3:int>,b:int>}
      Failed with exception java.io.IOException:java.lang.ArrayIndexOutOfBoundsException: 5
      2017-09-07T00:21:40,273 ERROR [main] CliDriver: Failed with exception java.io.IOException:java.lang.ArrayIndexOutOfBoundsException: 5
      java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 5
              at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:521)
              at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:428)
              at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146)
              at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2098)
              at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:252)
              at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
              at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
              at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
              at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
              at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:498)
              at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
              at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
      Caused by: java.lang.ArrayIndexOutOfBoundsException: 5
              at org.apache.orc.impl.SchemaEvolution.buildConversionFileTypesArray(SchemaEvolution.java:195)
              at org.apache.orc.impl.SchemaEvolution.buildConversionFileTypesArray(SchemaEvolution.java:253)
              at org.apache.orc.impl.SchemaEvolution.<init>(SchemaEvolution.java:59)
              at org.apache.orc.impl.RecordReaderImpl.<init>(RecordReaderImpl.java:149)
              at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.<init>(RecordReaderImpl.java:63)
              at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:87)
              at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.createReaderFromFile(OrcInputFormat.java:314)
              at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.<init>(OrcInputFormat.java:225)
              at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1691)
              at org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:695)
              at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:333)
              at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:459)
              ... 15 more
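
      One possible reading of the log above, sketched in Python rather than taken from the ORC source: ORC numbers every type in a schema in pre-order, starting with the root struct at 0. The file was written with 5 types, which matches the 5-element include list in the log, while the new table schema has 6, so column b moves from id 4 to id 5. The sketch assumes the failing array is sized by the file schema; that is an inference from the log, not something the report confirms. The nested-tuple schema encoding and the helper names are hypothetical.

```python
# Schemas as nested tuples: ("struct", [(field_name, type), ...]) or a primitive name.
FILE_SCHEMA = ("struct", [
    ("a", ("struct", [("f1", "int"), ("f2", "int")])),
    ("b", "int"),
])
READER_SCHEMA = ("struct", [
    ("a", ("struct", [("f1", "int"), ("f2", "int"), ("f3", "int")])),
    ("b", "int"),
])


def assign_ids(schema, path="", ids=None):
    """Number every type in pre-order (root = 0), as ORC does for column ids."""
    if ids is None:
        ids = {}
    ids[path or "<root>"] = len(ids)
    if isinstance(schema, tuple):  # a struct: recurse into its fields in order
        for name, child in schema[1]:
            assign_ids(child, f"{path}.{name}" if path else name, ids)
    return ids


def out_of_range(reader_ids, array_size):
    """Return (path, id) pairs whose reader id does not fit an array of array_size."""
    return [(p, i) for p, i in reader_ids.items() if i >= array_size]


file_ids = assign_ids(FILE_SCHEMA)
reader_ids = assign_ids(READER_SCHEMA)

# Assumption: the array indexed during schema evolution has one slot per FILE type.
include = [True] * len(file_ids)

print(file_ids)
print(reader_ids)
print(out_of_range(reader_ids, len(include)))
```

      Under that assumption, the only reader id that falls outside the array is the one for b, which is the index reported in the exception. It would also explain why a sibling field after the struct is needed to trigger the failure: without one, no type after the struct has its id shifted.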
      

        Attachments

        1. HIVE-17448.1-branch-2.1.patch
          3 kB
          Nikolay Sokolov

            People

            • Assignee: Aniket Namadeo Mokashi (aniket486)
            • Reporter: Nikolay Sokolov (chemikadze)
