Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-16480

ORC file with empty array<double> and array<float> fails to read

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 2.1.1, 2.2.0
    • 2.1.2, 2.2.1
    • None

    Description

      We have a schema that has a array<double> in it. We were unable to read this file and digging into ORC it seems that the issue is when the array is empty.

      Here is the stack trace

      EmptyList.log
      ERROR 2017-04-19 09:29:17,075 [main] [EmptyList] [line 56] Failed to work with type float 
      java.io.IOException: Error reading file: /var/folders/t8/t5x1031d7mn17f6xpwnkkv_40000gn/T/1492619355819-0/file-float.orc
        at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1052) ~[hive-orc-2.1.1.jar:2.1.1]
        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.nextBatch(RecordReaderImpl.java:135) ~[hive-exec-2.1.1.jar:2.1.1]
        at EmptyList.emptyList(EmptyList.java:49) ~[test-classes/:na]
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_121]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_121]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_121]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_121]
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) [junit-4.12.jar:4.12]
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) [junit-4.12.jar:4.12]
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) [junit-4.12.jar:4.12]
        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) [junit-4.12.jar:4.12]
        at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) [junit-4.12.jar:4.12]
        at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) [junit-4.12.jar:4.12]
        at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) [junit-4.12.jar:4.12]
        at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) [junit-4.12.jar:4.12]
        at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) [junit-4.12.jar:4.12]
        at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) [junit-4.12.jar:4.12]
        at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) [junit-4.12.jar:4.12]
        at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) [junit-4.12.jar:4.12]
        at org.junit.runners.ParentRunner.run(ParentRunner.java:363) [junit-4.12.jar:4.12]
        at org.junit.runner.JUnitCore.run(JUnitCore.java:137) [junit-4.12.jar:4.12]
        at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) [junit-rt.jar:na]
        at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:51) [junit-rt.jar:na]
        at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:237) [junit-rt.jar:na]
        at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) [junit-rt.jar:na]
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_121]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_121]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_121]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_121]
        at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147) [idea_rt.jar:na]
      Caused by: java.io.EOFException: Read past EOF for compressed stream Stream for column 1 kind DATA position: 0 length: 0 range: 0 offset: 0 limit: 0
        at org.apache.orc.impl.SerializationUtils.readFully(SerializationUtils.java:118) ~[hive-orc-2.1.1.jar:2.1.1]
        at org.apache.orc.impl.SerializationUtils.readFloat(SerializationUtils.java:78) ~[hive-orc-2.1.1.jar:2.1.1]
        at org.apache.orc.impl.TreeReaderFactory$FloatTreeReader.nextVector(TreeReaderFactory.java:619) ~[hive-orc-2.1.1.jar:2.1.1]
        at org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902) ~[hive-orc-2.1.1.jar:2.1.1]
        at org.apache.orc.impl.TreeReaderFactory$TreeReader.nextBatch(TreeReaderFactory.java:154) ~[hive-orc-2.1.1.jar:2.1.1]
        at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045) ~[hive-orc-2.1.1.jar:2.1.1]
        ... 29 common frames omitted
       INFO 2017-04-19 09:29:17,091 [main] [WriterImpl] [line 205] ORC writer created for path: /var/folders/t8/t5x1031d7mn17f6xpwnkkv_40000gn/T/1492619355819-0/file-double.orc with stripeSize: 67108864 blockSize: 268435456 compression: ZLIB bufferSize: 262144 
       INFO 2017-04-19 09:29:17,100 [main] [ReaderImpl] [line 357] Reading ORC rows from /var/folders/t8/t5x1031d7mn17f6xpwnkkv_40000gn/T/1492619355819-0/file-double.orc with {include: null, offset: 0, length: 9223372036854775807} 
       INFO 2017-04-19 09:29:17,101 [main] [RecordReaderImpl] [line 142] Schema on read not provided -- using file schema array<double> 
      ERROR 2017-04-19 09:29:17,104 [main] [EmptyList] [line 56] Failed to work with type double 
      java.io.IOException: Error reading file: /var/folders/t8/t5x1031d7mn17f6xpwnkkv_40000gn/T/1492619355819-0/file-double.orc
        at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1052) ~[hive-orc-2.1.1.jar:2.1.1]
        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.nextBatch(RecordReaderImpl.java:135) ~[hive-exec-2.1.1.jar:2.1.1]
        at EmptyList.emptyList(EmptyList.java:49) ~[test-classes/:na]
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_121]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_121]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_121]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_121]
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) [junit-4.12.jar:4.12]
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) [junit-4.12.jar:4.12]
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) [junit-4.12.jar:4.12]
        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) [junit-4.12.jar:4.12]
        at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) [junit-4.12.jar:4.12]
        at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) [junit-4.12.jar:4.12]
        at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) [junit-4.12.jar:4.12]
        at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) [junit-4.12.jar:4.12]
        at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) [junit-4.12.jar:4.12]
        at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) [junit-4.12.jar:4.12]
        at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) [junit-4.12.jar:4.12]
        at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) [junit-4.12.jar:4.12]
        at org.junit.runners.ParentRunner.run(ParentRunner.java:363) [junit-4.12.jar:4.12]
        at org.junit.runner.JUnitCore.run(JUnitCore.java:137) [junit-4.12.jar:4.12]
        at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) [junit-rt.jar:na]
        at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:51) [junit-rt.jar:na]
        at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:237) [junit-rt.jar:na]
        at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) [junit-rt.jar:na]
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_121]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_121]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_121]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_121]
        at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147) [idea_rt.jar:na]
      Caused by: java.io.EOFException: Read past EOF for compressed stream Stream for column 1 kind DATA position: 0 length: 0 range: 0 offset: 0 limit: 0
        at org.apache.orc.impl.SerializationUtils.readFully(SerializationUtils.java:118) ~[hive-orc-2.1.1.jar:2.1.1]
        at org.apache.orc.impl.SerializationUtils.readLongLE(SerializationUtils.java:101) ~[hive-orc-2.1.1.jar:2.1.1]
        at org.apache.orc.impl.SerializationUtils.readDouble(SerializationUtils.java:97) ~[hive-orc-2.1.1.jar:2.1.1]
        at org.apache.orc.impl.TreeReaderFactory$DoubleTreeReader.nextVector(TreeReaderFactory.java:713) ~[hive-orc-2.1.1.jar:2.1.1]
        at org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902) ~[hive-orc-2.1.1.jar:2.1.1]
        at org.apache.orc.impl.TreeReaderFactory$TreeReader.nextBatch(TreeReaderFactory.java:154) ~[hive-orc-2.1.1.jar:2.1.1]
        at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045) ~[hive-orc-2.1.1.jar:2.1.1]
        ... 29 common frames omitted
      

      If you create a ORC file with one row as the following

      orc.addRow(Lists.newArrayList());
      

      then try to read it

      VectorizedRowBatch batch = reader.getSchema().createRowBatch();
      while (rows.nextBatch(batch)) { }
      

      You will produce the above stack trace.

      Attachments

        1. HIVE-16480.patch
          6 kB
          jcamachorodriguez

        Issue Links

          Activity

            People

              omalley Owen O'Malley
              dcapwell David Capwell
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 20m
                  20m