Description
A unit test covering this scenario can be added after the ORC-1205 release.
bin/spark-shell
spark.sql("set orc.stripe.size=10240")
spark.sql("set orc.rows.between.memory.checks=1")
spark.sql("set spark.sql.orc.columnarWriterBatchSize=1")

val df = spark.range(1, 1 + 512, 1, 1).map { i =>
  if (i == 1) {
    (i, Array.fill[Byte](5 * 1024 * 1024)('X'))
  } else {
    (i, Array.fill[Byte](1)('X'))
  }
}.toDF("c1", "c2")

df.write.format("orc").save("file:///tmp/test_table_orc_t1")

spark.sql("create external table test_table_orc_t1 (c1 string, c2 binary) location 'file:///tmp/test_table_orc_t1' stored as orc")
spark.sql("select * from test_table_orc_t1").show()
Querying this table raises the following exception:
java.lang.ArrayIndexOutOfBoundsException: 1
	at org.apache.orc.impl.TreeReaderFactory$TreeReader.nextVector(TreeReaderFactory.java:387)
	at org.apache.orc.impl.TreeReaderFactory$LongTreeReader.nextVector(TreeReaderFactory.java:740)
	at org.apache.orc.impl.ConvertTreeReaderFactory$StringGroupFromAnyIntegerTreeReader.nextVector(ConvertTreeReaderFactory.java:1069)
	at org.apache.orc.impl.reader.tree.StructBatchReader.readBatchColumn(StructBatchReader.java:65)
	at org.apache.orc.impl.reader.tree.StructBatchReader.nextBatchForLevel(StructBatchReader.java:100)
	at org.apache.orc.impl.reader.tree.StructBatchReader.nextBatch(StructBatchReader.java:77)
	at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1371)
	at org.apache.orc.mapreduce.OrcMapreduceRecordReader.ensureBatch(OrcMapreduceRecordReader.java:84)
	at org.apache.orc.mapreduce.OrcMapreduceRecordReader.nextKeyValue(OrcMapreduceRecordReader.java:102)
	at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
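The linked ORC-1205 title ("Size of batches in some ConvertTreeReaders should be ensured before using") suggests the failure mode: a column vector sized for an earlier, smaller batch is reused for a larger batch without being grown first. A minimal, self-contained sketch of that pattern (the `LongColumnVector` and one-argument `ensureSize` here are simplified stand-ins for ORC's real storage-api classes, not the actual reader code):

```scala
// Sketch, not ORC code: a vector sized for a 1-row batch is reused for a
// 2-row batch; without ensuring capacity first, index 1 is out of bounds.
object BatchSizeSketch {
  final class LongColumnVector(var vector: Array[Long]) {
    // Simplified stand-in for ColumnVector.ensureSize: grow the backing array
    // when the requested batch size exceeds the current capacity.
    def ensureSize(size: Int): Unit =
      if (vector.length < size) vector = java.util.Arrays.copyOf(vector, size)
  }

  def fill(cv: LongColumnVector, batchSize: Int, ensure: Boolean): Unit = {
    if (ensure) cv.ensureSize(batchSize)
    var i = 0
    while (i < batchSize) { cv.vector(i) = i.toLong; i += 1 }
  }

  def main(args: Array[String]): Unit = {
    val cv = new LongColumnVector(new Array[Long](1)) // sized for a 1-row batch
    try {
      fill(cv, 2, ensure = false) // larger next batch: throws at index 1
    } catch {
      case _: ArrayIndexOutOfBoundsException =>
        println("ArrayIndexOutOfBoundsException")
    }
    fill(cv, 2, ensure = true) // growing the vector first avoids the crash
    println(cv.vector.mkString(","))
  }
}
```

In the repro above, `spark.sql.orc.columnarWriterBatchSize=1` plus one 5 MB value forces exactly this shape of size mismatch between consecutive batches.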
Issue Links
- is fixed by ORC-1205: Size of batches in some ConvertTreeReaders should be ensured before using (Closed)