ORC-1167

Support orc.row.batch.size configuration

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.9.0
    • Fix Version/s: 1.9.0
    • Component/s: None
    • Labels: None

    Description

      Currently, OrcMapreduceRecordReader is created with a fixed default batch size of 1024 rows. We could make the batch size configurable through Reader.Options.
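
      A minimal sketch of the proposed lookup, under stated assumptions: the key name comes from the issue title and the default of 1024 from the description, while the class and method names are hypothetical and java.util.Properties stands in for the Hadoop Configuration that a real reader would consult. It is not the actual ORC implementation.

```java
import java.util.Properties;

// Hypothetical helper, for illustration only; not part of the ORC API.
final class RowBatchSizeConfig {
    // Key taken from the issue title; default taken from the description.
    public static final String ROW_BATCH_SIZE_KEY = "orc.row.batch.size";
    public static final int DEFAULT_ROW_BATCH_SIZE = 1024;

    // Returns the configured batch size, or the default when the key is unset.
    public static int resolveRowBatchSize(Properties conf) {
        String raw = conf.getProperty(ROW_BATCH_SIZE_KEY);
        if (raw == null) {
            return DEFAULT_ROW_BATCH_SIZE;
        }
        int size = Integer.parseInt(raw.trim());
        if (size <= 0) {
            throw new IllegalArgumentException(
                ROW_BATCH_SIZE_KEY + " must be positive, got " + size);
        }
        return size;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(resolveRowBatchSize(conf)); // key unset, falls back to 1024

        conf.setProperty(ROW_BATCH_SIZE_KEY, "256");
        System.out.println(resolveRowBatchSize(conf)); // 256
    }
}
```

      Rejecting non-positive values up front is a design choice of this sketch; it keeps a bad setting from surfacing later as an allocation error.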

       

      When a batch of 1024 rows contains relatively large strings, the read can fail with a NegativeArraySizeException, and there is currently no configuration option to reduce the batch size.

       

      java.lang.NegativeArraySizeException
      	at org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1544)
      	at org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1566)
      	at org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1662)
      	at org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1508)
      	at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:2047)
      	at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1219)
      	at org.apache.orc.mapreduce.OrcMapreduceRecordReader.ensureBatch(OrcMapreduceRecordReader.java:84)
      	at org.apache.orc.mapreduce.OrcMapreduceRecordReader.nextKeyValue(OrcMapreduceRecordReader.java:102)
      	at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39) 
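
      The following sketch illustrates one plausible cause of the exception above and why a smaller batch would avoid it. It assumes the combined byte length of all strings in a batch is accumulated in a 32-bit int before a buffer is allocated; the trace is consistent with that, but the issue does not state it. The class name, method names, and the 2 MiB string size are hypothetical.

```java
// Illustration only: shows int overflow when summing string lengths per batch.
final class BatchByteTotal {
    // Sums per-row byte lengths in an int; wraps silently past Integer.MAX_VALUE.
    public static int totalAsInt(int rows, int bytesPerRow) {
        int total = 0;
        for (int i = 0; i < rows; i++) {
            total += bytesPerRow;
        }
        return total;
    }

    // Same total computed in a long, which does not overflow for these inputs.
    public static long totalAsLong(int rows, int bytesPerRow) {
        return (long) rows * bytesPerRow;
    }

    // Largest row count whose combined length still fits in an int.
    public static int maxSafeRows(int bytesPerRow) {
        return Integer.MAX_VALUE / bytesPerRow;
    }

    public static void main(String[] args) {
        int twoMiB = 2 * 1024 * 1024; // assumed size of each string, in bytes

        System.out.println(totalAsLong(1024, twoMiB)); // 2147483648
        System.out.println(totalAsInt(1024, twoMiB));  // -2147483648 (wrapped)
        System.out.println(totalAsInt(512, twoMiB));   // 1073741824
        System.out.println(maxSafeRows(twoMiB));       // 1023

        try {
            // Allocating with the wrapped total reproduces the exception type.
            byte[] buffer = new byte[totalAsInt(1024, twoMiB)];
            System.out.println(buffer.length);
        } catch (NegativeArraySizeException e) {
            System.out.println("NegativeArraySizeException");
        }
    }
}
```

      Under these assumptions, 1024 strings of 2 MiB each overflow the int total, while a batch of 512 rows stays within range, which is the case a configurable batch size is meant to cover.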

       

            People

              Assignee: dzcxzl
              Reporter: dzcxzl