[ORC-1167] Support orc.row.batch.size configuration - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 1.9.0
Fix Version/s: 1.9.0
Component/s: None
Labels: None

Description

Currently, when an OrcMapreduceRecordReader is created, the batch size defaults to 1024. We can support configuring it through Reader.Options.

If a batch of 1024 rows contains relatively large strings, the read can fail with a NegativeArraySizeException, and there is currently no configuration to reduce the batch size.

java.lang.NegativeArraySizeException
	at org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1544)
	at org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1566)
	at org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1662)
	at org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1508)
	at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:2047)
	at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1219)
	at org.apache.orc.mapreduce.OrcMapreduceRecordReader.ensureBatch(OrcMapreduceRecordReader.java:84)
	at org.apache.orc.mapreduce.OrcMapreduceRecordReader.nextKeyValue(OrcMapreduceRecordReader.java:102)
	at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
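
The trace above is consistent with the combined byte length of one batch overflowing a 32-bit int before a single buffer is allocated. That is an assumption about the cause, not something confirmed in this issue. The following is a minimal sketch of that arithmetic in plain Java; it is not the ORC source, and the class name BatchSizeOverflow, its methods, and the 2 MiB string size are hypothetical values chosen for illustration.

```java
// Illustration only: not the ORC source. It mimics summing per-row byte
// lengths into an int before allocating one buffer for the whole batch.
class BatchSizeOverflow {

    // Sum `rows` equal lengths with int arithmetic, which wraps silently on overflow.
    static int totalLength(int rows, int bytesPerRow) {
        int total = 0;
        for (int i = 0; i < rows; i++) {
            total += bytesPerRow;
        }
        return total;
    }

    // Returns true if allocating one buffer for the batch throws NegativeArraySizeException.
    static boolean allocationFails(int rows, int bytesPerRow) {
        int total = totalLength(rows, bytesPerRow);
        if (total >= 0) {
            return false; // skip the real allocation so the demo stays cheap
        }
        try {
            byte[] buffer = new byte[total]; // negative length always throws
            return buffer.length < 0;        // never reached
        } catch (NegativeArraySizeException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int twoMiB = 2 * 1024 * 1024; // hypothetical size of each string
        System.out.println(totalLength(1024, twoMiB));     // prints -2147483648
        System.out.println(allocationFails(1024, twoMiB)); // prints true
        System.out.println(totalLength(512, twoMiB));      // prints 1073741824
    }
}
```

Under these assumed sizes, halving the batch from 1024 to 512 rows keeps the total within int range, which is why a configurable batch size would give users a workaround.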

Issue Links

links to GitHub Pull Request #1108

People

Assignee: dzcxzl
Reporter: dzcxzl
Votes: 0
Watchers: 2

Dates

Created: 07/May/22 12:03
Updated: 23/May/22 04:18
Resolved: 23/May/22 04:18