Details
Improvement
Status: Resolved
Major
Resolution: Fixed
1.9.0
None
None
Description
ORC-961 introduced a metrics collector for the reader. However, it may degrade the performance of reading ORC files, so it may be helpful to disable it by default.
Reproducible experiment result:
Alibaba Cloud ecs.s6-c1m4.xlarge, running Ubuntu 20.04, ESSD PL1 40GB
The original file is a 4.1GB CSV file of generated strings with some degree of repetitiveness (the values of one column follow a Zipfian distribution). The resulting ORC file, written with dictionary encoding and no block compression, is 319MB.
Time of running orc-scan with metrics enabled: 7.5s
Time of running orc-scan with metrics disabled: 1.5s
Metrics were disabled by adding:
readerOpts.setReaderMetrics(nullptr);
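To illustrate why passing nullptr can matter, here is a minimal, self-contained sketch of a nullable metrics pointer guarding a hot path. It is not the ORC implementation: SketchMetrics and decodeChunk are hypothetical names invented for this example, and the assumption is only that the collector reads a clock and updates atomic counters on each call, which is skipped when the pointer is null.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a reader metrics collector (not the ORC type).
struct SketchMetrics {
  std::atomic<uint64_t> decodeCalls{0};
  std::atomic<uint64_t> decodeLatencyNs{0};
};

// Hypothetical hot-path function: sums a buffer and records metrics only
// when a collector is supplied. With metrics == nullptr, the clock reads
// and atomic updates are skipped entirely.
uint64_t decodeChunk(const uint32_t* data, std::size_t n, SketchMetrics* metrics) {
  std::chrono::steady_clock::time_point start;
  if (metrics != nullptr) {
    start = std::chrono::steady_clock::now();
  }

  uint64_t sum = 0;
  for (std::size_t i = 0; i < n; ++i) {
    sum += data[i];
  }

  if (metrics != nullptr) {
    auto elapsed = std::chrono::steady_clock::now() - start;
    auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed).count();
    metrics->decodeCalls.fetch_add(1, std::memory_order_relaxed);
    metrics->decodeLatencyNs.fetch_add(static_cast<uint64_t>(ns),
                                       std::memory_order_relaxed);
  }
  return sum;
}
```

When such a function is called many times on small chunks, the per-call bookkeeping can become a noticeable share of the total time. Whether this accounts for the gap measured above is not established by the sketch.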