ORC-1232: [C++] Disable metrics collector by default


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.9.0
    • Fix Version/s: 1.9.0
    • Component/s: None
    • Labels: None

    Description

      ORC-961 introduced a metrics collector for the reader. However, collecting metrics can noticeably slow down reading ORC files, so it may be better to disable it by default.

       

      Reproducible experiment results:

      Environment: Alibaba Cloud ecs.s6-c1m4.xlarge instance running Ubuntu 20.04, with a 40 GB ESSD PL1 disk.

      Data: the original input is a 4.1 GB CSV file of generated strings with some degree of repetitiveness (the values of one column follow a Zipfian distribution). Written as ORC with dictionary encoding and no block compression, the file is 319 MB.

       

      Time to run orc-scan with metrics enabled: 7.5 s

      Time to run orc-scan with metrics disabled: 1.5 s (5x faster)

      Metrics were disabled by adding

      readerOpts.setReaderMetrics(nullptr);

      after the line at https://github.com/apache/orc/blob/02e48107b36b8ed868797dadcd7355a632519d48/tools/src/FileScan.cc#L26
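      To illustrate why passing nullptr can help, here is a minimal, self-contained sketch of the general pattern, assuming a collector that reads a clock and updates atomic counters on every decode call. ReaderMetricsSketch, ScopedTimer, and decodeBatch are hypothetical names, not part of the ORC C++ API, and the real collector from ORC-961 may record different counters in different places.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a reader metrics object (NOT the ORC API).
struct ReaderMetricsSketch {
  std::atomic<uint64_t> decodeCalls{0};
  std::atomic<uint64_t> decodeLatencyUs{0};
};

// Hypothetical RAII timer: does nothing when given null counters,
// otherwise reads the clock twice and performs two atomic updates.
class ScopedTimer {
 public:
  ScopedTimer(std::atomic<uint64_t>* latencyUs, std::atomic<uint64_t>* calls)
      : latencyUs_(latencyUs), calls_(calls) {
    // Both counters must be enabled or disabled together.
    assert((latencyUs_ == nullptr) == (calls_ == nullptr));
    if (latencyUs_ != nullptr) start_ = std::chrono::steady_clock::now();
  }
  ScopedTimer(const ScopedTimer&) = delete;
  ScopedTimer& operator=(const ScopedTimer&) = delete;

  ~ScopedTimer() {
    if (latencyUs_ == nullptr) return;  // metrics disabled: no bookkeeping
    auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(
        std::chrono::steady_clock::now() - start_);
    latencyUs_->fetch_add(static_cast<uint64_t>(elapsed.count()),
                          std::memory_order_relaxed);
    calls_->fetch_add(1, std::memory_order_relaxed);
  }

 private:
  std::atomic<uint64_t>* latencyUs_;
  std::atomic<uint64_t>* calls_;
  std::chrono::steady_clock::time_point start_;
};

// Hypothetical decode step (here it just sums the batch).
// Passing metrics == nullptr skips all timing and counting.
int64_t decodeBatch(const std::vector<int64_t>& batch,
                    ReaderMetricsSketch* metrics) {
  ScopedTimer timer(metrics ? &metrics->decodeLatencyUs : nullptr,
                    metrics ? &metrics->decodeCalls : nullptr);
  int64_t sum = 0;
  for (int64_t v : batch) sum += v;
  return sum;
}
```

      With a null pointer the per-call overhead is a single branch. With metrics enabled, every call pays for two clock reads and two atomic read-modify-write operations, which adds up when the instrumented code runs at fine granularity. This sketch only shows the shape of the cost and does not establish how much of the gap measured above comes from each part of the real collector.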


            People

              • Assignee: ZhangXin (rex_xinzh)
              • Reporter: Xinyu Zeng (xzeng)
              • Votes: 0
              • Watchers: 2
