ORC-574

Performance: Use const references for string statistics min and max to avoid copy construction


    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.2
    • Fix Version/s: 1.5.9, 1.6.3, 1.7.0
    • Component/s: C++
    • Labels:
      None

      Description

      Callgrind profiling of a copy scenario (a full read followed by a full write) of a 1.9-million-row, ZLIB-compressed ORC table shows that the fourth-largest CPU consumer is std::string allocation. The allocations come from orc::StringColumnStatisticsImpl::update, where the getMaximum/getMinimum calls each cause a std::string to be allocated, copied, and deleted.
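      To illustrate why a by-value getter is costly in a hot path, here is a minimal, self-contained C++17 sketch. `CountedString` and `ByValueStats` are hypothetical stand-ins, not ORC classes, and the real `update` may compare strings differently; the wrapper simply counts copy constructions so the per-call copy becomes visible. Because the getters return by value, every `update` copies both stored strings, whether or not either bound changes.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical wrapper that counts how often its string payload is copied.
struct CountedString {
  static inline std::size_t copies = 0;  // C++17 inline static member
  std::string value;

  explicit CountedString(std::string v) : value(std::move(v)) {}
  CountedString(const CountedString& other) : value(other.value) { ++copies; }
  CountedString& operator=(const CountedString& other) {
    value = other.value;
    ++copies;
    return *this;
  }
};

// Hypothetical, simplified statistics object (not ORC's StringColumnStatisticsImpl).
class ByValueStats {
 public:
  explicit ByValueStats(const std::string& first) : min_(first), max_(first) {}

  // Return by value: each call copy-constructs a temporary from the member.
  CountedString getMinimum() const { return min_; }
  CountedString getMaximum() const { return max_; }

  void update(const std::string& v) {
    if (v < getMinimum().value) min_.value = v;  // one temporary copy per call
    if (v > getMaximum().value) max_.value = v;  // and another one here
  }

 private:
  CountedString min_;
  CountedString max_;
};
```

Constructing `ByValueStats s("m")` and calling `s.update("a")`, `s.update("z")`, `s.update("k")` leaves `CountedString::copies` at 6: two copies per call, even for the last call, which changes nothing.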

       

      Changing getMaximum/getMinimum to return const references (const std::string&) instead of values prevents these allocations, copies, and deletions.
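      A minimal sketch of the proposed change, again using a hypothetical simplified class (`StringStats`) rather than ORC's actual `StringColumnStatisticsImpl`: the getters return `const std::string&`, so the comparisons inside `update` read the stored strings directly and no temporary string is constructed.

```cpp
#include <string>

// Hypothetical simplified statistics class; the getters return const
// references, as this issue proposes (not ORC's actual code).
class StringStats {
 public:
  const std::string& getMinimum() const { return min_; }
  const std::string& getMaximum() const { return max_; }
  bool hasValues() const { return has_; }

  void update(const std::string& v) {
    if (!has_) {
      min_ = max_ = v;  // first value initializes both bounds
      has_ = true;
      return;
    }
    // The references bind to the stored members: no allocation, copy, or
    // delete happens just to perform the comparison.
    if (v < getMinimum()) min_ = v;
    if (v > getMaximum()) max_ = v;
  }

 private:
  std::string min_;
  std::string max_;
  bool has_ = false;
};
```

A string is still copied when a new minimum or maximum is actually recorded, which is unavoidable; what disappears is the copy on every call. The trade-off of returning a reference is lifetime: it refers to the object's member, so it reflects later updates and must not be used after the statistics object is destroyed.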

       

      With the current 1.6.x master, the profile of this scenario is:

      • Instructions executed: 16.6 billion
      • Real (wall-clock) time: 3.91 s

       

      With the fix to return const references, instructions executed fall by about 28% (equivalently, the unpatched build executes about 38% more instructions) and real clock time falls by about 10%:

      • Instructions executed: 12.0 billion
      • Real (wall-clock) time: 3.53 s
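      The percentage figures can be checked directly from the two profiles:

```latex
\frac{16.6}{12.0} \approx 1.38, \qquad
1 - \frac{12.0}{16.6} \approx 0.28, \qquad
1 - \frac{3.53}{3.91} \approx 0.10
```

The first ratio is the 38% figure (old instruction count relative to the new one); measured against the original count, the fix removes about 28% of the instructions executed, and wall-clock time drops by about 10%.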

       

      The attached JPG shows Callgrind screenshots from before (left) and after (right) the change.

       

       

        Attachments

        1. callgrind-before-after.JPG
          67 kB
          David Zanter


              People

              • Assignee: dzanter David Zanter
              • Reporter: dzanter David Zanter
              • Votes: 0
              • Watchers: 3

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Time Tracking

                  • Estimated: Not Specified
                  • Remaining: 0h
                  • Logged: 10m