Parquet / PARQUET-2202

Redundant String allocation on the hot path in CapacityByteArrayOutputStream.setByte


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.12.3
    • Fix Version/s: 1.13.0
    • Component/s: parquet-mr

    Description

      Profiling a Spark application in production revealed a performance issue:

      CapacityByteArrayOutputStream.setByte consumed 2.2% of total CPU time and accounted for 4.6% of all allocations. In the normal case, however, this method should not allocate anything at all.

      Here are excerpts from the async-profiler report.

      CPU profile: see attachment profile-cpu.png

      Allocation profile: see attachment profile-alloc.png

      The cause is a checkArgument() call whose message argument is a dynamically concatenated String, built unconditionally on every call, even when the check passes:

      https://github.com/apache/parquet-mr/blob/62b774cd0f0c60cfbe540bbfa60bee15929af5d4/parquet-common/src/main/java/org/apache/parquet/bytes/CapacityByteArrayOutputStream.java#L303
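      Java evaluates every method argument before the call is made, so the concatenation runs even when the condition is true and the message is never used. A minimal sketch illustrating this, assuming a hypothetical checkArgument(boolean, String) helper and a hypothetical describe() method that counts how often the message is built (neither is the actual parquet-mr code):

```java
// Hypothetical, simplified illustration; not the parquet-mr source.
public class EagerMessageDemo {
    // How many times the error message has been constructed (hypothetical instrumentation).
    static int messagesBuilt = 0;

    // Hypothetical helper shaped like a typical checkArgument(boolean, String).
    static void checkArgument(boolean isValid, String message) {
        if (!isValid) {
            throw new IllegalArgumentException(message);
        }
    }

    // Builds the same message text as the original code and counts the construction.
    static String describe(long index, long bytesUsed) {
        messagesBuilt++;
        return "Index: " + index + " is >= the current size of: " + bytesUsed;
    }

    // Eager pattern: describe(...) is evaluated before checkArgument runs,
    // so a new String is built even though the index is valid.
    static void setByteEager(long index, long bytesUsed) {
        checkArgument(index < bytesUsed, describe(index, bytesUsed));
    }

    public static void main(String[] args) {
        for (long i = 0; i < 1000; i++) {
            setByteEager(i, 1000);
        }
        // No check failed, yet one message was built per call.
        System.out.println("messages built: " + messagesBuilt);
    }
}
```

      Running main prints "messages built: 1000": every successful call paid for a String that was immediately discarded, which matches the allocation pattern seen in the profile.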

      The suggested fix is to build the message String only on the failure path, inside the condition:

      if (index >= bytesUsed) {
        throw new IllegalArgumentException("Index: " + index +
            " is >= the current size of: " + bytesUsed);
      }
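      The fix can be seen in context in a self-contained class. This is a hypothetical, heavily simplified stand-in for CapacityByteArrayOutputStream (SimpleByteBuffer, write, getByte and size are illustrative names): it keeps everything in one growable byte[] instead of the real class's list of slabs, and only the shape of the bounds check mirrors the suggested fix.

```java
import java.util.Arrays;

// Hypothetical simplified buffer; not the parquet-mr implementation.
public class SimpleByteBuffer {
    private byte[] buf = new byte[16];
    private long bytesUsed = 0;

    // Appends one byte, doubling the backing array when it is full.
    public void write(int b) {
        if (bytesUsed == buf.length) {
            buf = Arrays.copyOf(buf, buf.length * 2);
        }
        buf[(int) bytesUsed++] = (byte) b;
    }

    // Overwrites a previously written byte. The message is concatenated only
    // when the check fails, so a valid call builds no String.
    // Like the original check, only the upper bound is validated.
    public void setByte(long index, byte value) {
        if (index >= bytesUsed) {
            throw new IllegalArgumentException("Index: " + index +
                " is >= the current size of: " + bytesUsed);
        }
        buf[(int) index] = value;
    }

    // Reads back a previously written byte, using the same lazy check.
    public byte getByte(long index) {
        if (index >= bytesUsed) {
            throw new IllegalArgumentException("Index: " + index +
                " is >= the current size of: " + bytesUsed);
        }
        return buf[(int) index];
    }

    public long size() {
        return bytesUsed;
    }

    public static void main(String[] args) {
        SimpleByteBuffer out = new SimpleByteBuffer();
        for (int i = 0; i < 20; i++) {
            out.write(i);
        }
        out.setByte(5, (byte) 42);
        System.out.println(out.getByte(5)); // 42
        try {
            out.setByte(20, (byte) 1);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Index: 20 is >= the current size of: 20
        }
    }
}
```

      A note on the design choice: a template-style helper such as a hypothetical checkArgument(boolean, String, Object...) avoids the concatenation, but each varargs call still allocates an Object[] and may box the long arguments (unless the JIT removes these through escape analysis). The explicit if in the suggested fix is the straightforward way to keep the success path allocation-free.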

      Attachments

        1. profile-cpu.png
          147 kB
          Andrei Pangin
        2. profile-alloc.png
          152 kB
          Andrei Pangin


          People

            Assignee: Unassigned
            Reporter: Andrei Pangin (apangin)
            Votes: 0
            Watchers: 2
