  1. Apache Avro
  2. AVRO-3156

Performance degradation in SpecificRecordBuilder introduced in 1.9.0


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.9.0, 1.10.0
    • Fix Version/s: None
    • Component/s: java
    • Environment: Using SpecificData in environments with multiple classloaders.

    Description

      The change introduced in Avro 1.9.0, which replaced:

       SpecificData.get()

      into:

      SpecificData.getForSchema(schema)

      introduced a significant performance degradation in environments where the class of the schema is provided by a different classloader than the one that loaded SpecificData.

      A possible solution is to reuse the classCache of the default SpecificData, so that the sometimes expensive classloader lookup is cached. (PR coming up)
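
      To illustrate the idea, here is a minimal sketch of memoizing class lookups by name. It does not use Avro and is not the actual patch: CachedClassResolver, resolve and loaderCalls are hypothetical names chosen for this example. The only point it makes is that the classloader is consulted once per class name, and every later lookup (hit or miss) is served from a map.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper (not part of Avro): caches class lookups per class name. */
final class CachedClassResolver {
    /** Sentinel stored for names that cannot be resolved, so misses are cached too. */
    private static final class NoClass {}

    private final ClassLoader loader;
    private final ConcurrentMap<String, Class<?>> cache = new ConcurrentHashMap<>();
    private final AtomicLong loaderCalls = new AtomicLong();

    CachedClassResolver(ClassLoader loader) {
        this.loader = loader;
    }

    /** Returns the class for the given name, or null if the loader cannot find it. */
    Class<?> resolve(String className) {
        Class<?> c = cache.computeIfAbsent(className, this::lookup);
        return c == NoClass.class ? null : c;
    }

    /** The potentially expensive path: only reached on the first request per name. */
    private Class<?> lookup(String className) {
        loaderCalls.incrementAndGet();
        try {
            // initialize=false: resolve the class without running static initializers
            return Class.forName(className, false, loader);
        } catch (ClassNotFoundException e) {
            return NoClass.class;
        }
    }

    /** Number of times the classloader was actually consulted. */
    long loaderCalls() {
        return loaderCalls.get();
    }

    public static void main(String[] args) {
        CachedClassResolver resolver =
            new CachedClassResolver(CachedClassResolver.class.getClassLoader());
        for (int i = 0; i < 1_000_000; i++) {
            resolver.resolve("java.lang.String");
        }
        // One million lookups, one trip to the classloader.
        System.out.println("loader calls: " + resolver.loaderCalls()); // prints "loader calls: 1"
    }
}
```

      Caching failed lookups under a sentinel matters as much as caching hits: a name the classloader cannot resolve would otherwise walk the full delegation chain on every call.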

      We noticed this after trying out a Spark upgrade from Spark 3.1.0 (Avro 1.8.2) to 3.2.0 (Avro 1.10.2), where 74% of the time was spent resolving the same class millions of times.

      With this patch, the time spent resolving classes dropped from 74% to 0.70%.

      JMC flamegraph showing this issue:

       

      Attachments

        1. image-2021-06-11-14-27-16-689.png
          244 kB
          Steven Aerts

            People

              Assignee: Unassigned
              Reporter: Steven Aerts
              Votes: 0
              Watchers: 2

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m