AVRO-1241: Improve Trevni performance on string deserialization

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.7.3
    • Fix Version/s: 1.7.4
    • Component/s: java
    • Labels:
      None
    • Release Note:
      Optimized string de-serialization

      Description

      I have been trying to implement a storage function for Apache Pig that writes data in Trevni format. I found that the storage function was very slow when reading whole records.

      Profiling with YourKit showed that most of the CPU time was spent in org.apache.trevni.InputBuffer.readString(), specifically in the String constructor it calls. I switched the deserialization to java.nio.charset.CharsetDecoder.decode and saw a big improvement. The changes are included in the attached patch.
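
      For readers without the patch at hand, the two decoding approaches can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the class name StringDecodeSketch and both method names are hypothetical, and the buffer layout (a byte array plus an offset and length) is an assumption about how a reader like InputBuffer might hold its data. It uses StandardCharsets and UncheckedIOException for brevity, so it needs Java 8 or later.

```java
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of Trevni or Avro.
final class StringDecodeSketch {
    // One decoder instance reused across calls. CharsetDecoder is not
    // thread-safe, so keep one per reader (or per thread).
    private final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();

    // Baseline: decode a slice of the buffer with the String constructor.
    static String decodeWithConstructor(byte[] buf, int pos, int length) {
        return new String(buf, pos, length, StandardCharsets.UTF_8);
    }

    // Alternative: decode the same slice with the reused CharsetDecoder.
    // decode(ByteBuffer) resets the decoder before it starts, so no state
    // leaks from one call into the next.
    String decodeWithDecoder(byte[] buf, int pos, int length) {
        try {
            return decoder.decode(ByteBuffer.wrap(buf, pos, length)).toString();
        } catch (CharacterCodingException e) {
            // With the default settings, malformed input is reported
            // rather than replaced; rethrown unchecked for this sketch.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "xxh\u00e9llo".getBytes(StandardCharsets.UTF_8);
        StringDecodeSketch sketch = new StringDecodeSketch();
        // Skip the two leading padding bytes and decode the rest both ways.
        String a = decodeWithConstructor(data, 2, data.length - 2);
        String b = sketch.decodeWithDecoder(data, 2, data.length - 2);
        System.out.println(a.equals(b));
    }
}
```

      One behavioural difference to keep in mind: the String constructor silently replaces malformed byte sequences with the replacement character, while a CharsetDecoder with default settings throws CharacterCodingException on them. Whether either approach is faster depends on the JVM version and the data, so it is worth measuring before adopting one.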

        Attachments

        1. AVRO-1241
          2 kB
          Joseph Adler

              People

              • Assignee:
                jadler Joseph Adler
              • Reporter:
                jadler Joseph Adler