Avro / AVRO-1241

Improve Trevni performance on string deserialization

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.7.3
    • Fix Version/s: 1.7.4
    • Component/s: java
    • Labels:
      None
    • Release Note:
      Optimized string de-serialization

      Description

      I have been trying to implement a storage function for Apache Pig that writes data in Trevni format. I found that the storage function was very slow when reading whole records.

      I profiled it with YourKit and found that most of the CPU time was being spent in org.apache.trevni.InputBuffer.readString(), specifically in the String() constructor call. I switched to java.nio.charset.CharsetDecoder.decode for deserialization and saw a big improvement. The changes are included in the attached patch.
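      For readers without the attachment, the general idea can be sketched as follows. This is only an illustration and not the committed patch: the class name Utf8DecodeSketch and both method names are hypothetical, and the real InputBuffer code may manage its buffers differently. The sketch contrasts building a String directly from bytes with decoding through a reused CharsetDecoder.

```java
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;

// Hypothetical illustration class; not part of Trevni.
public class Utf8DecodeSketch {
  private static final Charset UTF8 = Charset.forName("UTF-8");

  // One decoder per instance, reused across calls.
  // CharsetDecoder is not thread-safe, so do not share it between threads.
  private final CharsetDecoder utf8 = UTF8.newDecoder();

  /** Baseline: decode by constructing a String directly from the bytes. */
  public static String decodeWithStringConstructor(byte[] buf, int offset, int length) {
    return new String(buf, offset, length, UTF8);
  }

  /** Alternative: decode the same byte range through the reused CharsetDecoder. */
  public String decodeWithCharsetDecoder(byte[] buf, int offset, int length) {
    try {
      // decode(ByteBuffer) resets the decoder, decodes the whole range, and flushes.
      return utf8.decode(ByteBuffer.wrap(buf, offset, length)).toString();
    } catch (CharacterCodingException e) {
      // Malformed input is reported here rather than silently replaced.
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    byte[] bytes = "h\u00e9llo w\u00f6rld".getBytes(UTF8);
    Utf8DecodeSketch sketch = new Utf8DecodeSketch();
    String a = decodeWithStringConstructor(bytes, 0, bytes.length);
    String b = sketch.decodeWithCharsetDecoder(bytes, 0, bytes.length);
    System.out.println(a.equals(b));
  }
}
```

      One behavioral difference to keep in mind: with its default settings, CharsetDecoder.decode throws on malformed input, whereas the String constructor substitutes the replacement character. Whether the speedup reported here carries over to other JVM versions would need to be measured.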

      Attachments

      1. AVRO-1241 (2 kB), Joseph Adler

          Activity

          Joseph Adler added a comment -

          Optimization in the attached file

          Joseph Adler added a comment -

          I should add that I have only tested this change with Hadoop 1.0.4, and only with Java 6. I'm not sure if this is a problem unique to our environment or a general issue.

          Doug Cutting added a comment -

          This seems like a fine change. I'll commit it soon unless there are objections.

          Joseph Adler added a comment -

          One other note: this change speeds up data loading by 3-4x in our environment.

          Doug Cutting added a comment -

          I committed this.

          (I renamed the variable 'utf8Decoder' to be just 'utf8' so the code still fit in 80 columns.)

          Thanks, Joseph!

          Hudson added a comment -

          Integrated in AvroJava #342 (See https://builds.apache.org/job/AvroJava/342/)
          AVRO-1241. Java: Optimize Trevni string input. Contributed by Joseph Adler. (Revision 1442391)

          Result = SUCCESS
          cutting :
          Files :

          • /avro/trunk/CHANGES.txt
          • /avro/trunk/lang/java/trevni/core/src/main/java/org/apache/trevni/InputBuffer.java

            People

            • Assignee: Joseph Adler
            • Reporter: Joseph Adler
            • Votes: 1
            • Watchers: 6
