Uploaded image for project: 'Avro'
  1. Avro
  2. AVRO-2048

Avro Binary Decoding - Gracefully Handle Long Strings

    Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 1.7.7, 1.8.2
    • Fix Version/s: None
    • Component/s: java
    • Labels:
      None
    • Flags:
      Patch

      Description

      According to the specs:

      a string is encoded as a long followed by that many bytes of UTF-8 encoded character data.

      However, that is currently not being adhered to:

      org.apache.avro.io.BinaryDecoder
        @Override
        public Utf8 readString(Utf8 old) throws IOException {
          int length = readInt();
          Utf8 result = (old != null ? old : new Utf8());
          result.setByteLength(length);
          if (0 != length) {
            doReadBytes(result.getBytes(), 0, length);
          }
          return result;
        }
      

      The first thing the code does here is to load an int value, not a long. Because of the variable length nature of the size, this will mostly work. However, there may be edge-cases where the serializer is putting in large length values erroneously or nefariously. Let us gracefully detect such scenarios and more closely adhere to the spec.

      1. AVRO-2048.1.patch
        3 kB
        BELUGA BEHR
      2. AVRO-2048.2.patch
        3 kB
        BELUGA BEHR

        Activity

        Hide
        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 13s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 buildtest 0m 0s master passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        -1 buildtest 1m 36s java in the patch failed.
        1m 52s



        Subsystem Report/Notes
        Docker Client=1.13.1 Server=1.13.1 Image:yetus/avro:793178a
        JIRA Issue AVRO-2048
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12877398/AVRO-2048.1.patch
        Optional Tests buildtest javac
        uname Linux 8139c24a523e 3.13.0-117-generic #164-Ubuntu SMP Fri Apr 7 11:05:26 UTC 2017 x86_64 GNU/Linux
        Build tool build
        git revision master / 793178a
        Default Java 1.7.0_111
        buildtest https://builds.apache.org/job/PreCommit-AVRO-Build-TEST/32/artifact/patchprocess/test--lang_java.txt
        modules C: lang/java U: lang/java
        Console output https://builds.apache.org/job/PreCommit-AVRO-Build-TEST/32/console
        Powered by Apache Yetus 0.4.0 http://yetus.apache.org

        This message was automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 13s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 buildtest 0m 0s master passed +1 whitespace 0m 0s The patch has no whitespace issues. -1 buildtest 1m 36s java in the patch failed. 1m 52s Subsystem Report/Notes Docker Client=1.13.1 Server=1.13.1 Image:yetus/avro:793178a JIRA Issue AVRO-2048 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12877398/AVRO-2048.1.patch Optional Tests buildtest javac uname Linux 8139c24a523e 3.13.0-117-generic #164-Ubuntu SMP Fri Apr 7 11:05:26 UTC 2017 x86_64 GNU/Linux Build tool build git revision master / 793178a Default Java 1.7.0_111 buildtest https://builds.apache.org/job/PreCommit-AVRO-Build-TEST/32/artifact/patchprocess/test--lang_java.txt modules C: lang/java U: lang/java Console output https://builds.apache.org/job/PreCommit-AVRO-Build-TEST/32/console Powered by Apache Yetus 0.4.0 http://yetus.apache.org This message was automatically generated.
        Hide
        sacharya Suraj Acharya added a comment -

        Seems like the code is not passing checkstyle.
        The artifacts are present here: https://builds.apache.org/job/PreCommit-AVRO-Build-TEST/32/artifact/component/patchprocess/test--lang_java.txt

        The value is 0l. Change it to 0L and it should pass.

        
        [INFO] 
        [INFO] --- maven-checkstyle-plugin:2.17:check (checkstyle-check) @ avro ---
        [INFO] Starting audit...
        /testptch/avro/lang/java/avro/src/main/java/org/apache/avro/io/BinaryDecoder.java:263:19: error: Should use uppercase 'L'.
        /testptch/avro/lang/java/avro/src/main/java/org/apache/avro/io/BinaryDecoder.java:268:10: error: Should use uppercase 'L'.
        Audit done.
        [INFO] There are 2 errors reported by Checkstyle 6.11.2 with checkstyle.xml ruleset.
        [ERROR] src/main/java/org/apache/avro/io/BinaryDecoder.java:[263,19] (misc) UpperEll: Should use uppercase 'L'.
        [ERROR] src/main/java/org/apache/avro/io/BinaryDecoder.java:[268,10] (misc) UpperEll: Should use uppercase 'L'.
        
        
        Show
        sacharya Suraj Acharya added a comment - Seems like the code is not passing checkstyle. The artifacts are present here: https://builds.apache.org/job/PreCommit-AVRO-Build-TEST/32/artifact/component/patchprocess/test--lang_java.txt The value is 0l . Change it to 0L and it should pass. [INFO] [INFO] --- maven-checkstyle-plugin:2.17:check (checkstyle-check) @ avro --- [INFO] Starting audit... /testptch/avro/lang/java/avro/src/main/java/org/apache/avro/io/BinaryDecoder.java:263:19: error: Should use uppercase 'L'. /testptch/avro/lang/java/avro/src/main/java/org/apache/avro/io/BinaryDecoder.java:268:10: error: Should use uppercase 'L'. Audit done. [INFO] There are 2 errors reported by Checkstyle 6.11.2 with checkstyle.xml ruleset. [ERROR] src/main/java/org/apache/avro/io/BinaryDecoder.java:[263,19] (misc) UpperEll: Should use uppercase 'L'. [ERROR] src/main/java/org/apache/avro/io/BinaryDecoder.java:[268,10] (misc) UpperEll: Should use uppercase 'L'.
        Hide
        belugabehr BELUGA BEHR added a comment -

        Fixed checkstyle failures

        Show
        belugabehr BELUGA BEHR added a comment - Fixed checkstyle failures
        Hide
        belugabehr BELUGA BEHR added a comment -

        I can't explain why this is, but it seems to be a tad faster with this patch:

        # Avro Master
        StringRead:   3291 ms      12.152       432.835       1780910
        StringRead:   3290 ms      12.155       432.949       1780910
        StringRead:   3287 ms      12.166       433.320       1780910
        
        # Avro Master + Patch
        StringRead:   3270 ms      12.229       435.574       1780910
        StringRead:   3288 ms      12.163       433.241       1780910
        StringRead:   3271 ms      12.227       435.521       1780910
        
        Show
        belugabehr BELUGA BEHR added a comment - I can't explain why this is, but it seems to be a tad faster with this patch: # Avro Master StringRead: 3291 ms 12.152 432.835 1780910 StringRead: 3290 ms 12.155 432.949 1780910 StringRead: 3287 ms 12.166 433.320 1780910 # Avro Master + Patch StringRead: 3270 ms 12.229 435.574 1780910 StringRead: 3288 ms 12.163 433.241 1780910 StringRead: 3271 ms 12.227 435.521 1780910

          People

          • Assignee:
            belugabehr BELUGA BEHR
            Reporter:
            belugabehr BELUGA BEHR
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:

              Development