Uploaded image for project: 'Avro'
  1. Avro
  2. AVRO-2049

Remove Superfluous Configuration From AvroSerializer

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: 1.7.7, 1.8.2
    • Fix Version/s: 1.9.0
    • Component/s: java
    • Labels:
      None
    • Flags:
      Patch

      Description

      In the class org.apache.avro.hadoop.io.AvroSerializer, we see that the Avro block size is configured with a hard-coded value and there is a request to benchmark different buffer sizes.

      org.apache.avro.hadoop.io.AvroSerializer
        /**
         * The block size for the Avro encoder.
         *
         * This number was copied from the AvroSerialization of org.apache.avro.mapred in Avro 1.5.1.
         *
         * TODO(gwu): Do some benchmarking with different numbers here to see if it is important.
         */
        private static final int AVRO_ENCODER_BLOCK_SIZE_BYTES = 512;
      
        /** An factory for creating Avro datum encoders. */
        private static EncoderFactory mEncoderFactory
            = new EncoderFactory().configureBlockSize(AVRO_ENCODER_BLOCK_SIZE_BYTES);
      

      However, there is no need to benchmark, this setting is superfluous and is ignored with the current implementation.

      org.apache.avro.hadoop.io.AvroSerializer
        @Override
        public void open(OutputStream outputStream) throws IOException {
          mOutputStream = outputStream;
          mAvroEncoder = mEncoderFactory.binaryEncoder(outputStream, mAvroEncoder);
        }
      

      org.apache.avro.io.EncoderFactory.binaryEncoder ignores this setting. This setting is only relevant for calls to org.apache.avro.io.EncoderFactory.blockingBinaryEncoder
      which considers the configured "Block Size" for doing binary encoding of blocked Array types as laid out in the specs. It can simply be removed.

      1. AVRO-2049.3.patch
        2 kB
        BELUGA BEHR
      2. AVRO-2049.2.patch
        3 kB
        BELUGA BEHR
      3. AVRO-2049.1.patch
        1 kB
        BELUGA BEHR

        Activity

        Hide
        belugabehr BELUGA BEHR added a comment - - edited

        Updated patch to include a couple of other instances of the same superfluous execution

        Show
        belugabehr BELUGA BEHR added a comment - - edited Updated patch to include a couple of other instances of the same superfluous execution
        Hide
        hadoopqa Hadoop QA added a comment -
        +1 overall



        Vote Subsystem Runtime Comment
        0 reexec 4m 45s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 buildtest 0m 0s master passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 buildtest 6m 35s the patch passed
        11m 24s



        Subsystem Report/Notes
        Docker Client=1.13.1 Server=1.13.1 Image:yetus/avro:793178a
        JIRA Issue AVRO-2049
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12877504/AVRO-2049.2.patch
        Optional Tests buildtest javac
        uname Linux de48a9725531 3.13.0-117-generic #164-Ubuntu SMP Fri Apr 7 11:05:26 UTC 2017 x86_64 GNU/Linux
        Build tool build
        git revision master / 793178a
        Default Java 1.7.0_111
        modules C: lang/java U: lang/java
        Console output https://builds.apache.org/job/PreCommit-AVRO-Build-TEST/25/console
        Powered by Apache Yetus 0.4.0 http://yetus.apache.org

        This message was automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - +1 overall Vote Subsystem Runtime Comment 0 reexec 4m 45s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 buildtest 0m 0s master passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 buildtest 6m 35s the patch passed 11m 24s Subsystem Report/Notes Docker Client=1.13.1 Server=1.13.1 Image:yetus/avro:793178a JIRA Issue AVRO-2049 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12877504/AVRO-2049.2.patch Optional Tests buildtest javac uname Linux de48a9725531 3.13.0-117-generic #164-Ubuntu SMP Fri Apr 7 11:05:26 UTC 2017 x86_64 GNU/Linux Build tool build git revision master / 793178a Default Java 1.7.0_111 modules C: lang/java U: lang/java Console output https://builds.apache.org/job/PreCommit-AVRO-Build-TEST/25/console Powered by Apache Yetus 0.4.0 http://yetus.apache.org This message was automatically generated.
        Hide
        hadoopqa Hadoop QA added a comment -
        +1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 12s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 buildtest 0m 0s master passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 buildtest 6m 46s the patch passed
        7m 1s



        Subsystem Report/Notes
        Docker Client=1.13.1 Server=1.13.1 Image:yetus/avro:793178a
        JIRA Issue AVRO-2049
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12877504/AVRO-2049.2.patch
        Optional Tests buildtest javac
        uname Linux 89a023614ebe 3.13.0-117-generic #164-Ubuntu SMP Fri Apr 7 11:05:26 UTC 2017 x86_64 GNU/Linux
        Build tool build
        git revision master / 793178a
        Default Java 1.7.0_111
        modules C: lang/java U: lang/java
        Console output https://builds.apache.org/job/PreCommit-AVRO-Build-TEST/26/console
        Powered by Apache Yetus 0.4.0 http://yetus.apache.org

        This message was automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - +1 overall Vote Subsystem Runtime Comment 0 reexec 0m 12s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 buildtest 0m 0s master passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 buildtest 6m 46s the patch passed 7m 1s Subsystem Report/Notes Docker Client=1.13.1 Server=1.13.1 Image:yetus/avro:793178a JIRA Issue AVRO-2049 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12877504/AVRO-2049.2.patch Optional Tests buildtest javac uname Linux 89a023614ebe 3.13.0-117-generic #164-Ubuntu SMP Fri Apr 7 11:05:26 UTC 2017 x86_64 GNU/Linux Build tool build git revision master / 793178a Default Java 1.7.0_111 modules C: lang/java U: lang/java Console output https://builds.apache.org/job/PreCommit-AVRO-Build-TEST/26/console Powered by Apache Yetus 0.4.0 http://yetus.apache.org This message was automatically generated.
        Hide
        nkollar Nandor Kollar added a comment -

        Looks good to me, just one minor comment, in AvroSerialization I think we can reuse the encoder by passing it to the factory method instead of null: this.encoder = new EncoderFactory().binaryEncoder(out, encoder);

        Show
        nkollar Nandor Kollar added a comment - Looks good to me, just one minor comment, in AvroSerialization I think we can reuse the encoder by passing it to the factory method instead of null: this.encoder = new EncoderFactory().binaryEncoder(out, encoder);
        Hide
        gszadovszky Gabor Szadovszky added a comment -

        BELUGA BEHR, as far as I can see ResolvingGramarGenerator sets the bufferSize (not the blockSize) which is used when a new binaryEncoder is created. With your change the bufferSize of the binaryEncoder will be 2048 instead of 32 in ResolvingGrammarGenerator.getBinary.

        Show
        gszadovszky Gabor Szadovszky added a comment - BELUGA BEHR , as far as I can see ResolvingGramarGenerator sets the bufferSize (not the blockSize ) which is used when a new binaryEncoder is created. With your change the bufferSize of the binaryEncoder will be 2048 instead of 32 in ResolvingGrammarGenerator.getBinary .
        Hide
        belugabehr BELUGA BEHR added a comment -

        Gabor Szadovszky Good catch. Thanks. Perhaps we should create a new ticket to remove the magic number.

        Show
        belugabehr BELUGA BEHR added a comment - Gabor Szadovszky Good catch. Thanks. Perhaps we should create a new ticket to remove the magic number.
        Hide
        belugabehr BELUGA BEHR added a comment -

        Nandor Kollar Maybe so, we can reuse encoder. I'm not sure how often "open" is actually called though. New ticket?

        Show
        belugabehr BELUGA BEHR added a comment - Nandor Kollar Maybe so, we can reuse encoder. I'm not sure how often "open" is actually called though. New ticket?
        Hide
        gszadovszky Gabor Szadovszky added a comment -

        +1 will commit tomorrow if no more comments

        Nandor Kollar, I did some research related to the usage of Serializer.open(OutputStream) and it seems they are not really reused. We might set the previous encoder to EncoderFactory.binaryEncoder(OutputStream, BinaryEncoder) to reuse, though. However, it would recommend having another JIRA for that.

        Show
        gszadovszky Gabor Szadovszky added a comment - +1 will commit tomorrow if no more comments Nandor Kollar , I did some research related to the usage of Serializer.open(OutputStream) and it seems they are not really reused. We might set the previous encoder to EncoderFactory.binaryEncoder(OutputStream, BinaryEncoder) to reuse, though. However, it would recommend having another JIRA for that.
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 91d4cc03bbe84ca2aa0ed7f9094a20061b04a8a3 in avro's branch refs/heads/master from BELUGA BEHR
        [ https://git-wip-us.apache.org/repos/asf?p=avro.git;h=91d4cc0 ]

        AVRO-2049: Remove Superfluous Configuration From AvroSerializer

        Show
        jira-bot ASF subversion and git services added a comment - Commit 91d4cc03bbe84ca2aa0ed7f9094a20061b04a8a3 in avro's branch refs/heads/master from BELUGA BEHR [ https://git-wip-us.apache.org/repos/asf?p=avro.git;h=91d4cc0 ] AVRO-2049 : Remove Superfluous Configuration From AvroSerializer

          People

          • Assignee:
            belugabehr BELUGA BEHR
            Reporter:
            belugabehr BELUGA BEHR
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development