Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      Issue:
      Kryo does not serialize Avro objects, so an Avro GenericRecord is not available to downstream operators unless users explicitly set the stream locality to CONTAINER_LOCAL or THREAD_LOCAL.

      Solution:
      This JIRA creates a Module on top of the AvroFileInputOperator and AvroToPojo operators so that downstream operators receive a POJO instead of an Avro GenericRecord. GenericRecord is therefore no longer exposed to downstream operators; only the created POJO is.

      In this Module, the stream between the two encapsulated operators (AvroFileInputOperator and AvroToPojo) is set to CONTAINER_LOCAL.

      Along with this new Module, the existing Avro support files are moved from the contrib module to a new 'avro' module.
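
      As a rough illustration of the record-to-POJO step that the Module hides from downstream operators, here is a minimal sketch. It is not the AvroToPojo implementation: a plain Map stands in for Avro's GenericRecord so the example needs no Avro dependency, and the SimpleOrder class, its fields, and the toPojo helper are all hypothetical names chosen for the example.

```java
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical POJO; the field names are assumptions made for this sketch. */
class SimpleOrder {
    public long orderId;
    public String customerName;
    public double amount;
}

class RecordToPojoSketch {
    /**
     * Copies every entry of the record into the same-named field of a new POJO.
     * The Map is a stand-in for an Avro GenericRecord (assumption).
     */
    static <T> T toPojo(Map<String, Object> record, Class<T> pojoClass) {
        try {
            T pojo = pojoClass.getDeclaredConstructor().newInstance();
            for (Map.Entry<String, Object> entry : record.entrySet()) {
                Field field = pojoClass.getDeclaredField(entry.getKey());
                field.setAccessible(true);
                // Field.set unwraps boxed values for primitive fields.
                field.set(pojo, entry.getValue());
            }
            return pojo;
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Record does not match " + pojoClass.getName(), e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("orderId", 42L);
        record.put("customerName", "alice");
        record.put("amount", 19.5);

        SimpleOrder order = toPojo(record, SimpleOrder.class);
        System.out.println(order.orderId + " " + order.customerName + " " + order.amount);
    }
}
```

      A downstream operator that receives SimpleOrder only ever deals with plain Java fields, which is the point of keeping the GenericRecord inside the Module.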
      ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
      Unit Test
      A unit test for this Avro Module has been added to the malhar-avro package.

      Move to new package and Backward compatibility
      This Module is part of a new package, 'malhar-avro'. The operator files and tests are all moved from the contrib package to the new package.
      The old operator files are marked deprecated and now extend the new operator files for backward compatibility.
      Creating a new Maven module for Avro is in accordance with the JIRA "https://issues.apache.org/jira/browse/APEXMALHAR-1843".
      The Git history of all the moved files is maintained.
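
      The backward-compatibility pattern described above can be sketched as follows. The class names and the describe() method are hypothetical stand-ins, not the actual operator classes; in the real code the two classes live in different packages.

```java
/** Hypothetical stand-in for an operator that now lives in the new package. */
class NewAvroOperator {
    String describe() {
        return "converted";
    }
}

/**
 * Hypothetical stand-in for the class left behind at the old location.
 * It is deprecated and inherits all behavior from the new class.
 */
@Deprecated
class OldAvroOperator extends NewAvroOperator {
    // Intentionally empty: existing applications keep compiling against this name.
}

class BackwardCompatSketch {
    public static void main(String[] args) {
        // Code written against the old class still works and gets the new behavior.
        NewAvroOperator operator = new OldAvroOperator();
        System.out.println(operator.describe());
    }
}
```

      Applications that reference the old class get a deprecation warning at compile time but otherwise behave the same.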

      Application Level Testing

      • To test the Module, I created a sample StreamingApplication and a POJO class. This application adds the new AvroToPojoModule and a ConsoleOperator to the DAG. The ConsoleOperator received and displayed the POJOs from the Module.

      • To test backward compatibility, I created a sample application that adds AvroFileInputOperator and AvroToPojo from the old package, along with a ConsoleOperator, to the DAG. The ConsoleOperator received and displayed the POJOs.

          Activity

          thw Thomas Weise added a comment -

          I would recommend reviewing the overall design and making sure the Avro spec is properly supported (including logical types from 1.8). Also, Avro is a serialization framework that by itself supports multiple wire representations. I'm not sure why GenericRecord would be exposed to a downstream operator. It seems that an abstract class should provide the GenericRecord and leave it to a derived class to provide a result.

          SaumyaMohan Saumya Mohan added a comment -

          Hi Thomas, this JIRA is just the first step toward improving the Avro input functionality. As part of this JIRA we're creating a module that encapsulates the Avro Container File -> GenericRecord -> POJO transformation, which users can use directly. Further enhancements will take place as part of separate JIRAs.

          githubbot ASF GitHub Bot added a comment -

          SaumyaMohan opened a new pull request #663: APEXMALHAR-2034 Create Avro Module to encapsulate Container File to Generic Record to POJO conversion
          URL: https://github.com/apache/apex-malhar/pull/663

          ----------------------------------------------------------------
          This is an automated message from the Apache Git Service.
          To respond to the message, please log on GitHub and use the
          URL above to go to the specific comment.

          For queries about this service, please contact Infrastructure at:
          users@infra.apache.org

          githubbot ASF GitHub Bot added a comment -

          SaumyaMohan closed pull request #663: APEXMALHAR-2034 Create Avro Module to encapsulate Container File to Generic Record to POJO conversion
          URL: https://github.com/apache/apex-malhar/pull/663

          githubbot ASF GitHub Bot added a comment -

          SaumyaMohan opened a new pull request #665: APEXMALHAR-2034 Adding new Avro Module to encapsulate Container File to GenericRecord to POJO transformation
          URL: https://github.com/apache/apex-malhar/pull/665

          githubbot ASF GitHub Bot added a comment -

          SaumyaMohan opened a new pull request #666: APEXMALHAR-2034 Adding new Avro Module to encapsulate Container File to Avro GenericRecord to POJO transformations
          URL: https://github.com/apache/apex-malhar/pull/666

          githubbot ASF GitHub Bot added a comment -

          SaumyaMohan closed pull request #666: APEXMALHAR-2034 Adding new Avro Module to encapsulate Container File to Avro GenericRecord to POJO transformations
          URL: https://github.com/apache/apex-malhar/pull/666

          githubbot ASF GitHub Bot added a comment -

          SaumyaMohan opened a new pull request #670: APEXMALHAR-2034 Adding new Avro Module to encapsulate Container File to Avro GenericRecord to POJO transformations
          URL: https://github.com/apache/apex-malhar/pull/670

            People

            • Assignee:
              SaumyaMohan Saumya Mohan
              Reporter:
              devendra.tagare devendra tagare
            • Votes:
              0
              Watchers:
              4
