  1. Apache Arrow
  2. ARROW-5845

[Java] Implement converter between Arrow record batches and Avro records


Details

    • Type: New Feature
    • Status: Reopened
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Java
    • Labels: None

    Description

      This would be useful for applications that need to convert Avro data to Arrow data.

      This is an adapter that converts data through an existing API (like the JDBC adapter) rather than a native reader (like ORC).

      We implement this through the Avro Java project: the adapter receives Avro parameters such as Decoder/Schema/DatumReader and returns a VectorSchemaRoot. For each data type there is a consumer class, as below, that reads Avro data and writes it into a vector, avoiding boxing/unboxing (e.g. GenericRecord#get returns Object).

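      import java.io.IOException;
      import org.apache.arrow.vector.IntVector;
      import org.apache.arrow.vector.complex.impl.IntWriterImpl;
      import org.apache.arrow.vector.complex.writer.IntWriter;
      import org.apache.avro.io.Decoder;

      // Consumer is the adapter's own per-type interface, defined by this proposal.
      public class AvroIntConsumer implements Consumer {
        private final IntWriter writer;

        public AvroIntConsumer(IntVector vector) {
          this.writer = new IntWriterImpl(vector);
        }

        @Override
        public void consume(Decoder decoder) throws IOException {
          // Read the primitive int straight from the decoder, so no boxed Object is created.
          writer.writeInt(decoder.readInt());
          // Advance the writer to the next row.
          writer.setPosition(writer.getPosition() + 1);
        }
      }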

      We intend to support primitive and complex types (a null value is represented via a union type that includes the null type). A size limit and field selection could be optional for users.
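
      To make the nullable and size-limit ideas concrete, here is a minimal JDK-only sketch, not the adapter's actual code. It hand-decodes values of the Avro union ["null","int"] from Avro's binary encoding (a zigzag varint branch index, then the value) into a primitive column with per-row validity flags, stopping after a caller-supplied row limit. The names NullableIntColumnSketch, IntColumn, readZigZag and readBatch are hypothetical; IntColumn only stands in for an Arrow vector and the hand-written varint reader stands in for Avro's Decoder.

```java
import java.io.ByteArrayInputStream;

/** Sketch only: decodes Avro-encoded ["null","int"] values into a columnar batch. */
final class NullableIntColumnSketch {

  /** Hypothetical stand-in for an Arrow int vector: primitive values plus validity flags. */
  static final class IntColumn {
    final int[] values;
    final boolean[] valid;
    int rowCount;

    IntColumn(int capacity) {
      this.values = new int[capacity];
      this.valid = new boolean[capacity];
    }
  }

  /** Reads one zigzag varint, the Avro binary encoding of int and long. */
  static long readZigZag(ByteArrayInputStream in) {
    long raw = 0;
    int shift = 0;
    while (true) {
      int b = in.read();
      if (b < 0 || shift > 63) {
        throw new IllegalStateException("truncated or malformed varint");
      }
      raw |= (long) (b & 0x7F) << shift;
      if ((b & 0x80) == 0) {
        break;
      }
      shift += 7;
    }
    // Undo zigzag: even raw values are non-negative, odd raw values are negative.
    return (raw >>> 1) ^ -(raw & 1);
  }

  /**
   * Fills at most maxRows rows from the stream.
   * Assumes the union schema ["null","int"], so branch 0 is null and branch 1 is int.
   */
  static IntColumn readBatch(ByteArrayInputStream in, int maxRows) {
    IntColumn column = new IntColumn(maxRows);
    while (column.rowCount < maxRows && in.available() > 0) {
      long branch = readZigZag(in);
      if (branch == 1) {
        // Store the primitive directly; no boxed Integer is created.
        column.values[column.rowCount] = (int) readZigZag(in);
        column.valid[column.rowCount] = true;
      } else if (branch != 0) {
        throw new IllegalStateException("unexpected union branch " + branch);
      }
      // Branch 0 (null) carries no payload bytes; the row simply stays invalid.
      column.rowCount++;
    }
    return column;
  }
}
```

      Calling readBatch repeatedly on the same stream yields successive batches of at most maxRows rows, which is one way a size limit could be exposed. Field selection is not covered by this sketch.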

      Sub-Tasks

        1. [Java] Create Avro adapter module and add dependencies
           Sub-task, Resolved, Ji Liu. Progress: 100%. Original Estimate: Not Specified. Time Spent: 1h 20m
        2. [Java] Initial implement to convert Avro record with primitive types
           Sub-task, Resolved, Ji Liu. Progress: 100%. Original Estimate: Not Specified. Time Spent: 10h 10m
        3. [Java] Avro adapter implement simple Record type
           Sub-task, Resolved, Ji Liu. Progress: 100%. Original Estimate: Not Specified. Time Spent: 3h 50m
        4. [Java] Avro adapter support convert nullable value
           Sub-task, Resolved, Ji Liu. Progress: 100%. Original Estimate: Not Specified. Time Spent: 4h 40m
        5. [Java] Avro adapter implement unions type
           Sub-task, Resolved, Ji Liu. Progress: 100%. Original Estimate: Not Specified. Time Spent: 5h 50m
        6. [Java] Avro adapter avoid potential resource leak.
           Sub-task, Resolved, Ji Liu. Progress: 100%. Original Estimate: Not Specified. Time Spent: 2.5h
        7. [Java] Add API to avro adapter to limit number of rows returned at a time.
           Sub-task, Resolved, Ji Liu. Progress: 100%. Original Estimate: Not Specified. Time Spent: 4.5h
        8. [Java] Avro adapter implement Array/Map/Fixed type
           Sub-task, Resolved, Ji Liu. Progress: 100%. Original Estimate: Not Specified. Time Spent: 12h 20m
        9. [Java] Avro adapter implement Enum type and nested Record type
           Sub-task, Resolved, Ji Liu. Progress: 100%. Original Estimate: Not Specified. Time Spent: 2h
        10. [Java] Add benchmark and large fake data UT for avro adapter
            Sub-task, Resolved, Ji Liu. Progress: 100%. Original Estimate: Not Specified. Time Spent: 1.5h
        11. [Java] Add support for skipping decoding of columns/field in Avro converter
            Sub-task, Resolved, Ji Liu. Progress: 100%. Original Estimate: Not Specified. Time Spent: 8h
        12. [Java] Experiment with performance difference of avoiding the use of Avro Decoder
            Sub-task, Open, Unassigned.
        13. [Java] Support logical type encodings from Avro
            Sub-task, Resolved, Ji Liu. Progress: 100%. Original Estimate: Not Specified. Time Spent: 2h
        14. [Java] Avro - Experiment with "compiled" consumer delegates for performance.
            Sub-task, Open, Unassigned.
        15. [JAVA] Avro adapter benchmark only runs once in JMH
            Sub-task, Resolved, Ji Liu. Progress: 100%. Original Estimate: Not Specified. Time Spent: 1h 20m
        16. [Java] Extract a common base class for avro converter consumers
            Sub-task, Resolved, Ji Liu. Progress: 100%. Original Estimate: Not Specified. Time Spent: 1h 40m
        17. [Java] Avro converter should convert attributes and props to FieldType metadata
            Sub-task, Resolved, Ji Liu. Progress: 100%. Original Estimate: Not Specified. Time Spent: 40m

        Activity

          People

            Assignee: Unassigned
            Reporter: Ji Liu (tianchen92)

            Dates

              Created:
              Updated:

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 62h 20m
