Parquet / PARQUET-2292

Improve default SpecificRecord model selection for Avro{Write,Read}Support

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.14.0, 1.13.1
    • Component/s: parquet-avro
    • Labels: None

    Description

      AvroWriteSupport and AvroReadSupport can choose their default `model` more precisely. Currently both default to `new SpecificDataSupplier().get()` [0]. As a result, SpecificRecord classes that contain logical types fail out of the box unless a DATA_SUPPLIER that registers the logical type conversions is explicitly configured.

      I think we can make logical types work by default by using the value of the `MODEL$` field that every SpecificRecordBase implementation contains, which already holds all the logical type conversions for that Avro type. This would require reflection, but the Avro library already uses reflection to fetch models for Specific types [1].
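
      The reflective lookup proposed above can be sketched in plain Java. This is only an illustration under stated assumptions, not the code that was merged and not Avro's or Parquet's API: `FakeModel`, `RecordWithModel`, `RecordWithoutModel`, `findModel` and `modelOrDefault` are hypothetical stand-ins, and the only detail taken from the issue is that the record class carries a static field named `MODEL$`. The idea is to read that field when it exists and to fall back to the previous default otherwise.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Optional;

class ModelLookupSketch {

    /** Hypothetical stand-in for an Avro data model; NOT the real SpecificData class. */
    static class FakeModel {
        final String description;

        FakeModel(String description) {
            this.description = description;
        }
    }

    /** Hypothetical stand-in for a generated record class that carries a static MODEL$ field. */
    static class RecordWithModel {
        private static final FakeModel MODEL$ = new FakeModel("record-specific model");
    }

    /** Hypothetical stand-in for a class that has no MODEL$ field at all. */
    static class RecordWithoutModel {
    }

    /** Reads the static MODEL$ field of the given class, if one is declared. */
    static Optional<Object> findModel(Class<?> recordClass) {
        try {
            Field field = recordClass.getDeclaredField("MODEL$");
            if (!Modifier.isStatic(field.getModifiers())) {
                return Optional.empty();
            }
            field.setAccessible(true); // the field is private in this sketch
            return Optional.ofNullable(field.get(null)); // null receiver: static field
        } catch (NoSuchFieldException | IllegalAccessException e) {
            return Optional.empty();
        }
    }

    /** Prefers the class's own model and falls back to the supplied default. */
    static Object modelOrDefault(Class<?> recordClass, Object fallback) {
        return findModel(recordClass).orElse(fallback);
    }

    public static void main(String[] args) {
        FakeModel fallback = new FakeModel("plain default model");
        FakeModel chosen = (FakeModel) modelOrDefault(RecordWithModel.class, fallback);
        FakeModel other = (FakeModel) modelOrDefault(RecordWithoutModel.class, fallback);
        System.out.println(chosen.description);
        System.out.println(other.description);
    }
}
```

      Keeping the fallback path means classes without a `MODEL$` field would behave as they do today; whether the real change handles that case the same way is an assumption of this sketch.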

       

      [0] https://github.com/apache/parquet-mr/blob/d38044f5395494e1543581a4b763f624305d3022/parquet-avro/src/main/java/org/apache/parquet/avro/AvroWriteSupport.java#L403-L407

      [1] https://github.com/apache/avro/blob/release-1.11.1/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java#L76-L86

          People

            Assignee: Claire McGinty
            Reporter: Claire McGinty