  1. Apache Hudi
  2. HUDI-774

Spark to Avro converter incorrectly generates optional fields


Details

    Description

      I think https://issues.apache.org/jira/browse/SPARK-28008 is a good description of what is happening.

       

      It can lead to a situation where the schema in the MOR (merge-on-read) log files is incompatible with the schema produced by RowBasedSchemaProvider, so compactions stall.

       

      I have a fix which is a bit hacky: post-process the schema produced by the converter so that

      1) unions that contain a null type have that null type at position 0

      2) the fields with such unions have their default value set to null

      I couldn't find a way to do a clean fix, as some of the problematic classes come from Hive and are called from Spark.
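
      A minimal sketch of the post-processing described above, assuming the Avro schema is handled in its JSON form (nested dicts and lists). This is not the actual patch: the function names `normalize` and `normalize_field` are hypothetical, and a null branch written as `{"type": "null"}` is not handled. It moves "null" to position 0 in every union and sets a null default on record fields whose type is such a union.

```python
def normalize(node):
    """Return a copy of an Avro schema (JSON form) with "null" first in unions."""
    if isinstance(node, list):  # a list is an Avro union
        branches = [normalize(b) for b in node]
        if "null" in branches:
            # Step 1: move the null branch to position 0, keep the rest in order.
            branches = ["null"] + [b for b in branches if b != "null"]
        return branches
    if isinstance(node, dict):
        out = dict(node)  # shallow copy so the input is left untouched
        kind = out.get("type")
        if kind == "record":
            out["fields"] = [normalize_field(f) for f in out.get("fields", [])]
        elif kind == "array":
            out["items"] = normalize(out["items"])
        elif kind == "map":
            out["values"] = normalize(out["values"])
        elif isinstance(kind, (dict, list)):
            out["type"] = normalize(kind)
        return out
    return node  # primitive type name or reference to a named type


def normalize_field(field):
    """Normalize one record field and give nullable unions a null default."""
    out = dict(field)
    out["type"] = normalize(field["type"])
    if isinstance(out["type"], list) and "null" in out["type"]:
        # Step 2: an Avro default must match the first union branch, which is now null.
        out["default"] = None
    return out


# Hypothetical schema shaped like the problematic converter output.
schema = {
    "type": "record",
    "name": "Row",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "note", "type": ["string", "null"]},
    ],
}
fixed = normalize(schema)
```

      Both steps go together because Avro requires a union field's default to match the first branch of the union, so a null default is only valid once null sits at position 0.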

          People

            Assignee: Unassigned
            Reporter: Alexander Filipchik (afilipchik)
              Time Tracking

                Original Estimate: Not Specified
                Remaining Estimate: 0h
                Time Spent: 10m