SPARK-28698 (sub-task of SPARK-24768: Have a built-in AVRO data source implementation)

Allow user-specified output schema in function `to_avro`


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: SQL
    • Labels: None

    Description

      The mapping of Spark schema to Avro schema is many-to-many. (See https://spark.apache.org/docs/latest/sql-data-sources-avro.html#supported-types-for-spark-sql---avro-conversion)
      The default mapping might not be exactly what users want. For example, a "string" column is always written as the Avro "string" type by default, but users might want to write the column as the Avro "enum" type.
      Since PR https://github.com/apache/spark/pull/21847, the batch writer supports a user-specified schema.
      The function `to_avro` should support a user-specified output schema as well.
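
      A minimal sketch of what the requested call could look like from PySpark, assuming the two-argument form `to_avro(data, jsonFormatSchema)` in `pyspark.sql.avro.functions` (Spark 3.0 and later) and the external spark-avro package on the classpath. The enum name `Suit`, its symbols, the column name `suit`, and the helper `suits_to_avro` are hypothetical and used only for illustration. The schema is wrapped in a union with "null" on the assumption that the string column is nullable; how nullability is matched against the Avro schema may differ between Spark versions.

```python
import json

# Hypothetical Avro schema: a nullable enum instead of the default "string".
# The union with "null" assumes the source column is nullable.
SUIT_AVRO_SCHEMA = json.dumps([
    "null",
    {
        "type": "enum",
        "name": "Suit",
        "symbols": ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"],
    },
])


def suits_to_avro(df, column="suit"):
    """Serialize a string column as Avro binary using the enum schema above.

    `df` is expected to be a pyspark.sql.DataFrame containing `column`.
    """
    # Imported lazily so the schema can be built and inspected without Spark.
    from pyspark.sql.avro.functions import to_avro

    # The second argument is the user-specified output schema as a JSON string.
    return df.select(to_avro(df[column], SUIT_AVRO_SCHEMA).alias("value"))
```

      With such a schema, values outside the listed symbols are expected to be rejected at serialization time instead of being written as arbitrary strings.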

            People

              Assignee: Gengliang Wang
              Reporter: Gengliang Wang
              Votes: 0
              Watchers: 2
