  Spark / SPARK-24768: Have a built-in AVRO data source implementation / SPARK-28698

Allow user-specified output schema in function `to_avro`


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: SQL
    • Labels: None

    Description

      The mapping between Spark SQL types and Avro types is many-to-many (see https://spark.apache.org/docs/latest/sql-data-sources-avro.html#supported-types-for-spark-sql---avro-conversion).
      The default schema mapping might therefore not be exactly what users want. For example, by default a "string" column is always written as the Avro "string" type, but users might want to write the column out as the Avro "enum" type instead.
      With PR https://github.com/apache/spark/pull/21847, Spark already supports a user-specified schema in the batch writer.
      The function `to_avro` should support a user-specified output schema as well.
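      A minimal sketch of the intended usage, assuming the two-argument overload `to_avro(data, jsonFormatSchema)` in `org.apache.spark.sql.avro.functions` that this ticket adds (available from Spark 3.0.0, with the external spark-avro module on the classpath); the column name, enum schema, and object name below are illustrative only:

      ```scala
      import org.apache.spark.sql.SparkSession
      import org.apache.spark.sql.avro.functions.to_avro
      import org.apache.spark.sql.functions.col

      object ToAvroEnumExample {
        def main(args: Array[String]): Unit = {
          val spark = SparkSession.builder()
            .appName("to_avro with user-specified schema")
            .master("local[*]")
            .getOrCreate()
          import spark.implicits._

          // A single string column that we want to encode as an Avro enum
          // rather than with the default string mapping.
          val df = Seq("SPADES", "HEARTS", "DIAMONDS", "CLUBS").toDF("suit")

          // Avro schema (JSON) describing the desired output type of the column.
          val enumSchema =
            """{
              |  "type": "enum",
              |  "name": "Suit",
              |  "symbols": ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"]
              |}""".stripMargin

          // Two-argument overload proposed here: to_avro(data, jsonFormatSchema).
          val encoded = df.select(to_avro(col("suit"), enumSchema).as("avro_suit"))
          encoded.show(truncate = false)

          spark.stop()
        }
      }
      ```

      Without the second argument, the "suit" column would be encoded with the default "string" mapping; passing an explicit Avro schema overrides that default on a per-column basis.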

      Attachments

        Activity


          People

            Assignee: Gengliang Wang
            Reporter: Gengliang Wang
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved:
