Description
Apache Avro (https://avro.apache.org) is a popular data serialization format. It is widely used in the Spark and Hadoop ecosystem, especially for Kafka-based data pipelines. Using the external package https://github.com/databricks/spark-avro, Spark SQL can read and write Avro data. Making spark-avro built-in can provide a better experience for first-time users of Spark SQL and Structured Streaming. We expect that a built-in Avro data source will further improve the adoption of Structured Streaming. The proposal is to inline the code from the spark-avro package (https://github.com/databricks/spark-avro). The target release is Spark 2.4.
Attachments
Issue Links
- contains
  - SPARK-24741 Have a built-in AVRO data source implementation (Resolved)
- is duplicated by
  - SPARK-26062 Rename spark-avro external module to spark-sql-avro (to match spark-sql-kafka) (Closed)
- links to