Spark / SPARK-26314

Support Confluent-encoded Avro in Spark Structured Streaming


    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 2.4.0
    • Fix Version/s: None
    • Component/s: Structured Streaming
    • Labels:
      None

      Description

      Now that Avro has been added as a first-class citizen,

      https://spark.apache.org/docs/latest/sql-data-sources-avro.html

      please make Confluent-encoded Avro work out of the box with Spark Structured Streaming.

      As described in the link below, Avro messages on Kafka that were encoded with the Confluent serializer also need to be decoded with the Confluent deserializer. It would be great if this worked out of the box:

      https://developer.ibm.com/answers/questions/321440/ibm-iidr-cdc-db2-to-kafka.html?smartspace=blockchain

      Here are details on the Confluent encoding:

      https://www.sderosiaux.com/articles/2017/03/02/serializing-data-efficiently-with-apache-avro-and-dealing-with-a-schema-registry/#encodingdecoding-the-messages-with-the-schema-id

      It's been a year since I worked on anything involving Avro and Spark Structured Streaming, but I had to take an approach like this to get it working. This is what I used as a reference at the time:

      https://github.com/tubular/confluent-spark-avro

      Also, here is another project I found that someone has built in the meantime:

      https://github.com/AbsaOSS/ABRiS

            People

            • Assignee: Unassigned
            • Reporter: David Ahern (davidahern)
            • Votes: 0
            • Watchers: 5

              Dates

              • Created:
              • Updated:
              • Resolved: