  1. Spark
  2. SPARK-32834

from_avro is giving empty result


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0.0
    • Fix Version/s: None
    • Component/s: PySpark
    • Labels: None
    • Environment: Ubuntu 18

      Spark 3.0

      Kafka 2.0.0

    Description

      I am trying to read a Kafka topic with Spark readStream, but applying the Avro schema with from_avro yields an empty result.

       

      Code:

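      import time

      from pyspark.sql.avro.functions import from_avro

      # Avro schema as a JSON string; assumed to be read from the user.avsc file listed under Input
      jsonFormatSchema = open("user.avsc", "r").read()

      # Read the topic from the earliest offset as a streaming DataFrame
      # (spark is an existing SparkSession)
      df = spark\
        .readStream\
        .format("kafka")\
        .option("kafka.bootstrap.servers", "host:6667")\
        .option("subscribe", "utopic1")\
        .option("failOnDataLoss", "false")\
        .option("startingOffsets", "earliest")\
        .option("checkpointLocation", "/home/abc/wspace/spark_test/data/")\
        .load()

      # Decode the binary Kafka value column with the Avro schema (mode value exactly as reported)
      outputDF = df\
        .select(from_avro("value", jsonFormatSchema, options={"mode":"FASTFAIL"}).alias("user"))

      outputDF.printSchema()

      # Print decoded rows to the console, then give the stream time to process a batch
      query = outputDF.writeStream.format("console").start()
      time.sleep(10)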
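
      Note on the mode option: the from_avro documentation lists FAILFAST and PERMISSIVE as the parse modes, while the code above passes "FASTFAIL". As far as I can tell (this is not confirmed in the report), an unrecognized mode string does not raise and Spark falls back to permissive parsing, where records that fail to decode come back as nulls. A minimal sketch of a guard that catches such typos before the options reach from_avro; checked_avro_options is a hypothetical helper, not part of PySpark:

```python
# Hypothetical helper (not a PySpark API): validate the from_avro parse mode up front.
VALID_AVRO_MODES = {"FAILFAST", "PERMISSIVE"}


def checked_avro_options(mode: str) -> dict:
    """Return an options dict for from_avro, rejecting unknown mode strings."""
    normalized = mode.strip().upper()
    if normalized not in VALID_AVRO_MODES:
        raise ValueError(
            f"unsupported from_avro mode {mode!r}; "
            f"expected one of {sorted(VALID_AVRO_MODES)}"
        )
    return {"mode": normalized}
```

      With this guard, checked_avro_options("FAILFAST") returns {"mode": "FAILFAST"}, and the reported value "FASTFAIL" raises a ValueError instead of being passed through.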
      

      Input:

      avro schema file: user.avsc

      Kafka topic message: {'favorite_color': 'Red', 'name': 'Alyssa'}
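
      Note on the input: the message is shown here as a Python-style dict, whereas from_avro decodes raw Avro binary from the value column. The contents of user.avsc are not included in this report, so the sketch below assumes a record with a string field name followed by a favorite_color field of type ["string", "null"]. It hand-encodes the sample record according to the Avro binary encoding rules (zigzag varint lengths, union branch index) purely to show what the bytes on the topic would need to look like; encode_user and the other names are illustrative:

```python
def zigzag_varint(n: int) -> bytes:
    """Encode a 64-bit signed integer as an Avro long (zigzag + base-128 varint)."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while True:
        low7 = z & 0x7F
        z >>= 7
        if z:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def avro_string(s: str) -> bytes:
    """Avro string: byte length as a long, then the UTF-8 bytes."""
    data = s.encode("utf-8")
    return zigzag_varint(len(data)) + data


def encode_user(name: str, favorite_color) -> bytes:
    """Encode one record under the ASSUMED schema:
    name: string, favorite_color: ["string", "null"]."""
    out = avro_string(name)
    if favorite_color is None:
        out += zigzag_varint(1)  # union branch 1 = null, no payload
    else:
        out += zigzag_varint(0) + avro_string(favorite_color)  # branch 0 = string
    return out


payload = encode_user("Alyssa", "Red")
print(payload)  # b'\x0cAlyssa\x00\x06Red'
```

      If the producer instead wrote the dict as JSON or as a plain string, the bytes would not match this layout and decoding could not recover the fields.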

      Expected Output:

      The decoded field values (name and favorite_color) should be printed.

      Actual Output:

      +----+
      |user|
      +----+
      | [,]|
      +----+
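
      Note on the output: a struct whose fields are all empty is what a failed decode looks like under permissive parsing. One common cause, which this report does not confirm, is that the producer used a Confluent Schema Registry serializer: those messages start with a magic byte 0 and a 4-byte big-endian schema id before the Avro payload, and from_avro does not expect that prefix. A minimal sketch of a heuristic check on a sampled message value; split_confluent_header is a hypothetical helper name:

```python
import struct


def split_confluent_header(value: bytes):
    """Return (schema_id, payload) if value looks Confluent-framed, else (None, value).

    Heuristic only: a raw Avro payload may also legitimately begin with a 0x00 byte.
    """
    if len(value) >= 5 and value[0] == 0:
        schema_id = struct.unpack(">I", value[1:5])[0]
        return schema_id, value[5:]
    return None, value


# Example with a made-up schema id of 42 in front of an Avro-encoded record
framed = b"\x00" + struct.pack(">I", 42) + b"\x0cAlyssa\x00\x06Red"
schema_id, payload = split_confluent_header(framed)
```

      If a sampled value turns out to be framed this way, the five header bytes would have to be dropped from the value column before it is handed to from_avro.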
      

      Additional information:

       

          People

            Assignee: Unassigned
            Reporter: Chaitanya (chaitanya.cheekate)