Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.0.0
None
None
Ubuntu 18
Spark 3.0
Kafka 2.0.0
Description
I am trying to read a Kafka topic with Spark readStream, but I run into a problem when applying an Avro schema with from_avro: the decoded fields come back empty.
Code:
df = spark \
    .readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "host:6667") \
    .option("subscribe", "utopic1") \
    .option("failOnDataLoss", "false") \
    .option("startingOffsets", "earliest") \
    .option("checkpointLocation", "/home/abc/wspace/spark_test/data/") \
    .load()

outputDF = df \
    .select(from_avro("value", jsonFormatSchema, options={"mode":"FASTFAIL"}).alias("user"))

outputDF.printSchema()
query = outputDF.writeStream.format("console").start()
time.sleep(10)
Input:
Avro schema file: user.avsc
Message in the Kafka topic: {'favorite_color': 'Red', 'name': 'Alyssa'}
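The contents of user.avsc are not shown in this report. As a rough sketch of what `jsonFormatSchema` would need to hold, the block below uses a schema modelled on the `user.avsc` example from the Spark sources; that text, the `USER_AVSC_TEXT` name and the `load_schema` helper are assumptions for illustration, not the reporter's actual file or code.

```python
import json

# ASSUMPTION: the reporter's user.avsc is not included in the report.
# This text is modelled on the user.avsc example in the Spark sources.
USER_AVSC_TEXT = """
{
  "namespace": "example.avro",
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "favorite_color", "type": ["string", "null"]}
  ]
}
"""


def load_schema(text: str) -> str:
    """Hypothetical helper: check that the text is valid JSON and return it as a string."""
    return json.dumps(json.loads(text))


# In the reporter's setup this would presumably be open("user.avsc").read().
jsonFormatSchema = load_schema(USER_AVSC_TEXT)

# The field names should match the keys of the sample message.
field_names = [f["name"] for f in json.loads(jsonFormatSchema)["fields"]]
print(field_names)  # ['name', 'favorite_color']
```

from_avro takes the schema as a JSON-format string, so the file contents are passed as text rather than as a parsed object.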
Expected Output:
The console sink should print the decoded field values (name and favorite_color) for each record.
Actual Output:
+----+
|user|
+----+
| [,]|
+----+
Additional information:
- I searched the internet and found that another person faced the same issue: https://stackoverflow.com/questions/59222774/spark-from-avro-function-returning-null-values
- I am able to print the values to the console if I cast the value column to a string using the line below: df.selectExpr("CAST(value AS STRING)")
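The two observations above (empty fields from from_avro, readable values after a string cast) would be consistent with the topic holding text such as JSON rather than Avro binary, although the report does not confirm how the messages were produced. The sketch below hand-encodes the sample record following the Avro binary encoding rules and compares it with the JSON text form. The schema (name: string, favorite_color: ["string", "null"]) and the helper names are assumptions for illustration.

```python
import json


def zigzag_varint(n: int) -> bytes:
    """Encode an integer the way Avro encodes a long: zigzag, then base-128 varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def avro_string(s: str) -> bytes:
    """Avro string: byte length as a long, followed by the UTF-8 bytes."""
    data = s.encode("utf-8")
    return zigzag_varint(len(data)) + data


def encode_user(name: str, favorite_color: str) -> bytes:
    # ASSUMED schema: name: string, favorite_color: ["string", "null"].
    # A union value is written as the branch index (0 = string) and then the value.
    return avro_string(name) + zigzag_varint(0) + avro_string(favorite_color)


record = {"favorite_color": "Red", "name": "Alyssa"}
avro_bytes = encode_user(record["name"], record["favorite_color"])
json_bytes = json.dumps(record).encode("utf-8")

print(avro_bytes)  # b'\x0cAlyssa\x00\x06Red'
print(json_bytes)  # starts with b'{', which is not a valid Avro encoding of this record
```

If the bytes in the topic look like the second form, from_avro cannot decode them against the schema. Separately, the documented values for the `mode` option of from_avro are `FAILFAST` and `PERMISSIVE`, while the code in this report passes `FASTFAIL`.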