Spark / SPARK-31042

Error when writing a PySpark streaming DataFrame created from a Kafka source to a CSV file


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 2.4.5

    Description

      Writing a streaming DataFrame created from a Kafka source to a CSV file gives the following error in PySpark.

      NOTE: The same streaming DataFrame is displayed correctly in the console sink.

      sdf.writeStream.format("console").start().awaitTermination()  # Working

      sdf.writeStream\
      .format("csv")\
      .option("path", "C://output")\
      .option("checkpointLocation", "C://Checkpoint")\
      .outputMode("append")\
      .start().awaitTermination()  # Not working

      Error
      ---------
      File "C:\Spark\python\pyspark\sql\utils.py", line 63, in deco
        return f(*a, **kw)
      File "C:\Spark\python\lib\py4j-0.10.7-src.zip\py4j\protocol.py", line 328, in get_return_value
      py4j.protocol.Py4JJavaError: An error occurred while calling o63.awaitTermination.
      : org.apache.spark.sql.streaming.StreamingQueryException: Expected e.g. {"topicA":{"0":23,"1":-1},"topicB":{"0":-2}}, got {"logOffset":1}

      === Streaming Query ===
      Identifier: [id = 6718625c-489e-44c8-b273-0da3429e97a8, runId = b64887ba-ca32-499e-9ab5-f839fd44ec26]
      Current Committed Offsets: {KafkaV2[Subscribe[test1]]: {"logOffset":1}}
      Current Available Offsets: {KafkaV2[Subscribe[test1]]: {"logOffset":1}}

      Current State: ACTIVE
      Thread State: RUNNABLE
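
      For reference, below is a minimal end-to-end sketch of a Kafka-to-CSV streaming query. The broker address, output path, and checkpoint path are assumptions (only the topic name test1 comes from the report). The {"logOffset":1} offset in the error suggests the checkpoint directory was previously used by a different (non-Kafka) streaming source, so the sketch points at a fresh checkpoint location.

      # Minimal sketch; requires the spark-sql-kafka-0-10 package on the classpath.
      # Broker address and paths below are assumed, not taken from the report.
      from pyspark.sql import SparkSession

      spark = SparkSession.builder.appName("kafka-to-csv").getOrCreate()

      # Read the Kafka topic as a streaming DataFrame and cast key/value to strings.
      sdf = (spark.readStream
             .format("kafka")
             .option("kafka.bootstrap.servers", "localhost:9092")  # assumed broker
             .option("subscribe", "test1")
             .load()
             .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)"))

      # Write to CSV in append mode; the checkpoint location should not have been
      # used by any other streaming query.
      query = (sdf.writeStream
               .format("csv")
               .option("path", "C:/output")
               .option("checkpointLocation", "C:/checkpoint_kafka_csv")
               .outputMode("append")
               .start())

      query.awaitTermination()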


          People

            Assignee: Unassigned
            Reporter: Suchintak Patnaik
            Votes: 0
            Watchers: 1
