SPARK-34163

Spark Structured Streaming - Kafka Avro transformation on optional field fails

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.4.7
    • Fix Version/s: None
    • Component/s: Structured Streaming
    • Labels: None

    Description

      Hello All,
      I have a Spark Structured Streaming job that ingests data from Kafka, where the messages are Avro-encoded.
      Some of the fields in the data are optional, and I have to perform a transformation only when those optional fields are present.
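      For context, the ingestion looks roughly like the sketch below. The servers, topic name, and Avro schema are hypothetical, and the Python from_avro helper (pyspark.sql.avro.functions) only exists from Spark 3.0 on; on 2.4 the equivalent function from the external spark-avro package is Scala/Java only.

      from pyspark.sql import SparkSession
      from pyspark.sql.avro.functions import from_avro

      spark = SparkSession.builder.appName("kafka-avro-ingest").getOrCreate()

      # Avro writer schema for the Kafka message value (hypothetical).
      avro_schema = """
      {
        "type": "record",
        "name": "Person",
        "fields": [
          {"name": "first_name", "type": "string"},
          {"name": "middle_name", "type": ["null", "string"], "default": null}
        ]
      }
      """

      raw = (spark.readStream
             .format("kafka")
             .option("kafka.bootstrap.servers", "localhost:9092")
             .option("subscribe", "person-topic")
             .load())

      # Deserialize the binary Kafka value into a struct column and flatten it.
      parsed = (raw
                .select(from_avro(raw.value, avro_schema).alias("person"))
                .select("person.*"))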
      So I tried to check whether a column exists with the helper below:

      from pyspark.sql.utils import AnalysisException

      def has_column(dataframe, col):
          """
          This function checks the existence of a given column in the given DataFrame.

          :param dataframe: the dataframe
          :type dataframe: DataFrame
          :param col: the column name
          :type col: str
          :return: True if the column exists, else False
          :rtype: bool
          """
          try:
              # Resolving the column raises AnalysisException when it does not exist.
              dataframe[col]
              return True
          except AnalysisException:
              return False

      But this does not seem to work when the DataFrame is a streaming DataFrame. On a normal (batch) DataFrame, when a column is not present, the check above returns False, so I can skip the transformation on the missing column.
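      For example, on a batch DataFrame with made-up data (a minimal sketch):

      from pyspark.sql import SparkSession

      spark = SparkSession.builder.appName("has-column-demo").getOrCreate()
      batch_df = spark.createDataFrame([("a", "b")], ["first_name", "last_name"])

      has_column(batch_df, "first_name")   # True
      has_column(batch_df, "middle_name")  # False -- the AnalysisException is caught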

      On a streaming DataFrame, however, has_column always returns True, so the transformation gets executed and causes an exception. What is the right approach to check for the existence of a column in a streaming DataFrame before performing a transformation?

      Why do streaming and batch DataFrames differ in behavior? How can I skip a transformation on a column if it doesn't exist?
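      For reference, a check that inspects the DataFrame's schema directly, rather than relying on AnalysisException, is sketched below. It only looks at the statically known schema, so it should behave the same for batch and streaming DataFrames; the helper name and the column name in the usage example are hypothetical.

      from pyspark.sql import functions as F
      from pyspark.sql.types import StructType

      def has_field(dataframe, name):
          """Return True if a (possibly nested, dot-separated) field exists in the schema."""
          schema = dataframe.schema
          for part in name.split("."):
              if not isinstance(schema, StructType) or part not in schema.fieldNames():
                  return False
              schema = schema[part].dataType
          return True

      # Apply the transformation only when the optional column is present
      # (reusing the parsed stream from the ingestion sketch above).
      if has_field(parsed, "middle_name"):
          parsed = parsed.withColumn("middle_name", F.upper(F.col("middle_name")))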

          People

            Assignee: Unassigned
            Reporter: Felix Kizhakkel Jose (FelixKJose)
            Votes: 0
            Watchers: 1
