SPARK-34163

Spark Structured Streaming - Kafka Avro transformation on optional field fails

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.4.7
    • Fix Version/s: None
    • Component/s: Structured Streaming
    • Labels: None

    Description

      Hello All,
      I have a Spark Structured Streaming job that ingests data from Kafka, where the messages are Avro-encoded.
      Some of the fields in the data are optional, and I have to perform a transformation only when those optional fields are present.
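      For context, the ingestion looks roughly like the sketch below. The servers, topic name, and Avro schema are hypothetical, and the Python from_avro helper (pyspark.sql.avro.functions) only exists from Spark 3.0 on; on 2.4 the equivalent function from the external spark-avro package is Scala/Java only.

      from pyspark.sql import SparkSession
      from pyspark.sql.avro.functions import from_avro

      spark = SparkSession.builder.appName("kafka-avro-ingest").getOrCreate()

      # Avro writer schema for the Kafka message value (hypothetical).
      avro_schema = """
      {
        "type": "record",
        "name": "Person",
        "fields": [
          {"name": "first_name", "type": "string"},
          {"name": "middle_name", "type": ["null", "string"], "default": null}
        ]
      }
      """

      raw = (spark.readStream
             .format("kafka")
             .option("kafka.bootstrap.servers", "localhost:9092")
             .option("subscribe", "person-topic")
             .load())

      # Deserialize the binary Kafka value into a struct column and flatten it.
      parsed = (raw
                .select(from_avro(raw.value, avro_schema).alias("person"))
                .select("person.*"))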
      So I tried to check whether a column exists with the helper below:

      from pyspark.sql.utils import AnalysisException

      def has_column(dataframe, col):
          """
          This function checks the existence of a given column in the given DataFrame.

          :param dataframe: the dataframe
          :type dataframe: DataFrame
          :param col: the column name
          :type col: str
          :return: True if the column exists, else False
          :rtype: bool
          """
          try:
              # Resolving the column raises AnalysisException when it does not exist.
              dataframe[col]
              return True
          except AnalysisException:
              return False

      But this does not seem to work when the DataFrame is a streaming DataFrame. On a normal (batch) DataFrame, when a column is not present, the check above returns False, so I can skip the transformation on the missing column.
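      For example, on a batch DataFrame with made-up data (a minimal sketch):

      from pyspark.sql import SparkSession

      spark = SparkSession.builder.appName("has-column-demo").getOrCreate()
      batch_df = spark.createDataFrame([("a", "b")], ["first_name", "last_name"])

      has_column(batch_df, "first_name")   # True
      has_column(batch_df, "middle_name")  # False -- the AnalysisException is caught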

      On a streaming DataFrame, however, has_column always returns True, so the transformation gets executed and causes an exception. What is the right approach to check for the existence of a column in a streaming DataFrame before performing a transformation?

      Why do streaming and batch DataFrames differ in behavior? How can I skip a transformation on a column if it doesn't exist?
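      For reference, a check that inspects the DataFrame's schema directly, rather than relying on AnalysisException, is sketched below. It only looks at the statically known schema, so it should behave the same for batch and streaming DataFrames; the helper name and the column name in the usage example are hypothetical.

      from pyspark.sql import functions as F
      from pyspark.sql.types import StructType

      def has_field(dataframe, name):
          """Return True if a (possibly nested, dot-separated) field exists in the schema."""
          schema = dataframe.schema
          for part in name.split("."):
              if not isinstance(schema, StructType) or part not in schema.fieldNames():
                  return False
              schema = schema[part].dataType
          return True

      # Apply the transformation only when the optional column is present
      # (reusing the parsed stream from the ingestion sketch above).
      if has_field(parsed, "middle_name"):
          parsed = parsed.withColumn("middle_name", F.upper(F.col("middle_name")))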

          People

            Assignee: Unassigned
            Reporter: Felix Kizhakkel Jose (FelixKJose)
            Votes: 0
            Watchers: 1
