Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
2.4.7
-
None
-
None
Description
Hello All,
I have a spark structured streaming job to inject data from Kafka where message from Kafka is avro type.
Some of the fields are optional in the data. And I have to perform transformation if those optional fields are present in the data.
So I tried to check whether the column exists by :
def has_column(dataframe, col):
"""
This function checks the existence of a given column in the given DataFrame
:param dataframe: the dataframe
:type dataframe: DataFrame
:param col: the column name
:type col: str
:return: true if the column exists else false
:rtype: boolean
"""
try:
dataframe[col]
return True
except AnalysisException:
return False
But it seems not working when its a streaming dataframe, but when the dataframe is normal dataframe, and when a column is not present the above check returns false, therefore I can ignore the transformation on the missing column.
But on Streaming dataframe has_column always returns true and therefore the transformation get executed and cause exception. What is the right approach to check existence of column in a streaming dataframe before performing transformation?
Why streaming dataframe and normal dataframe differ in behavior? How to skip transformation on a column if it doesn't exists?