Details
- Type: Bug
- Status: Resolved
- Priority: Minor
- Resolution: Duplicate
- Affects Version/s: 3.2.1
- Fix Version/s: None
- Component/s: None
Description
When creating a DataFrame with createDataFrame from data that contains a float inside a struct, the float is set to null. This only happens when the data is a list of dictionaries; with a list of Rows it works fine:

from pyspark.sql import Row

data = [{"MyStruct": {"MyInt": 10, "MyFloat": 10.1}, "MyFloat": 10.1}]
spark.createDataFrame(data).show()
# +-------+------------------------------+
# |MyFloat|MyStruct                      |
# +-------+------------------------------+
# |10.1   |{MyInt -> 10, MyFloat -> null}|
# +-------+------------------------------+

data = [Row(MyStruct=Row(MyInt=10, MyFloat=10.1), MyFloat=10.1)]
spark.createDataFrame(data).show()
# +-------+------------------------------+
# |MyFloat|MyStruct                      |
# +-------+------------------------------+
# |10.1   |{MyInt -> 10, MyFloat -> 10.1}|
# +-------+------------------------------+
Note that MyFloat inside MyStruct is null in the first example. Interestingly, when I do the same with Row, or if I specify the schema explicitly, this does not happen (second example).
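For reference, a minimal sketch of the explicit-schema workaround mentioned above. The field types (LongType/DoubleType) are assumptions chosen to match what inference produces for the Row example, and "spark" refers to an existing SparkSession as in the snippets above; none of this is taken verbatim from the report.

from pyspark.sql.types import StructType, StructField, LongType, DoubleType

# Hypothetical explicit schema; the types are assumed, not stated in the report.
schema = StructType([
    StructField("MyStruct", StructType([
        StructField("MyInt", LongType()),
        StructField("MyFloat", DoubleType()),
    ])),
    StructField("MyFloat", DoubleType()),
])

data = [{"MyStruct": {"MyInt": 10, "MyFloat": 10.1}, "MyFloat": 10.1}]
spark.createDataFrame(data, schema).show(truncate=False)
# With the schema supplied, MyStruct is built as a struct, and MyFloat inside it
# should keep the value 10.1 instead of becoming null.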
Issue Links
- duplicates SPARK-35929 "Schema inference of nested structs defaults to map" (Resolved)
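The duplicate explains the behavior: without a schema, a nested dict is inferred as a MapType, and a map's values must all share one type (apparently taken from the first entry, here the integer 10), so 10.1 cannot be represented and ends up null. A hedged sketch follows, assuming a Spark version where the config added by SPARK-35929 is available; the config name is taken from that ticket and may not apply to 3.2.1.

# Assumption: spark.sql.pyspark.inferNestedDictAsStruct.enabled exists (added by SPARK-35929).
spark.conf.set("spark.sql.pyspark.inferNestedDictAsStruct.enabled", "true")

data = [{"MyStruct": {"MyInt": 10, "MyFloat": 10.1}, "MyFloat": 10.1}]
df = spark.createDataFrame(data)
df.printSchema()
# Expected: MyStruct is inferred as struct<MyInt: bigint, MyFloat: double> rather than
# map<string, bigint>, so MyStruct.MyFloat keeps 10.1 instead of becoming null.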