Details
Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Incomplete
Affects Version/s: 1.6.1
Fix Version/s: None
Description
Here's a brief reproduction:
    >>> numbers = sqlContext.createDataFrame(
    ...     data=[(1,), (2,), (3,), (4,), (5,)],
    ...     samplingRatio=1  # go through all the data please!
    ... )
    >>> numbers.printSchema()
    root
     |-- _1: long (nullable = true)
The field is marked as nullable even though none of the data is null and we had createDataFrame() go through all the data.
In situations like this, shouldn't createDataFrame() return a DataFrame with the field marked as not nullable?
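The rule being proposed can be sketched in plain Python. This is a minimal sketch and not PySpark's actual schema-inference code. The helper name `infer_nullability` is hypothetical, and the sketch assumes that `sampling_ratio == 1` means every row is inspected. Under a full scan, a column can be reported as non-nullable exactly when no row holds `None` in it. With partial sampling nothing can be proven, so every column stays nullable.

```python
from typing import Any, List, Sequence, Tuple


def infer_nullability(rows: Sequence[Tuple[Any, ...]],
                      sampling_ratio: float = 1.0) -> List[bool]:
    """Return, per column, whether it should be marked nullable.

    Hypothetical helper illustrating the proposal, not Spark's API.
    """
    if not rows:
        # No data to inspect, so there are no columns to report.
        return []
    width = len(rows[0])
    if sampling_ratio < 1.0:
        # Unsampled rows might contain nulls, so stay conservative.
        return [True] * width
    # Full scan: a column is nullable only if some row actually holds None.
    return [any(row[i] is None for row in rows) for i in range(width)]


print(infer_nullability([(1,), (2,), (3,), (4,), (5,)]))  # [False]
print(infer_nullability([(1,), (None,)]))                 # [True]
```

The conservative branch for partial sampling matters. Marking a column non-nullable is a promise about all of the data, so it is only safe when all of the data was seen.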