When converting an RDD with a `float` field to a Spark DataFrame whose schema declares that field as `IntegerType` / `LongType`, Spark 1.6.2 and 1.6.3 silently convert the field values to nulls instead of throwing an error like `LongType can not accept object ___ in type <type 'float'>`. This appears to be fixed in Spark 2.0.2, which does raise the error.
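Until upgrading is an option, one way to get the strict behaviour on 1.6.x is to validate the rows yourself before handing them to `sqlContext.createDataFrame`. The sketch below is a hypothetical helper, `check_integral_columns`, not part of Spark; it assumes rows are plain Python sequences and that you know the indices of the columns declared as `IntegerType` / `LongType`. It only mimics the spirit of the 2.x check (its error message is not Spark's exact wording).

```python
import numbers


def check_integral_columns(rows, integral_cols):
    """Hypothetical guard: raise TypeError if an integral column holds a non-integer.

    `rows` is a list of sequences; `integral_cols` is a set of column indices
    that the schema declares as IntegerType/LongType. None is allowed because
    it becomes a SQL null.
    """
    for row_num, row in enumerate(rows):
        for col in integral_cols:
            value = row[col]
            if value is None:
                continue
            # numbers.Integral covers int (and long on Python 2);
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, numbers.Integral):
                raise TypeError(
                    "row %d, column %d: LongType cannot accept %r of type %s"
                    % (row_num, col, value, type(value).__name__))
    return rows
```

On an RDD this could run per partition, e.g. `rdd.mapPartitions(lambda it: check_integral_columns(list(it), {0}))`, so a stray `float` fails the job instead of turning into nulls.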
The following example should make the problem clear:
Instead of throwing an error like `LongType can not accept object ___ in type <type 'float'>`, Spark converts all the values in the first column to nulls.
Running `spark_df.show()` gives:
For my computation, I run `mapPartitions` on a Spark DataFrame. For each partition, I convert it into a pandas DataFrame, do a few computations on that pandas DataFrame, and return a list of lists; `mapPartitions` turns the results from all partitions into an RDD. I then convert this RDD into a Spark DataFrame, similar to the example above, using `sqlContext.createDataFrame(rdd, schema)`. One column of the RDD should become a `LongType` column in the Spark DataFrame, but because it has missing values, pandas stores it as `float`. When Spark creates the DataFrame, it converts all the values in that column to nulls instead of throwing a type-mismatch error.
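A possible workaround for this pipeline is to convert the float column back to Python integers inside the partition function, before the rows leave `mapPartitions`. The helper below, `coerce_long_column`, is a hypothetical sketch, not a Spark or pandas API; it assumes missing values arrive as `NaN` floats (which is how pandas represents gaps in a numeric column) and that the target column index is known.

```python
import math


def coerce_long_column(rows, col):
    """Hypothetical helper: return copies of `rows` with column `col` as ints.

    NaN or None becomes None (a SQL null), a whole-number float such as 3.0
    becomes 3, and a fractional float raises ValueError instead of being lost.
    """
    fixed = []
    for row in rows:
        row = list(row)  # copy so the caller's rows are not mutated
        value = row[col]
        if value is None or (isinstance(value, float) and math.isnan(value)):
            row[col] = None
        elif isinstance(value, float):
            # is_integer() is False for fractions and for +/-inf
            if not value.is_integer():
                raise ValueError("column %d: %r is not a whole number" % (col, value))
            row[col] = int(value)
        fixed.append(row)
    return fixed
```

Inside the partition function this would be the last step, e.g. `return coerce_long_column(pdf.values.tolist(), 0)` where `pdf` is the per-partition pandas DataFrame; the `LongType` column then holds `int`/`None` values that `createDataFrame` accepts, and a genuinely fractional value fails loudly.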