Notice that `should_be_int` has a `string` datatype. According to the documentation:
> Spark SQL can convert an RDD of Row objects to a DataFrame, inferring the datatypes. Rows are constructed by passing a list of key/value pairs as kwargs to the Row class. The keys of this list define the column names of the table, and the types are inferred by sampling the whole dataset, similar to the inference that is performed on JSON files.
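A minimal sketch of what that sampling does (the values `"alice"`, `"1"`, etc. are illustrative): if the kwargs values are Python `str`, as they would be after splitting a line of text, inference yields `string` for every column, including `should_be_int`:

```python
from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()

# The kwargs values are Python str here, so sampling infers `string`
# for every column, regardless of what the text "looks like".
rows = [Row(name="alice", should_be_int="1"),
        Row(name="bob", should_be_int="2")]
df = spark.createDataFrame(rows)
df.printSchema()
# root
#  |-- name: string (nullable = true)
#  |-- should_be_int: string (nullable = true)

# An explicit cast recovers the intended type.
df.withColumn("should_be_int", df["should_be_int"].cast("int")).printSchema()
# |-- should_be_int: integer (nullable = true)
```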
Schema inference works as expected when reading delimited files such as CSV, but not when using the `toDF()` / `createDataFrame()` API calls, which infer types from the Python runtime values rather than from the textual content. The sketch below contrasts the two paths.
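A hedged sketch of the file-reading path for contrast (the path and file contents are assumptions for the demo): with `inferSchema` enabled, the CSV reader samples the parsed text and promotes `"1"`, `"2"` to an integer type:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()

# Hypothetical input file; path and contents are made up for the demo.
path = "/tmp/inference_demo.csv"
with open(path, "w") as f:
    f.write("name,should_be_int\nalice,1\nbob,2\n")

# The reader infers types from the parsed text, so the numeric column
# comes out as an integer rather than a string.
spark.read.csv(path, header=True, inferSchema=True).printSchema()
# root
#  |-- name: string (nullable = true)
#  |-- should_be_int: integer (nullable = true)
```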