I am writing unit tests for some functionality in my application that reads data from CSV files using Spark.
I am reading the data using:
When I read this single file:
I get this schema:
When I duplicate this file, I get the same schema.
The strange part is that when I add a new int column, Spark seems to get confused and treats the columns it had already identified as int as string:
When I read only the second file, it looks fine:
In conclusion, it looks like there is a bug when the two features, header recognition and merge schema, are combined.
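
The steps above can be sketched as a minimal reproduction. This is only a sketch under stated assumptions: it uses PySpark, the file names, column names (`id`, `amount`, `quantity`) and row values are hypothetical, and the `header` and `inferSchema` options are a guess at the read call, which is not shown here. `inferred_schema` expects an existing `SparkSession` passed in by the caller.

```python
import csv
import os
import tempfile


def write_csv(path, header, rows):
    """Write one CSV file: a header line followed by the data rows."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)
    return path


def write_fixtures(directory=None):
    """Create three hypothetical test files and return their paths."""
    directory = directory or tempfile.mkdtemp()
    # Original file: two int columns (names are hypothetical).
    first = write_csv(
        os.path.join(directory, "first.csv"),
        ["id", "amount"],
        [[1, 10], [2, 20]],
    )
    # Exact duplicate of the first file: same header, same rows.
    duplicate = write_csv(
        os.path.join(directory, "duplicate.csv"),
        ["id", "amount"],
        [[1, 10], [2, 20]],
    )
    # Same columns plus one new int column.
    second = write_csv(
        os.path.join(directory, "second.csv"),
        ["id", "amount", "quantity"],
        [[3, 30, 5], [4, 40, 6]],
    )
    return first, duplicate, second


def inferred_schema(spark, paths):
    """Return {column name: type name} for the given CSV paths.

    `spark` is an existing SparkSession supplied by the caller.
    The header/inferSchema options are an assumption about the read call.
    """
    df = (
        spark.read
        .option("header", "true")
        .option("inferSchema", "true")
        .csv(list(paths))
    )
    return dict(df.dtypes)
```

In a test, after `first, duplicate, second = write_fixtures()`, the four cases from the question would correspond to comparing `inferred_schema(spark, [first])`, `inferred_schema(spark, [first, duplicate])`, `inferred_schema(spark, [first, second])` and `inferred_schema(spark, [second])`.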