Description
In Spark 2.0, after unifying Datasets and DataFrames, we made two API-breaking changes:
- `DataFrameReader.text()` now returns `Dataset[String]` instead of `DataFrame`
- `SQLContext.range()` now returns `Dataset[java.lang.Long]` instead of `DataFrame`
However, these two changes introduced several inconsistencies and problems:
- `spark.read.text()` silently discards partition columns when reading a partitioned table in text format, because `Dataset[String]` can hold only a single field. Users have to call `spark.read.format("text").load()` to work around this, which is confusing and error-prone.
- Every other data source shortcut method in `DataFrameReader` returns `DataFrame` (a.k.a. `Dataset[Row]`); `DataFrameReader.text()` is the only exception.
- Applying typed operations to Datasets returned by `spark.range()` can cause unexpected schema changes. See SPARK-15632 for more details.
For these reasons, we decided to revert both changes.