  1. Spark
  2. SPARK-15426 Spark 2.0 SQL API audit
  3. SPARK-15856

Revert API breaking changes made in SQLContext.range

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0
    • Component/s: SQL
    • Labels: None

    Description

      In Spark 2.0, after unifying Datasets and DataFrames, we made two API breaking changes:

      1. DataFrameReader.text() now returns Dataset[String] instead of DataFrame
      2. SQLContext.range() now returns Dataset[java.lang.Long] instead of DataFrame
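
      The practical effect of a return-type change like this can be sketched as follows. This is a minimal illustration, assuming Spark 2.x on the classpath and a local SparkSession; the object name RangeReturnType and the helper rangeAsDataFrame are hypothetical and not part of Spark. Calling toDF() produces a DataFrame whether range() returns a typed Dataset or a DataFrame, so callers that need untyped rows do not depend on which one they get.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object RangeReturnType {
  // Hypothetical helper (not a Spark API): normalizes the result of range()
  // to a DataFrame. toDF() is defined on every Dataset, so this compiles
  // whether range() returns Dataset[java.lang.Long] or DataFrame.
  def rangeAsDataFrame(spark: SparkSession, n: Long): DataFrame =
    spark.range(n).toDF()

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("range-return-type")
      .getOrCreate()

    val df = rangeAsDataFrame(spark, 3L)
    df.printSchema() // range() produces a single LongType column named "id"
    df.show()

    spark.stop()
  }
}
```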

      However, these two changes introduced several inconsistencies and problems:

      1. spark.read.text() silently discards partition columns when reading a partitioned table in text format, since Dataset[String] contains only a single field. Users have to call spark.read.format("text").load() to work around this, which is confusing and error-prone.
      2. All data source shortcut methods in `DataFrameReader` return DataFrame (aka Dataset[Row]) except for DataFrameReader.text().
      3. Applying typed operations to Datasets returned by spark.range() can change the schema unexpectedly. See SPARK-15632 for details.
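
      The workaround mentioned in the first item can be sketched as below. This is an illustration under stated assumptions, not the project's own test: it assumes Spark 2.x on the classpath, builds a small Hive-style partitioned directory (part=1, part=2) by hand in a temporary location, and uses hypothetical names (PartitionedTextExample, loadPartitionedText). Loading through format("text").load() returns a DataFrame, which has room for the discovered partition column next to the text column.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

import org.apache.spark.sql.{DataFrame, SparkSession}

object PartitionedTextExample {
  // Hypothetical helper: builds <tmp>/part=1/data.txt and <tmp>/part=2/data.txt.
  def createPartitionedTextDir(): Path = {
    val root = Files.createTempDirectory("text-partitioned")
    for (p <- Seq(1, 2)) {
      val partDir = Files.createDirectories(root.resolve(s"part=$p"))
      Files.write(
        partDir.resolve("data.txt"),
        s"line from partition $p\n".getBytes(StandardCharsets.UTF_8))
    }
    root
  }

  // Hypothetical helper wrapping the workaround named in the description.
  def loadPartitionedText(spark: SparkSession, path: String): DataFrame =
    spark.read.format("text").load(path)

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("partitioned-text")
      .getOrCreate()

    val df = loadPartitionedText(spark, createPartitionedTextDir().toString)
    // The schema should include the partition column alongside the text column.
    df.printSchema()
    df.show(truncate = false)

    spark.stop()
  }
}
```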

      For these reasons, we decided to revert these two changes.

          People

            Assignee: Wenchen Fan (cloud_fan)
            Reporter: Cheng Lian (lian cheng)