[SPARK-13485] (Dataset-oriented) API evolution in Spark 2.0


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0
    • Component/s: SQL

    Description

      As part of Spark 2.0, we want to create a stable API foundation for Dataset to become the main user-facing API in Spark. This ticket tracks various tasks related to that.

      The main high level changes are:

      1. Merge Dataset/DataFrame
      2. Create a more natural entry point for Dataset (SQLContext/HiveContext are not ideal because of the name "SQL"/"Hive", and "SparkContext" is not ideal because of its heavy dependency on RDDs)
      3. First-class support for sessions
      4. First-class support for a system catalog

      See the design doc for more details.
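
      For orientation, here is a minimal Scala sketch of the resulting entry point as it shipped in Spark 2.0: a SparkSession created through a builder, with the RuntimeConfig exposed as spark.conf and the catalog as spark.catalog. The exact API surface was settled in the sub-tasks listed below; this is only an illustration, not the design itself.

      {code:scala}
      import org.apache.spark.sql.SparkSession

      object ApiEvolutionDemo extends App {
        // SparkSession is the single entry point replacing SQLContext/HiveContext.
        val spark = SparkSession.builder()
          .appName("ApiEvolutionDemo")
          .master("local[*]")        // only needed when not launched via spark-submit
          .enableHiveSupport()       // optional; requires the spark-hive module on the classpath
          .getOrCreate()

        // First-class sessions: isolated conf, temp views and UDFs over a shared SparkContext.
        val isolated = spark.newSession()

        // User-facing RuntimeConfig (see the RuntimeConfig sub-tasks below); set returns Unit.
        spark.conf.set("spark.sql.shuffle.partitions", "8")

        // First-class catalog API (see the user-facing catalog API sub-task below).
        spark.catalog.listTables().show()

        // Dataset/DataFrame merge: DataFrame is a type alias for Dataset[Row].
        val df = spark.range(10).toDF("id")
        df.show()

        spark.stop()
      }
      {code}

      getOrCreate() either returns an already-active session or builds a new one, which is what makes the builder pattern workable in notebooks and the REPL where a session may already exist.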

      Attachments

        1. API Evolution in Spark 2.0.pdf (772 kB, Reynold Xin)
      Sub-Tasks

        1. Create SparkSession interface (Resolved, Andrew Or)
        2. Move SQLConf into an internal package (Resolved, Reynold Xin)
        3. User-facing RuntimeConfig interface (Resolved, Reynold Xin)
        4. User-facing catalog API (Resolved, Andrew Or)
        5. Design doc for configuration in Spark 2.0+ (Closed, Reynold Xin)
        6. Unify DataFrame and Dataset API (Resolved, Cheng Lian)
        7. Improve user experience for typed aggregate functions in Scala (Resolved, Reynold Xin; see the sketch after this list)
        8. Improve user experience for typed aggregate functions in Java (Resolved, Eric Liang)
        9. Create a hivecontext-compatibility module (Resolved, Yin Huai)
        10. SparkSession should be case insensitive by default (Resolved, Reynold Xin)
        11. Expose user-facing RuntimeConfig in SparkSession (Resolved, Andrew Or)
        12. Start SparkSession in REPL instead of SQLContext (Resolved, Andrew Or)
        13. Simplify configuration API (Resolved, Reynold Xin)
        14. Python SparkSession API (Resolved, Andrew Or)
        15. Implement catalog and conf API in Python SparkSession (Resolved, Andrew Or)
        16. Simplify configuration API further (Resolved, Andrew Or)
        17. Use builder pattern to create SparkSession (Resolved, Reynold Xin)
        18. Remove SparkSession.withHiveSupport (Resolved, Sandeep Singh)
        19. Use SparkSession in Scala/Python/Java examples (Resolved, Dongjoon Hyun)
        20. Use SparkSession instead of SQLContext in test suites (Resolved, Sandeep Singh)
        21. Make SparkSession constructors private (Resolved, Andrew Or)
        22. Clean up dependencies between SQLContext and SparkSession (Resolved, Reynold Xin)
        23. Use builder pattern to create SparkSession in PySpark (Resolved, Dongjoon Hyun)
        24. Accept Dataset[_] in joins (Resolved, Reynold Xin)
        25. RuntimeConfig.set should return Unit rather than RuntimeConfig itself (Resolved, Reynold Xin)
        26. Remove experimental tag from DataFrameReader and DataFrameWriter (Resolved, Reynold Xin)
        27. Remove experimental tag from Python DataFrame (Resolved, Reynold Xin)
        28. Revert SPARK-14807 Create a hivecontext-compatibility module (Resolved, Reynold Xin)
        29. SparkSession builder in Python should also allow overriding confs of existing sessions (Resolved, Eric Liang)
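
        Sub-tasks 6 through 8 cover the DataFrame/Dataset merge and friendlier typed aggregate functions. As a rough illustration of the resulting Scala user experience (using the Aggregator base class in org.apache.spark.sql.expressions as it ended up in Spark 2.0), here is a sketch; names like Person and SumAge are made up for the example.

        {code:scala}
        import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
        import org.apache.spark.sql.expressions.Aggregator

        // A typed aggregate function that sums the ages of Person records.
        case class Person(name: String, age: Long)

        object SumAge extends Aggregator[Person, Long, Long] {
          def zero: Long = 0L
          def reduce(buffer: Long, p: Person): Long = buffer + p.age
          def merge(b1: Long, b2: Long): Long = b1 + b2
          def finish(reduction: Long): Long = reduction
          def bufferEncoder: Encoder[Long] = Encoders.scalaLong
          def outputEncoder: Encoder[Long] = Encoders.scalaLong
        }

        object TypedAggDemo extends App {
          val spark = SparkSession.builder().appName("TypedAggDemo").master("local[*]").getOrCreate()
          import spark.implicits._

          // With the API merge, DataFrame is simply Dataset[Row]; a typed Dataset[Person]
          // supports the same operations plus typed columns such as this aggregator.
          val people = Seq(Person("a", 30), Person("b", 25)).toDS()
          people.select(SumAge.toColumn.name("sum_age")).show()

          spark.stop()
        }
        {code}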


          People

            Assignee: Reynold Xin (rxin)
            Reporter: Reynold Xin (rxin)
            Votes: 0
            Watchers: 32

            Dates

              Created:
              Updated:
              Resolved: