[SPARK-13485] (Dataset-oriented) API evolution in Spark 2.0


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0
    • Component/s: SQL

    Description

      As part of Spark 2.0, we want to create a stable API foundation for Dataset to become the main user-facing API in Spark. This ticket tracks various tasks related to that.

      The main high level changes are:

      1. Merge Dataset/DataFrame
      2. Create a more natural entry point for Dataset (SQLContext/HiveContext are not ideal because of the name "SQL"/"Hive", and "SparkContext" is not ideal because of its heavy dependency on RDDs)
      3. First-class support for sessions
      4. First-class support for a system catalog

      See the design doc for more details.
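
      For orientation, here is a minimal Scala sketch of the resulting entry point as it shipped in Spark 2.0: a SparkSession created through a builder, with the RuntimeConfig exposed as spark.conf and the catalog as spark.catalog. The exact API surface was settled in the sub-tasks listed below; this is only an illustration, not the design itself.

      {code:scala}
      import org.apache.spark.sql.SparkSession

      object ApiEvolutionDemo extends App {
        // SparkSession is the single entry point replacing SQLContext/HiveContext.
        val spark = SparkSession.builder()
          .appName("ApiEvolutionDemo")
          .master("local[*]")        // only needed when not launched via spark-submit
          .enableHiveSupport()       // optional; requires the spark-hive module on the classpath
          .getOrCreate()

        // First-class sessions: isolated conf, temp views and UDFs over a shared SparkContext.
        val isolated = spark.newSession()

        // User-facing RuntimeConfig (see the RuntimeConfig sub-tasks below); set returns Unit.
        spark.conf.set("spark.sql.shuffle.partitions", "8")

        // First-class catalog API (see the user-facing catalog API sub-task below).
        spark.catalog.listTables().show()

        // Dataset/DataFrame merge: DataFrame is a type alias for Dataset[Row].
        val df = spark.range(10).toDF("id")
        df.show()

        spark.stop()
      }
      {code}

      getOrCreate() either returns an already-active session or builds a new one, which is what makes the builder pattern workable in notebooks and the REPL where a session may already exist.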

      Attachments

        1. API Evolution in Spark 2.0.pdf (772 kB, Reynold Xin)
      Sub-Tasks

        1. Create SparkSession interface (Resolved, Andrew Or)
        2. Move SQLConf into an internal package (Resolved, Reynold Xin)
        3. User-facing RuntimeConfig interface (Resolved, Reynold Xin)
        4. User-facing catalog API (Resolved, Andrew Or)
        5. Design doc for configuration in Spark 2.0+ (Closed, Reynold Xin)
        6. Unify DataFrame and Dataset API (Resolved, Cheng Lian)
        7. Improve user experience for typed aggregate functions in Scala (Resolved, Reynold Xin; see the sketch after this list)
        8. Improve user experience for typed aggregate functions in Java (Resolved, Eric Liang)
        9. Create a hivecontext-compatibility module (Resolved, Yin Huai)
        10. SparkSession should be case insensitive by default (Resolved, Reynold Xin)
        11. Expose user-facing RuntimeConfig in SparkSession (Resolved, Andrew Or)
        12. Start SparkSession in REPL instead of SQLContext (Resolved, Andrew Or)
        13. Simplify configuration API (Resolved, Reynold Xin)
        14. Python SparkSession API (Resolved, Andrew Or)
        15. Implement catalog and conf API in Python SparkSession (Resolved, Andrew Or)
        16. Simplify configuration API further (Resolved, Andrew Or)
        17. Use builder pattern to create SparkSession (Resolved, Reynold Xin)
        18. Remove SparkSession.withHiveSupport (Resolved, Sandeep Singh)
        19. Use SparkSession in Scala/Python/Java examples (Resolved, Dongjoon Hyun)
        20. Use SparkSession instead of SQLContext in test suites (Resolved, Sandeep Singh)
        21. Make SparkSession constructors private (Resolved, Andrew Or)
        22. Clean up dependencies between SQLContext and SparkSession (Resolved, Reynold Xin)
        23. Use builder pattern to create SparkSession in PySpark (Resolved, Dongjoon Hyun)
        24. Accept Dataset[_] in joins (Resolved, Reynold Xin)
        25. RuntimeConfig.set should return Unit rather than RuntimeConfig itself (Resolved, Reynold Xin)
        26. Remove experimental tag from DataFrameReader and DataFrameWriter (Resolved, Reynold Xin)
        27. Remove experimental tag from Python DataFrame (Resolved, Reynold Xin)
        28. Revert SPARK-14807 Create a hivecontext-compatibility module (Resolved, Reynold Xin)
        29. SparkSession builder in Python should also allow overriding confs of existing sessions (Resolved, Eric Liang)
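
        Sub-tasks 6 through 8 cover the DataFrame/Dataset merge and friendlier typed aggregate functions. As a rough illustration of the resulting Scala user experience (using the Aggregator base class in org.apache.spark.sql.expressions as it ended up in Spark 2.0), here is a sketch; names like Person and SumAge are made up for the example.

        {code:scala}
        import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
        import org.apache.spark.sql.expressions.Aggregator

        // A typed aggregate function that sums the ages of Person records.
        case class Person(name: String, age: Long)

        object SumAge extends Aggregator[Person, Long, Long] {
          def zero: Long = 0L
          def reduce(buffer: Long, p: Person): Long = buffer + p.age
          def merge(b1: Long, b2: Long): Long = b1 + b2
          def finish(reduction: Long): Long = reduction
          def bufferEncoder: Encoder[Long] = Encoders.scalaLong
          def outputEncoder: Encoder[Long] = Encoders.scalaLong
        }

        object TypedAggDemo extends App {
          val spark = SparkSession.builder().appName("TypedAggDemo").master("local[*]").getOrCreate()
          import spark.implicits._

          // With the API merge, DataFrame is simply Dataset[Row]; a typed Dataset[Person]
          // supports the same operations plus typed columns such as this aggregator.
          val people = Seq(Person("a", 30), Person("b", 25)).toDS()
          people.select(SumAge.toColumn.name("sum_age")).show()

          spark.stop()
        }
        {code}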


          People

            Assignee: Reynold Xin (rxin)
            Reporter: Reynold Xin (rxin)
            Votes: 0
            Watchers: 32

            Dates

              Created:
              Updated:
              Resolved: