Description
Motivation
Currently the Scala sql/core and Connect clients share the same API; Connect implements a subset of the functionality of the sql/core API. Compatibility between the two implementations is enforced by MiMa checks.
While this sort of works for application development, it is not ideal for a number of reasons:
- An application developer needs to pick which API they are going to develop against when setting up their project (they need to select the correct dependencies). While it is true that they can change this later, it does put a mental burden on the developer. A much preferred solution would be to defer binding to an implementation until the code is actually run.
- (Minor) The current setup confuses IDEs and is more of a pain to work with, especially for Spark developers.
- Developing and maintaining the Spark API is more difficult because of the added burden of working with MiMa and/or adding the same API in multiple places.
- Connect testing is fairly anaemic. We have seen a couple of cases where Connect behaves slightly differently, and this could have been detected if Connect were able to leverage Spark SQL's extensive testing.
Goals
- Create a truly shared Scala API with two implementations (see the sketch after this list). The goal is not to replace/simplify/reduce the current sql/core API we all love; the interface will only support the API shared between the implementations. An implementation can provide additional functionality (e.g. RDD-centric methods for the sql/core implementation).
- The common interface should cover all APIs supported by the current Connect Scala client.
- Maintain as much binary compatibility with previous Spark releases as possible.
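A purely illustrative sketch of the intended layering follows; all class, method, and package names are hypothetical stand-ins, not the actual Spark classes or module layout. Application code targets a shared interface, the sql/core and Connect clients each implement it, and the sql/core implementation can still expose extras such as RDD-centric methods.

```scala
object SharedApiSketch {
  // Stand-in types so the sketch is self-contained.
  final case class Column(expr: String)
  final case class Row(values: Seq[Any])

  // Shared interface: only the API surface common to both implementations.
  trait Dataset[T] {
    def filter(condition: Column): Dataset[T]
    def collect(): Seq[T]
  }

  // sql/core ("classic") implementation; it can expose extra API
  // that is not part of the shared interface.
  final class ClassicDataset[T](data: Seq[T]) extends Dataset[T] {
    def filter(condition: Column): Dataset[T] = this  // placeholder: would evaluate locally
    def collect(): Seq[T] = data
    def toLocalIterator: Iterator[T] = data.iterator  // implementation-specific extra
  }

  // Connect implementation: builds up a logical plan that a server would execute.
  final class ConnectDataset[T](plan: String) extends Dataset[T] {
    def filter(condition: Column): Dataset[T] =
      new ConnectDataset[T](s"Filter(${condition.expr}, $plan)")
    def collect(): Seq[T] = Seq.empty  // placeholder: would execute the plan remotely
  }

  // Application code is written against the shared interface only, so binding
  // to an implementation is deferred until the program actually runs.
  def countAdults(ds: Dataset[Row]): Int =
    ds.filter(Column("age >= 18")).collect().size
}
```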
Design Notes
- We are going to try to make the interface very connect-centric. Where possible we will implement functionality using the Connect API (see the sketch after this list).
- .... TBD
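One way to read the connect-centric direction is sketched below, purely as an illustration: the withPlan primitive and the method bodies are hypothetical, not the actual design. Shared methods would be written once against a small plan-building core that each implementation supplies, so both backends inherit the same behaviour.

```scala
object ConnectCentricSketch {
  final case class Column(expr: String)

  trait Dataset[T] {
    // Minimal core every implementation must provide (hypothetical primitive).
    protected def withPlan(op: String): Dataset[T]

    // Shared methods expressed through the core; both the sql/core and the
    // Connect implementation would inherit these definitions unchanged.
    def filter(condition: Column): Dataset[T] = withPlan(s"Filter(${condition.expr})")
    def where(condition: Column): Dataset[T] = filter(condition)
    def limit(n: Int): Dataset[T] = withPlan(s"Limit($n)")
  }
}
```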
Issue Links
- is related to SPARK-44111 Prepare Apache Spark 4.0.0 (Open)