Description
Motivation
Currently, the Scala sql/core and Connect clients share the same API, and Connect implements a subset of the sql/core API's functionality. MiMa checks enforce compatibility between the two implementations.
While this mostly works for application development, it is not ideal for several reasons:
- An application developer needs to choose which API to develop against when setting up their project, by selecting the correct dependencies. They can change this later, but it still puts a mental burden on the developer. A much better solution would be to defer binding to an implementation until the code runs.
- (Minor) The current setup confuses IDEs and is painful to work with, especially for Spark developers.
- Developing and maintaining the Spark API is harder because of the added burden of working with MiMa and/or adding the same API in multiple places.
- Connect testing is fairly anaemic. We have seen a couple of cases where Connect behaves slightly differently. These could have been detected if Connect were able to leverage Spark SQL's extensive test suite.
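The following is a minimal Scala sketch of what deferring the binding until run time could look like. All names in it are purely hypothetical: `SessionLike`, `ClassicSessionStub`, `ConnectSessionStub`, `SessionFactory` and `AppLogic` are not Spark classes. Application code depends only on the interface, and the implementation is selected from a run-time setting. The `sc://` prefix mirrors a Spark Connect URL, but the factory logic is an assumption made for illustration.

```scala
// Hypothetical sketch: none of these types are Spark APIs.

// The interface that application code is written against.
trait SessionLike {
  def range(n: Long): Seq[Long]
  def implName: String
}

// Stand-in for an in-process (sql/core) implementation.
final class ClassicSessionStub extends SessionLike {
  def range(n: Long): Seq[Long] = 0L until n
  def implName: String = "classic"
}

// Stand-in for a client that would send plans to a remote server.
final class ConnectSessionStub(endpoint: String) extends SessionLike {
  def range(n: Long): Seq[Long] = Seq.tabulate(n.toInt)(_.toLong)
  def implName: String = s"connect($endpoint)"
}

object SessionFactory {
  // The implementation is chosen here, at run time, not in the build definition.
  def create(mode: String): SessionLike = mode match {
    case "classic"                      => new ClassicSessionStub
    case url if url.startsWith("sc://") => new ConnectSessionStub(url)
    case other => throw new IllegalArgumentException(s"unknown mode: $other")
  }
}

object AppLogic {
  // Depends only on SessionLike, so it compiles once and runs on either stub.
  def sumOfRange(session: SessionLike, n: Long): Long = session.range(n).sum
}
```

Because `AppLogic` compiles once against the interface, switching between the two stubs requires no change to the project's dependencies.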
Goals
- Create a truly shared Scala API with two implementations. The goal is not to replace, simplify or reduce the current sql/core API we all love. The interface will only support the API shared between the implementations. An implementation can provide additional functionality (e.g. RDD-centric methods for the sql/core implementation).
- The common interface should cover all API supported by the current Connect Scala client.
- Maintain as much binary compatibility with previous Spark releases as possible.
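The sketch below illustrates the first goal under stated assumptions. `DatasetApi`, `ConnectDatasetStub`, `ClassicDatasetStub`, `partitioned` and `SharedCode` are hypothetical stand-ins, not the real `Dataset` classes. The abstract class holds only the shared operations. The classic stub adds an implementation-specific method, standing in for RDD-centric functionality, and that method is only reachable through the concrete type.

```scala
// Hypothetical sketch: simplified stand-ins, not Spark's Dataset classes.

// Shared layer: only operations that every implementation supports.
abstract class DatasetApi[T] {
  def filter(p: T => Boolean): DatasetApi[T]
  def collect(): Seq[T]
  // Default method built purely on the shared surface.
  def count(): Long = collect().size.toLong
}

// Stand-in for the Connect client implementation.
final class ConnectDatasetStub[T](rows: Seq[T]) extends DatasetApi[T] {
  override def filter(p: T => Boolean): ConnectDatasetStub[T] =
    new ConnectDatasetStub(rows.filter(p))
  override def collect(): Seq[T] = rows
}

// Stand-in for the sql/core (classic) implementation.
final class ClassicDatasetStub[T](rows: Seq[T]) extends DatasetApi[T] {
  override def filter(p: T => Boolean): ClassicDatasetStub[T] =
    new ClassicDatasetStub(rows.filter(p))
  override def collect(): Seq[T] = rows

  // Extra, classic-only method: splits rows round-robin into partitions.
  def partitioned(numPartitions: Int): Seq[Seq[T]] = {
    require(numPartitions > 0, "numPartitions must be positive")
    val indexed = rows.zipWithIndex
    (0 until numPartitions).map { i =>
      indexed.collect { case (row, j) if j % numPartitions == i => row }
    }
  }
}

object SharedCode {
  // Written once against the shared interface; runs on either stub.
  def countEven(ds: DatasetApi[Int]): Long = ds.filter(_ % 2 == 0).count()
}
```

Code that only needs the shared surface takes a `DatasetApi`. Code that needs implementation-specific extras has to ask for the concrete type explicitly, which keeps those extras out of the common interface.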
Design Notes
- We will try to make the interface very Connect-centric. Where possible, functionality will be implemented using the Connect API.
- .... TBD
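Here is one way to read the Connect-centric note, sketched with hypothetical types: `Node`, `Col`, `Lit`, `Fn`, `SharedFunctions`, `ToSqlText` and `ToWireForm`. These only loosely resemble the ColumnNode work tracked in this epic. A helper such as `between` is defined once as a tree of generic function calls, so each implementation only needs a translator for that tree rather than its own special case. The map keys in `ToWireForm` are illustrative and are not claimed to match the Connect protobuf schema.

```scala
// Hypothetical, simplified expression tree; not Spark's ColumnNode classes.
sealed trait Node
final case class Col(name: String) extends Node
final case class Lit(value: Any) extends Node
final case class Fn(name: String, args: Seq[Node]) extends Node

object SharedFunctions {
  // Defined once in the shared layer, only in terms of generic function calls,
  // so no implementation needs its own special case for it.
  def between(c: Node, lo: Any, hi: Any): Node =
    Fn("and", Seq(Fn(">=", Seq(c, Lit(lo))), Fn("<=", Seq(c, Lit(hi)))))
}

// Translator for one implementation: renders the tree as SQL-like text.
object ToSqlText {
  private val infix = Set("and", "or", ">=", "<=", "=")

  def render(n: Node): String = n match {
    case Col(name)      => name
    case Lit(s: String) => s"'$s'"
    case Lit(v)         => s"$v"
    case Fn(op, Seq(l, r)) if infix.contains(op) =>
      s"(${render(l)} ${op.toUpperCase} ${render(r)})"
    case Fn(name, args) => s"$name(${args.map(render).mkString(", ")})"
  }
}

// Translator for the other implementation: a nested Map standing in for a
// wire message. The keys are illustrative only.
object ToWireForm {
  def encode(n: Node): Map[String, Any] = n match {
    case Col(name)      => Map("unresolved_attribute" -> name)
    case Lit(v)         => Map("literal" -> v)
    case Fn(name, args) => Map("unresolved_function" -> name, "arguments" -> args.map(encode))
  }
}
```

For example, `ToSqlText.render(SharedFunctions.between(Col("age"), 18, 65))` yields `((age >= 18) AND (age <= 65))`. `ToWireForm` encodes the same tree without knowing anything about `between`.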
Issue Links
- is related to: SPARK-44111 Prepare Apache Spark 4.0.0 (Open)
Issues in epic
Key | Summary | Status | Assignee
SPARK-43415 | Impl mapValues for KVGDS#mapValues | Open | Pengfei Xu
SPARK-48919 | Create a dedicated project for Connect code generation | Resolved | Herman van Hövell
SPARK-48985 | Remote (most) hard coded expressions from SparkConnectPlanner/functions.scala | Resolved | Herman van Hövell
SPARK-48986 | Introduce a ColumnNode API | Resolved | Herman van Hövell
SPARK-49004 | Create a function lookup registry for non-sql functions | Resolved | Herman van Hövell
SPARK-49022 | Integrate Basic ColumnNode API in Column | Resolved | Herman van Hövell
SPARK-49023 | Integrate Column Node API in Dataset | Resolved | Unassigned
SPARK-49024 | Create Column Node API for UDFs | Resolved | Herman van Hövell
SPARK-49025 | Prepare move to SQL API for Column classes | Resolved | Herman van Hövell
SPARK-49026 | Create conversions Column API to Connect protos | Resolved | Herman van Hövell
SPARK-49027 | Move Column API to sql/api | Resolved | Herman van Hövell
SPARK-49028 | Create a shared interface for SparkSession | Resolved | Herman van Hövell
SPARK-49029 | Create a shared interface for Dataset | Resolved | Herman van Hövell
SPARK-49083 | Allow from_json and from_xml to parse json schemas as well | Resolved | Herman van Hövell
SPARK-49084 | Remove special casing for AVRO functions in Connect | Resolved | Ruifeng Zheng
SPARK-49085 | Remove special casing for Protobuf functions in Connect | Resolved | Haejoon Lee
SPARK-49086 | Remove special casing for ML functions in Connect | Resolved | Herman van Hövell
SPARK-49087 | Move all Connect invocations of system.internal function to the proper namespace | Open | Unassigned
SPARK-49088 | Remove ignoreNulls from UnresolvedFunction and Analyzer | Open | Unassigned
SPARK-49089 | Remove special cased Catalyst Expressions from the SparkConnectPlanner | Resolved | Herman van Hövell
SPARK-49225 | Add sql and normalize functionality to ColumnNode | Resolved | Herman van Hövell
SPARK-49226 | Refactor UDF code generation | Resolved | Herman van Hövell
SPARK-49227 | Integrate UDF ColumnNode API in Column | Resolved | Herman van Hövell
SPARK-49229 | Deduplicate Scala UDF handling in the SparkConnectPlanner | Open | Unassigned
SPARK-49230 | ScalaReflection should not return UnboundRowEncoder if we don't want one. | Open | Unassigned
SPARK-49231 | Use UnboundRowEncoders for ScalaUDF/ScalaUDAF inputs | Open | Unassigned
SPARK-49273 | Add support for Origin in Scala Client | Open | Unassigned
SPARK-49274 | Support java serialization in AgnosticEncoders | Resolved | Herman van Hövell
SPARK-49282 | Create shared interface for SparkSession.Builder | Resolved | Herman van Hövell
SPARK-49283 | Create a generic builder for SparkSession implementations | In Progress | Herman van Hövell
SPARK-49284 | Create a shared Catalog interface | Resolved | Herman van Hövell
SPARK-49285 | Move the catalog interfaces to sql/api | Resolved | Herman van Hövell
SPARK-49286 | Move the Avro and Protobuf functions.scala files to sql/api | Resolved | Herman van Hövell
SPARK-49287 | Move shared streaming progress classes to sql/api | Resolved | Herman van Hövell
SPARK-49307 | Support Kryo Serialization with AgnosticEncoders | Resolved | Herman van Hövell
SPARK-49308 | Support UserDefinedAggregateFunction in Connect | Open | Unassigned
SPARK-49369 | Create implicit Column conversions to reduce upgrade issues | Resolved | Herman van Hövell
SPARK-49371 | Reenable Scala Connect Client Doc builder | Open | Unassigned
SPARK-49413 | Create shared RuntimeConf interface | Resolved | Herman van Hövell
SPARK-49414 | Create a shared DataFrameReader interface | Resolved | Herman van Hövell
SPARK-49415 | Create a shared interface for SQLImplicits | Resolved | Herman van Hövell
SPARK-49416 | Create a shared interface for DataStreamReader | Resolved | Herman van Hövell
SPARK-49417 | Create a shared interface for StreamingQueryManager | Resolved | Herman van Hövell
SPARK-49418 | Active/default ThreadLocals for shared SparkSession | Resolved | Herman van Hövell
SPARK-49419 | Create a shared DataFrameStatFunctions interface | Resolved | Herman van Hövell
SPARK-49420 | Create a shared DataFrameNaFunctions interface | Resolved | Herman van Hövell
SPARK-49421 | Create a shared RelationalGroupedDataset interface | Resolved | Herman van Hövell
SPARK-49422 | Create a shared KeyValueGroupedDataset interface | Resolved | Herman van Hövell
SPARK-49423 | Consolidate Observation into a single class in sql/api | Resolved | Herman van Hövell
SPARK-49424 | Consolidate Encoders in sql/api | Resolved | Herman van Hövell
SPARK-49425 | Create a shared DataFrameWriter interface | Resolved | Herman van Hövell
SPARK-49426 | Create a shared DataFrameWriterV2 interface | Resolved | Herman van Hövell
SPARK-49427 | Create a shared MergeIntoWriter interface | Resolved | Herman van Hövell
SPARK-49428 | Rename/Reorg Connect Client Packages | Open | Unassigned
SPARK-49429 | Create a shared DataStreamWriter interface | Resolved | Herman van Hövell
SPARK-49430 | Add SparkResult to Classic | Open | Unassigned
SPARK-49431 | Consolidate ForeachWriter into sql/api | Resolved | Herman van Hövell
SPARK-49432 | Consolidate StreamingQuery in sql/api | Resolved | Herman van Hövell
SPARK-49433 | Protect connect UdfUtils | Open | Unassigned
SPARK-49434 | Move org.apache.spark.sql.expressions.javalang.typed/scalalang.type to sql/api | Resolved | Herman van Hövell
SPARK-49435 | Move ReduceAggregator to sql/api | Resolved | Herman van Hövell
SPARK-49436 | Add shared SQLContext interface | Open | Pengfei Xu
SPARK-49437 | Create interface implementation tests | Open | Unassigned
SPARK-49568 | Remove Self Type for Dataset | Resolved | Herman van Hövell
SPARK-49569 | Introduce Shim for missing spark/core classes | Resolved | Herman van Hövell
SPARK-49570 | Use agnostic encoders instead of expression encoders in user facing API | Resolved | Herman van Hövell
SPARK-49571 | Revamp Connect MiMa tests and make it work with the new interface. | Open | Unassigned
SPARK-49572 | See if regular MiMa can deal with multiple jars | Open | Unassigned
SPARK-49573 | MiMa checks should run for all sql projects | Open | Unassigned
SPARK-49574 | ExpressionEncoder should track its AgnosticEncoder | Resolved | Herman van Hövell
SPARK-49587 | Improve UDF Packet Serialization | Open | Unassigned
SPARK-49588 | Fix SerialVersionUids for all/most classes that are serialized | Open | Unassigned
SPARK-49589 | Cache AgnosticEncoders where possible | Open | Unassigned
SPARK-49697 | Figure out what to do with ExecutionListenerManager | Open | Unassigned
SPARK-49698 | Add ClassicOnly annotation for Classic only features | Open | Unassigned
SPARK-49700 | Switch interface and implementations | In Progress | Herman van Hövell
SPARK-49709 | Restore RuntimeConfig ConfigEntry functionality | Resolved | Herman van Hövell
SPARK-49710 | Make sure internal/dev API is properly annotated in SparkSession | Open | Unassigned
SPARK-49711 | Remove or Deprecate ExperimentalMethods | In Progress | Herman van Hövell
SPARK-49712 | Replace org.apache.spark.sql.encoderFor with AgnosticEncoders.agnosticEncoderFor | Resolved | Herman van Hövell
SPARK-49759 | Unify Data*Writer/Data*Reader interfaces | Open | Unassigned
SPARK-49769 | Replace ExpressionUtils.colum with ClassicConversions | Resolved | Herman van Hövell
SPARK-50102 | Add shims for all classic only classes | Resolved | Herman van Hövell
SPARK-50103 | Fix SerialVersionUids for all interface classes | Open | Unassigned
SPARK-50104 | Support SparkSession.executeCommand in Connect | Open | Unassigned
SPARK-50105 | Propose removal of SessionState/SharedState/ExperimentalMethods from public interface. | Open | Unassigned
SPARK-50264 | Add missing methods back to DataStreamReader interface | Resolved | Herman van Hövell
SPARK-50265 | Add missing register method to Connect UDFRegistration | Open | Herman van Hövell
SPARK-50367 | Move Ammonite REPL integration to sql-api | Open | Unassigned
SPARK-50368 | Validate shading rules for connect client and connect server | Open | Unassigned
SPARK-50369 | Document UDF classpath differences between Classic and Connect | Open | Unassigned
SPARK-50371 | Introduce a shared plan representation | Open | Unassigned
SPARK-50473 | Add generic ColumnConversions helper | Resolved | Herman van Hövell
SPARK-50556 | Document development workflow with interfaces | Open | Unassigned
SPARK-50557 | Add RuntimeConf.contains(..) | Open | Unassigned
SPARK-48918
Unified SQL Scala Interface