Details
- Type: Task
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Version: 3.5.0
Description
Add support in Spark Connect for caching a DataFrame on the server side. The client can then create a reference to that DataFrame from its cache key.
This function will be used in streaming foreachBatch(). The server must call the user function for every batch, and that function takes a DataFrame as an argument. With the new function, the server can cache the batch DataFrame and pass its id back to the client, which creates a DataFrame reference from it. The server replaces the reference with the cached DataFrame when transforming the plan.
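The flow described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the Spark Connect implementation: `SessionCache`, `CachedRef`, and `run_batch` are hypothetical names, a plain list stands in for a DataFrame, and dropping the entry after each batch is an assumed cleanup policy.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class CachedRef:
    """Hypothetical client-side placeholder naming a DataFrame cached on the server."""
    key: str


class SessionCache:
    """Hypothetical per-session server store mapping cache keys to DataFrames."""

    def __init__(self):
        self._frames = {}

    def cache(self, df):
        # Store the DataFrame and return the id that is sent back to the client.
        key = str(uuid.uuid4())
        self._frames[key] = df
        return key

    def resolve(self, plan):
        # While transforming a plan, swap a reference for the cached DataFrame.
        if isinstance(plan, CachedRef):
            if plan.key not in self._frames:
                raise KeyError(f"no cached DataFrame for key {plan.key}")
            return self._frames[plan.key]
        return plan  # anything else passes through unchanged

    def remove(self, key):
        self._frames.pop(key, None)


def run_batch(cache, batch_df, batch_id, user_fn):
    """Cache one batch, hand the user function a reference, then clean up."""
    key = cache.cache(batch_df)
    try:
        # In the real flow the key crosses the wire and the client builds the reference.
        return user_fn(CachedRef(key), batch_id)
    finally:
        cache.remove(key)  # assumed policy: drop the entry once the batch is done
```

A user function would receive `CachedRef(key)` in place of the batch DataFrame. Any plan the client builds on that reference is resolved back to the cached DataFrame by `resolve` on the server, so the batch data itself never has to leave the server.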
Issue Links
- causes
  - SPARK-46453 SessionHolder doesn't throw exceptions from internalError() (Resolved)
  - SPARK-45791 Rename `SparkConnectSessionHodlerSuite.scala` to `SparkConnectSessionHolderSuite.scala` (Resolved)
- links to