SPARK-43474

Add support to create DataFrame Reference in Spark Connect

Details

    Description

      Add support in Spark Connect to cache a DataFrame on the server side. The client can then create a reference to that DataFrame from its cache key.

      This function will be used in streaming foreachBatch(). The server needs to call the user function for every batch, and that function takes a DataFrame as an argument. With the new function, the server can cache the DataFrame and pass its id back to the client, which can then create the DataFrame reference. The server replaces the reference with the cached DataFrame when transforming.
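The flow described above can be illustrated with a small, self-contained sketch. It does not use Spark or the Spark Connect API: `ToyServer`, `CachedDataFrameRef`, `cache_dataframe`, `transform`, and `run_foreach_batch` are all hypothetical names chosen for illustration, and a plain list of rows stands in for a DataFrame.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class CachedDataFrameRef:
    """Hypothetical client-side handle: it carries only the cache key."""
    cache_id: str


class ToyServer:
    """Toy stand-in for the server side; not the real Spark Connect code."""

    def __init__(self):
        # cache_id -> server-side DataFrame (modelled here as a list of rows)
        self._cache = {}

    def cache_dataframe(self, df):
        """Store the DataFrame on the server and return its id."""
        cache_id = str(uuid.uuid4())
        self._cache[cache_id] = df
        return cache_id

    def transform(self, relation):
        """Replace a reference with the cached DataFrame; pass other inputs through."""
        if isinstance(relation, CachedDataFrameRef):
            return self._cache[relation.cache_id]
        return relation

    def run_foreach_batch(self, batches, client_callback):
        """For every batch, cache it and hand only the id to the client."""
        for batch_id, batch_df in enumerate(batches):
            cache_id = self.cache_dataframe(batch_df)
            client_callback(cache_id, batch_id)


server = ToyServer()
seen = []


def user_fn(df_ref, batch_id):
    # User code holds only a reference; the server resolves it when asked.
    rows = server.transform(df_ref)
    seen.append((batch_id, len(rows)))


def client_callback(cache_id, batch_id):
    # Client side: build the DataFrame reference from the id it received.
    user_fn(CachedDataFrameRef(cache_id), batch_id)


server.run_foreach_batch([[1, 2, 3], [4, 5]], client_callback)
print(seen)
```

The point of the sketch is that the batch data never leaves the server: only the id crosses to the client, and the reference is swapped for the cached DataFrame at transform time. How long an entry stays cached is left out, since the description does not specify it.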

            People

              rangadi Raghu Angadi
              pengzhon-db Peng Zhong
              Votes: 0
              Watchers: 3
