Spark / SPARK-32885

Add DataStreamReader.table API


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.0
    • Fix Version/s: 3.1.0
    • Component/s: Structured Streaming
    • Labels: None

    Description

      This ticket aims to add a new `table` API to DataStreamReader, similar to the `table` API in DataFrameReader. Users can call this API directly to get a streaming DataFrame on a table. Below is a simple example:
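
      For comparison, a minimal sketch of the existing batch entry point next to the proposed streaming one (the table name here is only illustrative):

      // Batch read of a table via the existing DataFrameReader.table API
      val batchDf = spark.read.table("my_table")

      // Streaming read of the same table via the proposed DataStreamReader.table API
      val streamDf = spark.readStream.table("my_table")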

      Application 1 for initializing and starting the streaming job:

      val path = "/home/yuanjian.li/runtime/to_be_deleted"
      val tblName = "my_table"
      
      // Write some data to `my_table`
      spark.range(3).write.format("parquet").option("path", path).saveAsTable(tblName)
      
      // Read the table as a streaming source, write result to destination directory
      val table = spark.readStream.table(tblName)
      table.writeStream
        .format("parquet")
        .option("checkpointLocation", "/home/yuanjian.li/runtime/to_be_deleted_ck")
        .start("/home/yuanjian.li/runtime/to_be_deleted_2")
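
      Note that start() returns a StreamingQuery handle. A variant of the call above that keeps the handle, so the stream can be awaited before Application 2 runs and stopped afterwards (a sketch, reusing the same paths):

      // Keep the handle returned by start() so the query can be awaited and stopped
      val query = table.writeStream
        .format("parquet")
        .option("checkpointLocation", "/home/yuanjian.li/runtime/to_be_deleted_ck")
        .start("/home/yuanjian.li/runtime/to_be_deleted_2")

      query.processAllAvailable() // block until all currently available input is processed
      // query.stop()             // stop the stream once the result check below has passed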
      

      Application 2 for appending new data:

      // Append new data into the path
      spark.range(5).write.format("parquet").option("path", "/home/yuanjian.li/runtime/to_be_deleted").mode("append").save()

      Check the result:

      // The destination directory should contain all written data
      spark.read.parquet("/home/yuanjian.li/runtime/to_be_deleted_2").show()
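
      Since the initial write produces 3 rows and the append adds 5 more, the destination should end up with 8 rows once the stream has picked up both batches. A quick sketch that makes the expected count explicit (same path as above):

      // 3 rows from the initial write + 5 rows from the append = 8 rows in total
      val result = spark.read.parquet("/home/yuanjian.li/runtime/to_be_deleted_2")
      assert(result.count() == 8)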
      

       

          People

            Assignee: Yuanjian Li
            Reporter: Yuanjian Li