Spark / SPARK-32885

Add DataStreamReader.table API


    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.0
    • Fix Version/s: 3.1.0
    • Component/s: Structured Streaming
    • Labels:
      None

      Description

      This ticket aims to add a new `table` API to DataStreamReader, analogous to the `table` API in DataFrameReader. Users can call it directly to obtain a streaming DataFrame over a table. Below is a simple example:

      Application 1 for initializing and starting the streaming job:

      val path = "/home/yuanjian.li/runtime/to_be_deleted"
      val tblName = "my_table"
      
      // Write some data to `my_table`
      spark.range(3).write.format("parquet").option("path", path).saveAsTable(tblName)
      
      // Read the table as a streaming source, write result to destination directory
      // Read the table as a streaming source, write result to destination directory
      val table = spark.readStream.table(tblName)
      table.writeStream
        .format("parquet")
        .option("checkpointLocation", "/home/yuanjian.li/runtime/to_be_deleted_ck")
        .start("/home/yuanjian.li/runtime/to_be_deleted_2")
      

      Application 2 for appending new data:

      // Append new data into the path
      spark.range(5).write.format("parquet").option("path", "/home/yuanjian.li/runtime/to_be_deleted").mode("append").save()

      Check result:

      // The destination directory should contain all written data
      spark.read.parquet("/home/yuanjian.li/runtime/to_be_deleted_2").show()
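      Note that `writeStream ... start(...)` returns a StreamingQuery handle. Before checking the destination directory, the query must actually have processed the appended rows. A minimal single-application sketch of that coordination is below; it assumes a running SparkSession `spark`, and the `/tmp` paths are illustrative, not part of this ticket:

      // Sketch only: assumes SparkSession `spark` and an existing table "my_table".
      val query = spark.readStream
        .table("my_table")
        .writeStream
        .format("parquet")
        .option("checkpointLocation", "/tmp/my_table_ck")
        .start("/tmp/my_table_out")

      // Block until all data currently available in the source has been
      // processed, then stop the query so output files are finalized.
      query.processAllAvailable()
      query.stop()

      // Now the destination directory reflects everything written so far.
      spark.read.parquet("/tmp/my_table_out").show()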
      

       


            People

            • Assignee: Yuanjian Li
            • Reporter: Yuanjian Li
