Spark / SPARK-49051

Provide modifiedAfter and modifiedBefore options when filtering from a stream source


Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.5.1
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      In the following Jira issue:
      https://issues.apache.org/jira/browse/SPARK-31962

      two new options, modifiedBefore and modifiedAfter, were introduced for batch reads (for example, CSV) and were eventually released in version 3.1.1.

      However, they were introduced so that only batch reads accept these two options; streaming reads explicitly reject them.

      When loading files from a data source as a stream, the source path can likewise contain thousands of files, so filtering by modification time is as useful for streaming use cases as it is for batch ones. Note: the Databricks Auto Loader ("cloudFiles" source) supports these options in a stream:

      https://docs.databricks.com/en/ingestion/auto-loader/options.html#id20
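      To make the requested semantics concrete, here is a minimal pure-Python sketch of the per-file check the two options imply. It is not Spark code: the helper names (parse_threshold, accepts), interpreting the timestamp in UTC in place of the session time zone, and treating both bounds as exclusive are assumptions of this sketch.

```python
from datetime import datetime, timezone
from typing import Optional

# Timestamp layout used by the batch options, e.g. "2020-06-15T05:00:00".
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S"


def parse_threshold(value: Optional[str], tz: timezone = timezone.utc) -> Optional[datetime]:
    """Parse a modifiedAfter/modifiedBefore string into an aware datetime.

    The string carries no offset, so it is interpreted in `tz`
    (UTC here, standing in for the session time zone: an assumption).
    """
    if value is None:
        return None
    return datetime.strptime(value, TIMESTAMP_FORMAT).replace(tzinfo=tz)


def accepts(mtime: datetime,
            modified_after: Optional[str] = None,
            modified_before: Optional[str] = None) -> bool:
    """Return True if a file modified at `mtime` passes both optional bounds.

    Both bounds are treated as exclusive (an assumption of this sketch).
    """
    after = parse_threshold(modified_after)
    before = parse_threshold(modified_before)
    if after is not None and not mtime > after:
        return False
    if before is not None and not mtime < before:
        return False
    return True


# Hypothetical file listing: name -> modification time.
files = {
    "a.csv": datetime(2019, 1, 1, tzinfo=timezone.utc),
    "b.csv": datetime(2019, 12, 1, tzinfo=timezone.utc),
    "c.csv": datetime(2021, 1, 1, tzinfo=timezone.utc),
}
selected = sorted(name for name, mtime in files.items()
                  if accepts(mtime, "2019-06-15T05:00:00", "2020-06-15T05:00:00"))
print(selected)  # ['b.csv']
```

      Only b.csv falls strictly between the two dates; omitting either option leaves that side unbounded.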


      Suggested Example Usages
      Start stream with all CSV files modified after a date:
      spark.readStream.option("modifiedAfter", "2020-06-15T05:00:00").option("quote", '"').option("escape", '"').csv(source_path)

      Start stream with all CSV files modified before a date:
      spark.readStream.option("modifiedBefore", "2020-06-15T05:00:00").option("quote", '"').option("escape", '"').csv(source_path)

      Start stream with all CSV files modified between two dates:

      spark.readStream.option("modifiedAfter", "2019-06-15T05:00:00").option("modifiedBefore", "2020-06-15T05:00:00").option("quote", '"').option("escape", '"').csv(source_path)

      (As with any streaming CSV read, a schema must also be supplied via .schema(...) unless spark.sql.streaming.schemaInference is enabled.)
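      In a stream, the same check would presumably be applied to each newly discovered file, alongside the source's existing tracking of files it has already processed. The sketch below models that with a hypothetical ModifiedTimeFilteredSource class and an in-memory listing of path to modification time (epoch seconds). It does not reflect Spark's actual FileStreamSource internals; UTC parsing and exclusive bounds are again assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional, Set


def to_epoch(value: Optional[str]) -> Optional[float]:
    """Parse "YYYY-MM-DDTHH:mm:ss" as UTC epoch seconds (UTC is an assumption)."""
    if value is None:
        return None
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    return parsed.timestamp()


@dataclass
class ModifiedTimeFilteredSource:
    """Hypothetical stand-in for a streaming file source's per-batch discovery."""
    modified_after: Optional[str] = None
    modified_before: Optional[str] = None
    seen: Set[str] = field(default_factory=set)

    def _accept(self, mtime: float) -> bool:
        after = to_epoch(self.modified_after)
        before = to_epoch(self.modified_before)
        # Exclusive bounds on both sides (assumed to mirror the batch options).
        return (after is None or mtime > after) and (before is None or mtime < before)

    def next_batch(self, listing: Dict[str, float]) -> List[str]:
        """Return files from `listing` not yet processed that pass the time filter."""
        new_files = sorted(
            path for path, mtime in listing.items()
            if path not in self.seen and self._accept(mtime)
        )
        # Remember accepted files so later micro-batches do not re-emit them.
        self.seen.update(new_files)
        return new_files


listing = {
    "old.csv": to_epoch("2020-01-01T00:00:00"),
    "new.csv": to_epoch("2020-07-01T00:00:00"),
}
source = ModifiedTimeFilteredSource(modified_after="2020-06-15T05:00:00")
batch1 = source.next_batch(listing)  # ['new.csv']
listing["newer.csv"] = to_epoch("2020-08-01T00:00:00")
batch2 = source.next_batch(listing)  # ['newer.csv']
```

      The filter is applied before files are recorded as seen, so a file excluded by modifiedAfter is never read, while files already emitted are not emitted again in later micro-batches.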


          People

            Assignee: Unassigned
            Reporter: Jeff Steinmetz (jeffsteinmetz)
            Votes: 0
            Watchers: 1
