Spark / SPARK-22013

Allow reading the results of a streaming query as a non-streaming data source

Details

    • Type: Wish
    • Status: Resolved
    • Priority: Minor
    • Resolution: Incomplete
    • Affects Version/s: 2.2.0
    • Fix Version/s: None
    • Component/s: SQL
    • All

    Description

      It would be great to be able to read the results of a streaming query as a non-streaming data source, i.e. without reading _spark_metadata. In some use cases the output is modified by external tools (for example, combining small Parquet/ORC files with Hadoop rather than Spark), which leaves _spark_metadata outdated. This in turn can cause errors when the metadata refers to files that have been deleted or moved.

      Currently there is no way to override this behavior.
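
      To illustrate the request, here is a minimal sketch (not part of Spark) that enumerates the data files physically present in a sink output directory while ignoring _spark_metadata. The helper name list_data_files, the suffix list, and the assumption that the output lives on a local filesystem are all hypothetical; an HDFS directory would need a different listing API.

```python
from pathlib import Path


def list_data_files(output_dir, suffixes=(".parquet", ".orc")):
    """Return data files physically present under output_dir, sorted by path.

    Hypothetical helper: any entry whose relative path contains a component
    starting with "_" or "." (for example _spark_metadata, _SUCCESS, or
    checksum files) is skipped, so the listing reflects what is on disk
    rather than what the streaming metadata log claims.
    """
    root = Path(output_dir)
    files = []
    for path in root.rglob("*"):
        rel_parts = path.relative_to(root).parts
        # Skip metadata directories and hidden/marker files at any depth.
        if any(part.startswith(("_", ".")) for part in rel_parts):
            continue
        if path.is_file() and path.suffix in suffixes:
            files.append(str(path))
    return sorted(files)
```

      With a SparkSession named spark, such a list could then be passed as explicit paths, for example spark.read.parquet(*paths). Whether explicit paths actually avoid the metadata lookup depends on the Spark version and is an assumption to verify, not documented behavior.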

          People

            Assignee: Unassigned
            Reporter: Ivan Sharamet (isharamet)
            Votes: 1
            Watchers: 2
