Spark / SPARK-18960

Avoid double-reading a file that is still being copied.

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.0.2
    • Fix Version/s: 2.2.0
    • Component/s: SQL, Structured Streaming
    • Labels: None

    Description

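      In HDFS, when we copy a file into the target directory, a temporary .COPY file exists there for a period of time. The duration depends on the file size. If we do not skip this temporary file, we may read the same data twice: once from the temporary file while the copy is in progress, and again from the final file once the copy completes.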
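
The skip described above can be sketched as a filter over a directory listing. This is a minimal illustration, not the change that was merged for this issue. It assumes the temporary file carries the `._COPYING_` suffix that the HDFS shell appends to an in-flight copy (the description abbreviates it as ".COPY"). The function name `skip_in_flight_copies` and the sample paths are hypothetical.

```python
from typing import Iterable, List

# Assumption: suffix the HDFS shell appends to a file while it is being copied.
TEMP_COPY_SUFFIX = "._COPYING_"


def skip_in_flight_copies(paths: Iterable[str],
                          suffix: str = TEMP_COPY_SUFFIX) -> List[str]:
    """Return the paths whose file name does not end with the temporary suffix."""
    kept = []
    for path in paths:
        # Look only at the last path component, i.e. the file name.
        file_name = path.rstrip("/").rsplit("/", 1)[-1]
        if not file_name.endswith(suffix):
            kept.append(path)
    return kept


# Hypothetical listing taken while part-0001.json is still being copied.
listing = [
    "/data/in/part-0000.json",
    "/data/in/part-0001.json._COPYING_",
]
print(skip_in_flight_copies(listing))
```

With this filter, the in-flight file is ignored on the first listing and its data is picked up only once, under its final name, after the copy finishes.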

          People

            Assignee: uncleGen Genmao Yu
            Reporter: uncleGen Genmao Yu
            Votes: 0
            Watchers: 2
