Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 1.2.1
Fix Version/s: None
Description
The file source DStream (StreamingContext.fileStream) has a parameter named "newFilesOnly" that controls whether files already present in the directory are included. This worked fine in 1.1.0 but is broken in 1.2.1: the older files are always ignored, no matter what value is set.
Here is simple code that reproduces the problem:
https://gist.github.com/jhu-chang/1ee5b0788c7479414eeb
The cause is that "modTimeIgnoreThreshold" in FileInputDStream::findNewFiles is set to a time close to the current system time (the Spark Streaming clock time), so files older than this time are ignored.
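To illustrate the mechanism, here is a simplified Python model of the file-selection step. It is not Spark's implementation: the names FileInfo, mod_time_ignore_threshold, find_new_files and remember_window_ms are hypothetical, and the model assumes the cutoff is the later of an initial threshold (the current time when newFilesOnly is true, 0 otherwise) and the current time minus a fixed remember window. Under that assumption, setting newFilesOnly to false only lowers the initial threshold; the sliding cutoff still sits close to the clock time, so an old file is skipped either way.

```python
from dataclasses import dataclass

# Simplified model (assumption): NOT Spark's code. It only illustrates how a
# modification-time cutoff close to "now" hides old files regardless of the
# newFilesOnly setting. All names here are hypothetical.


@dataclass
class FileInfo:
    path: str
    mod_time_ms: int  # last-modified time in milliseconds


def mod_time_ignore_threshold(now_ms: int, new_files_only: bool,
                              remember_window_ms: int) -> int:
    """Hypothetical cutoff: the later of the initial threshold and now - window."""
    # Assumed initial threshold: "now" if new_files_only, otherwise 0 (accept all).
    initial = now_ms if new_files_only else 0
    # Assumed sliding cutoff: anything older than the remember window is dropped.
    return max(initial, now_ms - remember_window_ms)


def find_new_files(files, now_ms, new_files_only, remember_window_ms):
    """Return paths of files modified strictly after the cutoff."""
    threshold = mod_time_ignore_threshold(now_ms, new_files_only, remember_window_ms)
    return [f.path for f in files if f.mod_time_ms > threshold]


if __name__ == "__main__":
    now = 10_000_000          # current clock time (ms)
    window = 60_000           # illustrative one-minute remember window
    files = [
        FileInfo("old.txt", 1_000_000),     # modified long before "now"
        FileInfo("recent.txt", 9_990_000),  # modified 10 seconds before "now"
    ]
    # newFilesOnly = false still drops old.txt, because the sliding cutoff wins.
    print(find_new_files(files, now, False, window))  # ['recent.txt']
    # newFilesOnly = true drops both files, since neither is newer than "now".
    print(find_new_files(files, now, True, window))   # []
```

In this model, only widening the remember window lets old.txt through, which is the kind of knob SPARK-3276 asks to expose.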
Issue Links
- duplicates: SPARK-3276 Provide a API to specify MIN_REMEMBER_DURATION for files to consider as input in streaming (Resolved)