Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- 3.1.0
- None
Description
The "compact" operation in FileStreamSourceLog and FileStreamSinkLog was introduced to solve the "small files" problem, but it adds non-trivial latency, which becomes a headache of its own in long-running queries.
There are a number of reports from the community about the same issue (see SPARK-24295, SPARK-29995, SPARK-30462). Before trying to solve the problem, it would be better to measure the latency (elapsed time) of the operation and log it, so that the logs help identify the issue when the additional latency becomes a concern.
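As a rough illustration of the measurement proposed here, the sketch below times a toy "compact" step and logs the elapsed time. It is not Spark's implementation: `time_taken_ms`, `compact_with_timing`, the logger name, and the 2000 ms warning threshold are all hypothetical names and values chosen for this example, and the "compaction" is just a merge of in-memory lists.

```python
import logging
import time
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")

# Hypothetical logger name, used only in this sketch.
logger = logging.getLogger("compact_timing")


def time_taken_ms(
    body: Callable[[], T],
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[T, float]:
    """Run body() and return its result together with the elapsed time in ms."""
    start = clock()
    result = body()
    return result, (clock() - start) * 1000.0


def compact_with_timing(
    batches: Sequence[Sequence[str]],
    threshold_ms: float = 2000.0,  # assumed threshold, not a Spark default
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[List[str], float]:
    """Merge all batch entries into one list and log how long the merge took."""
    merged, elapsed_ms = time_taken_ms(
        lambda: [entry for batch in batches for entry in batch],
        clock,
    )
    # Log quietly when fast, loudly once the latency reaches the threshold.
    level = logging.WARNING if elapsed_ms >= threshold_ms else logging.DEBUG
    logger.log(level, "Compacting %d batches took %.1f ms", len(batches), elapsed_ms)
    return merged, elapsed_ms
```

Passing the clock as a parameter keeps the timing logic testable with a fake clock. Logging at a higher level only above a threshold is one possible design: it keeps normal runs quiet while making slow compactions visible in long-running queries.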