Spark / SPARK-30804

Measure and log elapsed time for "compact" operation in CompactibleFileStreamLog


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.0
    • Fix Version/s: 3.0.0
    • Component/s: Structured Streaming
    • Labels: None

    Description

      The "compact" operation in FileStreamSourceLog and FileStreamSinkLog was introduced to solve the "small files" problem, but it adds non-trivial latency, which becomes another headache for long-running queries.

      There are a number of reports from the community about this issue (see SPARK-24295, SPARK-29995, SPARK-30462). Before trying to solve the problem, it would be better to measure the latency (elapsed time) of the operation and log it, so that the issue can be identified when the additional latency becomes a concern.
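
The idea of measuring and logging elapsed time can be illustrated with a minimal sketch. This is not the change made in Spark: `run_timed`, `compact`, the logger name, and the 2000 ms warning threshold are all hypothetical names and values chosen for illustration. The sketch only shows the general pattern of timing an operation and logging more loudly when it is slow.

```python
import logging
import time

logger = logging.getLogger("compact_timing")  # hypothetical logger name


def run_timed(label, operation, warn_threshold_ms=2000.0, clock=time.perf_counter):
    """Run operation(), log how long it took, and return (result, elapsed_ms).

    warn_threshold_ms is an assumed value for this sketch, not a Spark setting.
    """
    start = clock()
    result = operation()
    elapsed_ms = (clock() - start) * 1000.0
    if elapsed_ms >= warn_threshold_ms:
        # Slow run: surface it at WARNING so it stands out in the logs.
        logger.warning("%s took %.0f ms (threshold: %.0f ms)",
                       label, elapsed_ms, warn_threshold_ms)
    else:
        logger.debug("%s took %.0f ms", label, elapsed_ms)
    return result, elapsed_ms


def compact(batches):
    """Hypothetical stand-in for a compact step: merge per-batch entries into one list."""
    return [entry for batch in batches for entry in batch]
```

A caller would wrap the operation, for example `merged, elapsed_ms = run_timed("compact", lambda: compact(batches))`. Passing the clock as a parameter keeps the timing logic easy to test with a fake clock.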

            People

              Assignee: Jungtaek Lim (kabhwan)
              Reporter: Jungtaek Lim (kabhwan)