Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-24050

StreamingQuery does not calculate input / processing rates in some cases

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 2.1.0, 2.1.1, 2.1.2, 2.2.0, 2.2.1, 2.3.0
    • 2.4.0
    • Structured Streaming
    • None

    Description

      In some streaming queries, the input and processing rates are not calculated at all (shows up as zero) because MicroBatchExecution fails to associated metrics from the executed plan of a trigger with the sources in the logical plan of the trigger. The way this executed-plan-leaf-to-logical-source attribution works is as follows. With V1 sources, there was no way to identify which execution plan leaves were generated by a streaming source. So did a best-effort attempt to match logical and execution plan leaves when the number of leaves were same. In cases where the number of leaves is different, we just give up and report zero rates. An example where this may happen is as follows.

      val cachedStaticDF = someStaticDF.union(anotherStaticDF).cache()
      val streamingInputDF = ...
      
      val query = streamingInputDF.join(cachedStaticDF).writeStream....
      

      In this case, the cachedStaticDF has multiple logical leaves, but in the trigger's execution plan it only has leaf because a cached subplan is represented as a single InMemoryTableScanExec leaf. This leads to a mismatch in the number of leaves causing the input rates to be computed as zero.

      With DataSourceV2, all inputs are represented in the executed plan using DataSourceV2ScanExec, each of which has a reference to the associated logical DataSource and DataSourceReader. So its easy to associate the metrics to the original streaming sources. So the solution is to take advantage of the presence of DataSourceV2 whenever possible.

      Attachments

        Activity

          People

            tdas Tathagata Das
            tdas Tathagata Das
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: