Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-45655

current_date() not supported in Streaming Query Observed metrics

    XMLWordPrintableJSON

Details

    Description

      Streaming queries do not support current_date() inside CollectMetrics. The primary reason is that current_date() (resolves to CurrentBatchTimestamp) is marked as non-deterministic. However, current_date and current_timestamp are both deterministic today, and current_batch_timestamp should be the same.

       

      As an example, the query below fails due to observe call on the DataFrame.

       

      val inputData = MemoryStream[Timestamp]

      inputData.toDF()
            .filter("value < current_date()")
            .observe("metrics", count(expr("value >= current_date()")).alias("dropped"))
            .writeStream
            .queryName("ts_metrics_test")
            .format("memory")
            .outputMode("append")
            .start()

       

      Attachments

        Issue Links

          Activity

            People

              bhuwan.sahni Bhuwan Sahni
              bhuwan.sahni Bhuwan Sahni
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - 48h
                  48h
                  Remaining:
                  Remaining Estimate - 48h
                  48h
                  Logged:
                  Time Spent - Not Specified
                  Not Specified