Spark / SPARK-40892

Loosen the requirement of window_time rule - allow multiple window_time calls


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.4.0
    • Component/s: Structured Streaming
    • Labels: None

    Description

      SPARK-40821 introduced a new SQL function, "window_time", which extracts the representative time from a window (and also carries over the event time metadata, if feasible).

      SPARK-40821 followed the existing rule for time windows and session windows, which allows only a single call of the function in the same projection (strictly speaking, multiple calls with the same parameters are counted as a single call).

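      For the existing functions, the restriction makes sense: a window call can expand one input row into several rows (for example, a sliding time window assigns an event to every window that contains it), so allowing multiple calls would produce a cartesian product of rows (although Spark could handle it). In contrast, window_time produces exactly one value per input row, so the restriction no longer makes sense for it.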
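The row-count argument above can be sketched in plain Python. This is a hypothetical model, not Spark code: `sliding_windows`, `expand`, and `add_window_times` are illustrative names, windows are assumed to be aligned to the Unix epoch, and `window_time` is assumed to return the exclusive window end minus one microsecond.

```python
from datetime import datetime, timedelta

# Hypothetical model of event-time windows; NOT Spark's implementation.
EPOCH = datetime(1970, 1, 1)


def sliding_windows(ts, duration, slide):
    """Return every [start, end) window (aligned to EPOCH) that contains ts."""
    start = ts - (ts - EPOCH) % slide  # latest window start <= ts
    windows = []
    while start + duration > ts:  # window still covers ts (end is exclusive)
        windows.append((start, start + duration))
        start -= slide
    return sorted(windows)


def window_time(window):
    """Assumed semantics: one value per window, the end minus 1 microsecond."""
    _, end = window
    return end - timedelta(microseconds=1)


def expand(rows, duration, slide):
    """Window-like call on row[0]: each row becomes one row per matching window."""
    return [row + (w,) for row in rows for w in sliding_windows(row[0], duration, slide)]


def add_window_times(rows, *cols):
    """window_time-like calls: one extra column per call, never extra rows."""
    return [row + tuple(window_time(row[c]) for c in cols) for row in rows]


if __name__ == "__main__":
    ts = datetime(2022, 10, 24, 12, 7)
    one = expand([(ts,)], timedelta(minutes=10), timedelta(minutes=5))
    two = expand(one, timedelta(minutes=10), timedelta(minutes=2))
    print(len(one))                             # 2
    print(len(two))                             # 10 (2 x 5: cartesian product)
    print(len(add_window_times(two, 1, 2)))     # 10 (window_time adds no rows)
```

Each window-like call multiplies the row count (1 → 2 → 10 here), while any number of window_time calls only adds columns, which is why the single-call restriction is unnecessary for window_time.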

      It would be better to lift this restriction for window_time. Note that this means the resulting column of "window_time()" will no longer be named "window_time". (This matches what most function calls do. The time window and session window functions do not follow this practice, so arguably they have a bug, but fixing it would break backward compatibility.)

People

    • Assignee: Jungtaek Lim (kabhwan)
    • Reporter: Jungtaek Lim (kabhwan)
    • Votes: 0
    • Watchers: 2
