  Spark / SPARK-38227

Apply strict nullability of nested column in time window / session window

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.2.1, 3.3.0
    • Fix Version/s: 3.3.0
    • Component/s: Structured Streaming
    • Labels: None

    Description

      In TimeWindow and SessionWindow, we define the dataType of these function expressions as a StructType with two nested columns, "start" and "end", both of which are nullable.

      We replace these expressions in the analyzer via the corresponding rules: TimeWindowing for TimeWindow and SessionWindowing for SessionWindow.

      The rules replace the function expressions with an Alias that refers to CreateNamedStruct. For the value side of CreateNamedStruct, we don't specify anything about nullability, so there is a risk that the value side is interpreted (or optimized) as non-nullable, which would be inconsistent with the dataType declared on the function expressions.

      We should make sure the nullability of the columns in CreateNamedStruct stays the same as in the dataType definition of these function expressions.
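
      The mismatch described above can be illustrated with a small toy model. This is only a sketch, not Spark code: Field, Expr, known_nullable and named_struct_type are hypothetical stand-ins, and the model assumes that a named struct derives each field's nullability from its value expression.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a struct field: name plus nullability."""
    name: str
    nullable: bool


@dataclass(frozen=True)
class Expr:
    """Hypothetical stand-in for an expression; only nullability is modeled."""
    nullable: bool


def known_nullable(child: Expr) -> Expr:
    """Hypothetical wrapper that makes an expression report nullable=True."""
    return Expr(nullable=True)


def named_struct_type(pairs: List[Tuple[str, Expr]]) -> List[Field]:
    """Assumed behaviour: each field inherits the nullability of its value."""
    return [Field(name, value.nullable) for name, value in pairs]


# Declared dataType of the window expression: both nested columns nullable.
declared = [Field("start", True), Field("end", True)]

# Value expressions that are (or were optimized to be) non-nullable.
start, end = Expr(nullable=False), Expr(nullable=False)

# Without pinning nullability, the derived struct disagrees with the declaration.
loose = named_struct_type([("start", start), ("end", end)])

# Pinning nullability on the value side keeps both definitions in agreement.
strict = named_struct_type(
    [("start", known_nullable(start)), ("end", known_nullable(end))]
)

print(loose == declared)   # False
print(strict == declared)  # True
```

      In this model, the inconsistency goes away only when the value side explicitly carries the same nullability as the declared dataType, which is the property this issue asks for.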

          People

            Assignee: kabhwan Jungtaek Lim
            Reporter: kabhwan Jungtaek Lim
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated:
              Resolved: