Spark / SPARK-20110

Windowed aggregation does not work when the timestamp is a nested field

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.1.0
    • Fix Version/s: None
    • Component/s: Input/Output

    Description

      I am loading data into a DataFrame with nested fields. I want to perform a windowed aggregation on a timestamp stored in a nested field:

        .groupBy(window($"auth.sysEntryTimestamp", "2 minutes"))
      

      I get the following error:

      org.apache.spark.sql.AnalysisException: Multiple time window expressions would result in a cartesian product of rows, therefore they are not currently not supported.

      This works fine if I first extract the timestamp to a separate column:

        .withColumn("sysEntryTimestamp", $"auth.sysEntryTimestamp")
        .groupBy(
          window($"sysEntryTimestamp", "2 minutes")
        )
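
      For readers who want to try the workaround in isolation, here is a minimal, self-contained sketch. Only the field name auth.sysEntryTimestamp and the 2-minute window come from this report. The case classes Auth and Event, the id field, the sample timestamps, the count aggregation, and the local SparkSession settings are assumptions made up for illustration, not the reporter's actual sample.

```scala
import java.sql.Timestamp

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{count, window}

// Hypothetical schema: only the nested field name auth.sysEntryTimestamp
// is taken from the report; everything else is invented for this sketch.
case class Auth(sysEntryTimestamp: Timestamp)
case class Event(id: Long, auth: Auth)

object NestedWindowWorkaround {
  def main(args: Array[String]): Unit = {
    // Assumed local session, for experimentation only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("nested-window-workaround")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows with a timestamp inside the nested "auth" struct.
    val events = Seq(
      Event(1L, Auth(Timestamp.valueOf("2017-03-27 10:00:30"))),
      Event(2L, Auth(Timestamp.valueOf("2017-03-27 10:01:10"))),
      Event(3L, Auth(Timestamp.valueOf("2017-03-27 10:03:00")))
    ).toDF()

    // Workaround described in the report: copy the nested timestamp to a
    // top-level column first, then apply window() to that column.
    val counts = events
      .withColumn("sysEntryTimestamp", $"auth.sysEntryTimestamp")
      .groupBy(window($"sysEntryTimestamp", "2 minutes"))
      .agg(count("*").as("events"))

    counts.orderBy($"window.start").show(truncate = false)

    spark.stop()
  }
}
```

      Replacing the withColumn/groupBy pair with a direct .groupBy(window($"auth.sysEntryTimestamp", "2 minutes")) is the variant this report describes as failing on 2.1.0. Whether it still fails on other versions is not established here.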
      

      Please see the whole sample:

          People

            Assignee: Unassigned
            Reporter: Alexis Seigneurin (aseigneurin)
            Votes: 2
            Watchers: 4
