Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Incomplete
- Affects Version: 2.1.0
- Fix Version: None
Description
I am loading data into a DataFrame with nested fields and want to perform a windowed aggregation on a timestamp from a nested field:
.groupBy(window($"auth.sysEntryTimestamp", "2 minutes"))
I get the following error:
org.apache.spark.sql.AnalysisException: Multiple time window expressions would result in a cartesian product of rows, therefore they are currently not supported.
This works fine if I first extract the timestamp to a separate column:
.withColumn("sysEntryTimestamp", $"auth.sysEntryTimestamp")
.groupBy(window($"sysEntryTimestamp", "2 minutes"))
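A minimal end-to-end sketch of the failure and the workaround (the SparkSession setup, input path, and the `count()` aggregation are assumptions for illustration; only `auth.sysEntryTimestamp` and the window expressions come from the report):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.window

val spark = SparkSession.builder.appName("nested-window-repro").getOrCreate()
import spark.implicits._

// Hypothetical input: JSON records with a nested "auth" struct containing
// a "sysEntryTimestamp" timestamp field.
val df = spark.read.json("events.json")

// Fails on Spark 2.1.0 with AnalysisException, even though only one
// window expression is present:
// df.groupBy(window($"auth.sysEntryTimestamp", "2 minutes")).count()

// Workaround: promote the nested timestamp to a top-level column first,
// then window on that column.
val result = df
  .withColumn("sysEntryTimestamp", $"auth.sysEntryTimestamp")
  .groupBy(window($"sysEntryTimestamp", "2 minutes"))
  .count()
```

The workaround suggests the analyzer miscounts window expressions when the window's time column is a nested field reference rather than a top-level attribute.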
Please see the whole sample:
- batch: https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/4683710270868386/4278399007363210/3769253384867782/latest.html
- Structured Streaming: https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/4683710270868386/4278399007363192/3769253384867782/latest.html