Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
3.2.1, 3.5.1
-
None
-
None
-
Tested using PySpark with Spark 3.2.1 and 3.5.1 using Python 3.11 and using Java Spark using Java 11 with Spark 3.2.1. Exceptions occur both on Unix and Windows (11)
Description
Spark 3.2.1 throws a `java.lang.OutOfMemoryError: Java heap space` exception and Spark 3.5.1 seems to run into an infinite loop (but not cause an OOM!) when a query is fired that uses multiple times the same column under different aliases when combined with a JOIN and Common Table Expressions.
We have a service in production that generates SQL that operates per "KPI" and another component that combines the relevant parquet datasets and combines them. I have extraced from the logs a Minimal Reproducible Example. Weirdly, it seems related to both the JOIN and the column names used in the queries. Renaming the columns or replacing the JOIN with a UNION does not give me the error.
The setup is as follows:
df = spark.createDataFrame([(0.374, -28.039, True, True)], ['mtdcorrected_overlay_x', 'mtdcorrected_overlay_y', 'target_valid_x', 'target_valid_y']) df.createOrReplaceTempView("mtd1")df2 = spark.createDataFrame([(4.0, 0.0, True, True)], ['mtdcorrected_overlay_x', 'mtdcorrected_overlay_y', 'target_valid_x', 'target_valid_y']) df2.createOrReplaceTempView("mtd2")
Next, we JOIN the two datasets:
df3 = spark.sql(r""" SELECT `target_valid_y` `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid`, `target_valid_y` `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid`, `target_valid_y` `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid`, `target_valid_y` `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid`, `target_valid_y` `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid`, `mtdcorrected_overlay_x` `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base`, `mtdcorrected_overlay_x` `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base`, `mtdcorrected_overlay_x` `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base`, `mtdcorrected_overlay_x` `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base`, `mtdcorrected_overlay_x` `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base`, `mtdcorrected_overlay_y` `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base`, `mtdcorrected_overlay_y` `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base`, `mtdcorrected_overlay_y` `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base`, `mtdcorrected_overlay_y` `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base`, `mtdcorrected_overlay_y` `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base`, `target_valid_x` `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid`, `target_valid_x` `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid`, `target_valid_x` `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid`, `target_valid_x` `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid`, `target_valid_x` `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid` FROM mtd1 """) df4 = spark.sql(r""" SELECT `target_valid_y` `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid`, `target_valid_y` `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid`, `target_valid_y` `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid`, `target_valid_y` `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid`, `target_valid_y` `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid`, `mtdcorrected_overlay_x` `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base`, `mtdcorrected_overlay_x` `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base`, `mtdcorrected_overlay_x` `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base`, `mtdcorrected_overlay_x` `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base`, `mtdcorrected_overlay_x` `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base`, `mtdcorrected_overlay_y` `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base`, `mtdcorrected_overlay_y` `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base`, `mtdcorrected_overlay_y` `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base`, `mtdcorrected_overlay_y` `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base`, `mtdcorrected_overlay_y` `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base`, `target_valid_x` `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid`, `target_valid_x` `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid`, `target_valid_x` `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid`, `target_valid_x` `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid`, `target_valid_x` `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid` FROM mtd2 """) df3.join(df4, [ "COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base", "COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base", "COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base", "COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base", "COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base", "COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid", "COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid", "COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid", "COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid", "COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid", "COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base", "COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base", "COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base", "COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base", "COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base", "COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid", "COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid", "COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid", "COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid", "COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid" ], "fullouter").distinct().createOrReplaceTempView("TEMP_VIEW")
And finally, we fire the query:
spark.sql(r""" WITH kpiNullFilterView AS (WITH validityColumnsView0 AS (SELECT `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base`, `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base`, `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base`, `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base`, `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base`, `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base`, `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base`, `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base`, `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base`, `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base`, `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid`, `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid`, `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid`, `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid`, `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid`, `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid`, `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid`, `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid`, `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid`, `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid` FROM `TEMP_VIEW` WHERE ( `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base` IS NOT NULL OR `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base` IS NOT NULL OR `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base` IS NOT NULL OR `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base` IS NOT NULL OR `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base` IS NOT NULL OR `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base` IS NOT NULL OR `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base` IS NOT NULL OR `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base` IS NOT NULL OR `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base` IS NOT NULL OR `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base` IS NOT NULL) AND ( `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base` IS NOT NULL AND `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid` <> FALSE OR `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base` IS NOT NULL AND `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid` <> FALSE OR `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base` IS NOT NULL AND `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid` <> FALSE OR `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base` IS NOT NULL AND `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid` <> FALSE OR `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base` IS NOT NULL AND `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid` <> FALSE OR `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base` IS NOT NULL AND `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid` <> FALSE OR `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base` IS NOT NULL AND `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid` <> FALSE OR `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base` IS NOT NULL AND `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid` <> FALSE OR `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base` IS NOT NULL AND `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid` <> FALSE OR `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base` IS NOT NULL AND `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid` <> FALSE)) SELECT MIN(CASE WHEN `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid` <> FALSE THEN `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base` ELSE NULL END) `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY`, MIN(CASE WHEN `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid` <> FALSE THEN `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base` ELSE NULL END) `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY`, PERCENTILE(CASE WHEN `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid` <> FALSE THEN `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base` ELSE NULL END, 2.5E-1) `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY`, PERCENTILE(CASE WHEN `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid` <> FALSE THEN `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base` ELSE NULL END, 2.5E-1) `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY`, PERCENTILE(CASE WHEN `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid` <> FALSE THEN `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base` ELSE NULL END, 5E-1) `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY`, PERCENTILE(CASE WHEN `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid` <> FALSE THEN `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base` ELSE NULL END, 5E-1) `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY`, PERCENTILE(CASE WHEN `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid` <> FALSE THEN `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base` ELSE NULL END, 7.5E-1) `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY`, PERCENTILE(CASE WHEN `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid` <> FALSE THEN `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base` ELSE NULL END, 7.5E-1) `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY`, MAX(CASE WHEN `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_valid` <> FALSE THEN `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY_base` ELSE NULL END) `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY`, MAX(CASE WHEN `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_valid` <> FALSE THEN `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY_base` ELSE NULL END) `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY` FROM validityColumnsView0) SELECT `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY`, `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY`, `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY`, `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY`, `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY`, `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY`, `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY`, `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY`, `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY`, `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY` FROM kpiNullFilterView WHERE `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY` IS NOT NULL OR `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY` IS NOT NULL OR `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY` IS NOT NULL OR `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY` IS NOT NULL OR `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY` IS NOT NULL OR `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY` IS NOT NULL OR `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY` IS NOT NULL OR `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY` IS NOT NULL OR `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY` IS NOT NULL OR `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY` IS NOT NULL""").show(1,False, True)
If you analyze the JOIN and statistics query, you notice that it actually only uses 4 distinct column and a bunch of aliases. If you replace all aliases with the original column, you get essentially the same query but this one returns nearly instantly with no issues whatsoever:
(Update the logic above to not use aliases, but still use TEMP_VIEW as name)
spark.sql(r""" WITH validityColumnsView0 AS (SELECT `mtdcorrected_overlay_x`, `mtdcorrected_overlay_y`, `target_valid_x`, `target_valid_y` FROM `TEMP_VIEW` WHERE ( `mtdcorrected_overlay_x` IS NOT NULL OR `mtdcorrected_overlay_y` IS NOT NULL) AND ( `mtdcorrected_overlay_x` IS NOT NULL AND `target_valid_x` <> FALSE OR `mtdcorrected_overlay_y` IS NOT NULL AND `target_valid_y` <> FALSE)), kpiNullFilterView AS (SELECT MIN(CASE WHEN `target_valid_x` <> FALSE THEN `mtdcorrected_overlay_x` END) `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY`, MIN(CASE WHEN `target_valid_y` <> FALSE THEN `mtdcorrected_overlay_y` END) `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY`, PERCENTILE(CASE WHEN `target_valid_x` <> FALSE THEN `mtdcorrected_overlay_x` END, 2.5E-1) `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY`, PERCENTILE(CASE WHEN `target_valid_y` <> FALSE THEN `mtdcorrected_overlay_y` END, 2.5E-1) `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY`, PERCENTILE(CASE WHEN `target_valid_x` <> FALSE THEN `mtdcorrected_overlay_x` END, 5E-1) `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY`, PERCENTILE(CASE WHEN `target_valid_y` <> FALSE THEN `mtdcorrected_overlay_y` END, 5E-1) `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY`, PERCENTILE(CASE WHEN `target_valid_x` <> FALSE THEN `mtdcorrected_overlay_x` END, 7.5E-1) `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY`, PERCENTILE(CASE WHEN `target_valid_y` <> FALSE THEN `mtdcorrected_overlay_y` END, 7.5E-1) `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY`, MAX(CASE WHEN `target_valid_x` <> FALSE THEN `mtdcorrected_overlay_x` END) `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY`, MAX(CASE WHEN `target_valid_y` <> FALSE THEN `mtdcorrected_overlay_y` END) `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY` FROM validityColumnsView0) SELECT `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY`, `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY`, `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY`, `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY`, `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY`, `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY`, `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY`, `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY`, `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY`, `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY` FROM kpiNullFilterView WHERE `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY` IS NOT NULL OR `COMMON_minMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY` IS NOT NULL OR `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY` IS NOT NULL OR `COMMON_percentile25MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY` IS NOT NULL OR `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY` IS NOT NULL OR `COMMON_percentile50MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY` IS NOT NULL OR `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY` IS NOT NULL OR `COMMON_percentile75MTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY` IS NOT NULL OR `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojXVALIDS_ONLY` IS NOT NULL OR `COMMON_maxMTDCorrectednXGxftqncvY2MvXSHdrojYVALIDS_ONLY` IS NOT NULL""").show(1,False, True)
Even with the 2x a 1-row dataset I get the OOM on Spark 3.2.1. The real production dataset is significant bigger, so if needed, create 100k-1M row datasets as needed. I hope this is sufficient to reproduce the issue. If anything else more is needed, leave a reply.