Description
Combining UNION ALL with count(distinct colname) triggers a query analyzer bug.
This behaviour was introduced in Spark 3.3.0; the bug is not present in 3.2.1.
Here is a reprex in PySpark:
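import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# One-row input table with two string columns
df_pd = pd.DataFrame([
    {'surname': 'a', 'first_name': 'b'}
])
df_spark = spark.createDataFrame(df_pd)
df_spark.createOrReplaceTempView("input_table")

# Two scalar COUNT(DISTINCT ...) subqueries combined with UNION ALL
sql = """
SELECT
    (SELECT Count(DISTINCT first_name) FROM input_table)
        AS distinct_value_count
FROM input_table
UNION ALL
SELECT
    (SELECT Count(DISTINCT surname) FROM input_table)
        AS distinct_value_count
FROM input_table
"""

# Fails in the analyzer on 3.3.0; runs on 3.2.1
spark.sql(sql).toPandas()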
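For comparison, the query should return a single distinct_value_count column: each UNION ALL branch emits its scalar subquery value once per row of input_table, so the one-row reprex should yield two rows, both equal to 1. The sketch below computes that expected result in plain Python, without Spark, so it can be checked against what a working version returns. expected_distinct_value_counts is a hypothetical helper written for this comparison, not part of PySpark.

```python
# Hypothetical helper (not a PySpark API): computes, without Spark, what the
# reprex query should return when UNION ALL and the scalar
# COUNT(DISTINCT ...) subqueries behave correctly.
def expected_distinct_value_counts(rows, columns):
    result = []
    for col in columns:
        # Scalar subquery: number of distinct non-NULL values in `col`
        # across the whole table (SQL COUNT(DISTINCT ...) ignores NULLs).
        distinct_count = len({r[col] for r in rows if r[col] is not None})
        # Outer SELECT ... FROM input_table emits that value once per row.
        result.extend(distinct_count for _ in rows)
    # UNION ALL keeps every row from each branch, duplicates included.
    return result


# Same single row as the pandas DataFrame in the reprex.
input_table = [{'surname': 'a', 'first_name': 'b'}]

# Branch order matches the SQL: first_name, then surname.
print(expected_distinct_value_counts(input_table, ['first_name', 'surname']))
# prints [1, 1]
```

SQL does not guarantee row order for UNION ALL without ORDER BY, so a comparison against Spark's output should ignore order; for the one-row reprex both rows are 1 anyway.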