Description
When a DataFrame is backed by a Hive table, groupBy followed by agg fails to resolve columns if I pass them as Strings rather than as Columns.
The following fails with org.apache.spark.sql.AnalysisException: expression 'dt' is neither present in the group by, nor is it an aggregate function.
val grouped = eventLogHLL
  .groupBy("dt", "ad_id", "site_id")
  .agg(
    dt,
    ad_id,
    col("site_id") as "site_id",
    sum("imp_count") as "imp_count",
    sum("click_count") as "click_count"
  )
This works fine:
val grouped = eventLogHLL
  .groupBy(col("dt"), col("ad_id"), col("site_id"))
  .agg(
    col("dt") as "dt",
    col("ad_id") as "ad_id",
    col("site_id") as "site_id",
    sum("imp_count") as "imp_count",
    sum("click_count") as "click_count"
  )
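As a minimal sketch of the working pattern, the grouping column names can be kept as Strings in one place and converted to Columns before calling groupBy. Everything here beyond the column names is an assumption made for illustration: the object name GroupByColumnsSketch, the sample rows, the local master setting, and the use of SparkSession (which exists only in Spark 2.0 and later, not in the version this report was filed against). The data is an in-memory DataFrame, so this shows the call shape only and does not reproduce the Hive-backed failure.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, sum}

object GroupByColumnsSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local, single-threaded session is enough for illustration.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("groupby-columns-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows standing in for the Hive-backed eventLogHLL table.
    val eventLog = Seq(
      ("2015-03-01", 1L, 10L, 100L, 3L),
      ("2015-03-01", 1L, 10L, 50L, 1L),
      ("2015-03-02", 2L, 20L, 70L, 2L)
    ).toDF("dt", "ad_id", "site_id", "imp_count", "click_count")

    // Keep the names as Strings, but hand Column objects to groupBy.
    val groupNames = Seq("dt", "ad_id", "site_id")
    val groupCols = groupNames.map(name => col(name))

    val grouped = eventLog
      .groupBy(groupCols: _*)
      .agg(
        sum("imp_count") as "imp_count",
        sum("click_count") as "click_count"
      )

    grouped.show()
    spark.stop()
  }
}
```

On Spark versions where grouping columns are retained in the aggregate output by default, they do not need to be repeated inside agg, which is why the sketch lists only the two sums there.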
Integration tests that run with "embedded" Spark and DataFrames generated from RDDs work fine.