  Spark / SPARK-8645

Incorrect expression analysis with Hive


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 1.3.0
    • Fix Version/s: None
    • Component/s: SQL
    • Environment: CDH 5.4.2 1.3.0

    Description

      When using a DataFrame backed by a Hive table, groupBy followed by agg cannot resolve the grouping columns if I pass them as Strings rather than as Columns.

      This fails with: org.apache.spark.sql.AnalysisException: expression 'dt' is neither present in the group by, nor is it an aggregate function.

        val grouped = eventLogHLL
            .groupBy(dt, ad_id, site_id).agg(
              dt,
              ad_id,
              col(site_id)                   as site_id,
              sum(imp_count)                 as imp_count,
              sum(click_count)               as click_count
            )
      

      This works fine:

        val grouped = eventLogHLL
            .groupBy(col(dt), col(ad_id), col(site_id)).agg(
              col(dt)                        as dt,
              col(ad_id)                     as ad_id,
              col(site_id)                   as site_id,
              sum(imp_count)                 as imp_count,
              sum(click_count)               as click_count
            )
      

      Integration tests running with "embedded" Spark and DataFrames generated from RDDs work fine.
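
The working variant above can be wrapped in a small helper, so that column names kept as Strings are always converted to Column objects before they reach groupBy and agg. This is only a sketch under stated assumptions: the object name `GroupByWorkaround`, the method `aggregate`, and the String values are hypothetical, because the report does not show how `dt`, `ad_id` and the other identifiers are defined. It targets the Spark 1.3.0 DataFrame API named in the report. The grouping keys are listed again inside agg, as in the original snippet; newer Spark versions may already retain grouping columns in the result, in which case repeating them would duplicate those columns.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{col, sum}

// Hypothetical helper, not part of Spark or of the original report.
object GroupByWorkaround {

  // Assumed column names; the report does not show these definitions.
  val keyNames: Seq[String] = Seq("dt", "ad_id", "site_id")
  val sumNames: Seq[String] = Seq("imp_count", "click_count")

  // Convert every grouping key from a String name to a Column.
  def keyColumns: Seq[Column] = keyNames.map(name => col(name))

  // Build one aliased sum(...) Column per metric name.
  def sumColumns: Seq[Column] = sumNames.map(name => sum(col(name)).as(name))

  // Group by the key Columns and select keys plus sums in agg,
  // mirroring the variant that the report says works.
  def aggregate(eventLogHLL: DataFrame): DataFrame = {
    val keys = keyColumns
    val selected = keys ++ sumColumns
    eventLogHLL
      .groupBy(keys: _*)
      .agg(selected.head, selected.tail: _*)
  }
}
```

With a Hive-backed DataFrame in scope, the call would be `val grouped = GroupByWorkaround.aggregate(eventLogHLL)`. Whether this avoids the AnalysisException on every setup is not confirmed by the report; it only restates the Column-based form that the reporter found to work.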

          People

            Assignee: Unassigned
            Reporter: Eugene Zhulenev (ezhulenev)