Spark / SPARK-15418

SparkSQL does not support using a UDAF in a CREATE VIEW clause

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 1.6.1
    • Fix Version/s: None
    • Component/s: SQL

      Description

      I am using AWS EMR with Spark 1.6.1 and Hive 1.0.0.

      I have this UDAF on Spark's classpath: https://github.com/scribd/hive-udaf-maxrow/blob/master/src/com/scribd/hive/udaf/GenericUDAFMaxRow.java

      I registered it in Spark with sqlContext.sql("CREATE TEMPORARY FUNCTION maxrow AS 'some.cool.package.hive.udf.GenericUDAFMaxRow'").

      However, the problem appears when I call it from Spark in the following CREATE VIEW query:

      CREATE VIEW VIEW_1 AS
            SELECT
              a.A,
              a.B,
              maxrow ( a.C,
                       a.D,
                       a.E,
                       a.F,
                       a.G,
                       a.H,
                       a.I
                  ) as m
              FROM
                  table_1 a
              JOIN
                  table_2 b
              ON
                      b.Z = a.D
                  AND b.Y  = a.C
              JOIN dummy_table
              GROUP BY
                  a.A,
                  a.B
      

      The statement fails with the following error:

      16/05/18 19:49:14 WARN RowResolver: Duplicate column info for a.A was overwritten in RowResolver map: _col0: string by _col0: string
      16/05/18 19:49:14 WARN RowResolver: Duplicate column info for a.B was overwritten in RowResolver map: _col1: bigint by _col1: bigint
      16/05/18 19:49:14 ERROR Driver: FAILED: SemanticException [Error 10002]: Line 16:32 Invalid column reference 'C'
      org.apache.hadoop.hive.ql.parse.SemanticException: Line 16:32 Invalid column reference 'C'
                      at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:10643)
                      at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:10591)
                      at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:3656)
      

      Running the same SELECT query without the CREATE VIEW wrapper works fine.
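      Since the bare SELECT succeeds and the stack trace shows the CREATE VIEW statement being analyzed by Hive's SemanticAnalyzer, one possible workaround is to run the SELECT through Spark and register the result as a temporary table instead of creating a Hive view. The following is a minimal sketch under assumptions, not a confirmed fix and not tested against this setup: the helper name register_select_as_temp_table and the constant SELECT_SQL are hypothetical, and sql_context is assumed to be a Spark 1.6 HiveContext on which maxrow has already been registered.

```python
# Hypothetical workaround sketch. Assumes `sql_context` is a Spark 1.6
# HiveContext where `maxrow` was already registered with
# CREATE TEMPORARY FUNCTION, as described in the report.

# The SELECT from the report, without the CREATE VIEW wrapper.
SELECT_SQL = """
SELECT
    a.A,
    a.B,
    maxrow(a.C, a.D, a.E, a.F, a.G, a.H, a.I) AS m
FROM table_1 a
JOIN table_2 b
    ON b.Z = a.D AND b.Y = a.C
JOIN dummy_table
GROUP BY a.A, a.B
"""


def register_select_as_temp_table(sql_context, table_name, select_sql=SELECT_SQL):
    """Run the SELECT through Spark and expose the result under table_name."""
    # Execute the query in Spark, where the report says the UDAF call works.
    df = sql_context.sql(select_sql)
    # Make the result queryable by name for the rest of this session.
    df.registerTempTable(table_name)
    return df
```

      With a live context this would be called as register_select_as_temp_table(sqlContext, "VIEW_1"). Unlike a Hive view, a temporary table is scoped to the current SQLContext and is not stored in the Hive metastore, so other sessions and Hive clients will not see it.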

            People

            • Assignee: Unassigned
            • Reporter: Hanbo Wang (hbwang)