[SPARK-4226] SparkSQL - Add support for subqueries in predicates

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.0
    • Fix Version/s: 2.0.0
    • Component/s: SQL
    • Labels: None
    • Environment: Spark 1.2 snapshot

    Description

      I have a test table defined in Hive as follows:

      CREATE TABLE sparkbug (
        customerid INT,
        event STRING
      ) STORED AS PARQUET;
      

      and insert sample rows with ids 1, 2, and 3.

      In a Spark shell, I then create a HiveContext and execute the following HiveQL query to test subquery predicates:

      // sc is the SparkContext that the Spark shell provides
      val hc = new org.apache.spark.sql.hive.HiveContext(sc)
      hc.hql("select customerid from sparkbug where customerid in (select customerid from sparkbug where customerid in (2,3))")
      

      I get the following error:

      java.lang.RuntimeException: Unsupported language features in query: select customerid from sparkbug where customerid in (select customerid from sparkbug where customerid in (2,3))
      TOK_QUERY
        TOK_FROM
          TOK_TABREF
            TOK_TABNAME
              sparkbug
        TOK_INSERT
          TOK_DESTINATION
            TOK_DIR
              TOK_TMP_FILE
          TOK_SELECT
            TOK_SELEXPR
              TOK_TABLE_OR_COL
                customerid
          TOK_WHERE
            TOK_SUBQUERY_EXPR
              TOK_SUBQUERY_OP
                in
              TOK_QUERY
                TOK_FROM
                  TOK_TABREF
                    TOK_TABNAME
                      sparkbug
                TOK_INSERT
                  TOK_DESTINATION
                    TOK_DIR
                      TOK_TMP_FILE
                  TOK_SELECT
                    TOK_SELEXPR
                      TOK_TABLE_OR_COL
                        customerid
                  TOK_WHERE
                    TOK_FUNCTION
                      in
                      TOK_TABLE_OR_COL
                        customerid
                      2
                      3
              TOK_TABLE_OR_COL
                customerid
      
      scala.NotImplementedError: No parse rules for ASTNode type: 817, text: TOK_SUBQUERY_EXPR :
      TOK_SUBQUERY_EXPR
        TOK_SUBQUERY_OP
          in
        TOK_QUERY
          TOK_FROM
            TOK_TABREF
              TOK_TABNAME
                sparkbug
          TOK_INSERT
            TOK_DESTINATION
              TOK_DIR
                TOK_TMP_FILE
            TOK_SELECT
              TOK_SELEXPR
                TOK_TABLE_OR_COL
                  customerid
            TOK_WHERE
              TOK_FUNCTION
                in
                TOK_TABLE_OR_COL
                  customerid
                2
                3
        TOK_TABLE_OR_COL
          customerid
      " +
               
      org.apache.spark.sql.hive.HiveQl$.nodeToExpr(HiveQl.scala:1098)
              
              at scala.sys.package$.error(package.scala:27)
              at org.apache.spark.sql.hive.HiveQl$.createPlan(HiveQl.scala:252)
              at org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:50)
              at org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:49)
              at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136)
      

      This thread also brings up the lack of subquery support in SparkSQL. It would be nice to have subquery predicate support in a near-future release (1.3, maybe?).
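
      Until subquery predicates are supported, one possible workaround is to rewrite the IN predicate as a LEFT SEMI JOIN, which HiveQL offers for this kind of membership test. The following is a minimal sketch, not a verified fix. It assumes a Spark 1.x shell where `sc` is predefined, and it reuses the `sparkbug` table and `customerid` column from the failing query. The aliases `a` and `b` and the name `semiJoinQuery` are introduced here only for illustration.

```scala
import org.apache.spark.sql.hive.HiveContext

// Assumption: running inside a Spark 1.x shell, where `sc` is the predefined SparkContext.
val hc = new HiveContext(sc)

// The inner query is kept as-is but moved into the FROM clause as a derived table;
// the outer IN predicate becomes the semi join's ON condition.
val semiJoinQuery =
  """SELECT a.customerid
    |FROM sparkbug a
    |LEFT SEMI JOIN (
    |  SELECT customerid FROM sparkbug WHERE customerid IN (2, 3)
    |) b ON a.customerid = b.customerid""".stripMargin

// HiveContext.sql parses HiveQL by default; print whatever rows come back.
hc.sql(semiJoinQuery).collect().foreach(println)
```

      A semi join emits each left-side row at most once, so the result should match what the IN form would return. In Spark 2.0.0, where this issue was resolved, the original query should be accepted unchanged.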

            People

              Assignee: Herman van Hövell (hvanhovell)
              Reporter: Terry Siu (terry.siu)
              Votes: 11
              Watchers: 24
