Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-32237

Cannot resolve column when put hint in the views of common table expression

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 3.0.0
    • 3.0.1, 3.1.0
    • SQL
    • None
    • Hadoop-2.7.7
      Hive-2.3.6
      Spark-3.0.0

    Description

      Suppose we have a table:

      CREATE TABLE DEMO_DATA (
        ID VARCHAR(10),
        NAME VARCHAR(10),
        BATCH VARCHAR(10),
        TEAM VARCHAR(1)
      ) STORED AS PARQUET;
      

      and some data in it:

      0: jdbc:hive2://HOSTNAME:10000> SELECT T.* FROM DEMO_DATA T;
      +-------+---------+-------------+---------+
      | t.id  | t.name  |   t.batch   | t.team  |
      +-------+---------+-------------+---------+
      | 1     | mike    | 2020-07-08  | A       |
      | 2     | john    | 2020-07-07  | B       |
      | 3     | rose    | 2020-07-06  | B       |
      | ....                                    |
      +-------+---------+-------------+---------+
      

      If I put query hint in va or vb and run it in spark-shell:

      sql("""
      WITH VA AS
       (SELECT T.ID, T.NAME, T.BATCH, T.TEAM 
          FROM DEMO_DATA T WHERE T.TEAM = 'A'),
      VB AS
       (SELECT /*+ REPARTITION(3) */ T.ID, T.NAME, T.BATCH, T.TEAM
          FROM VA T)
      SELECT T.ID, T.NAME, T.BATCH, T.TEAM 
        FROM VB T
      """).show
      

      In Spark-2.4.4 it works fine.
      But in Spark-3.0.0, it throws AnalysisException with Unrecognized hint warning:

      20/07/09 13:51:14 WARN analysis.HintErrorLogger: Unrecognized hint: REPARTITION(3)
      org.apache.spark.sql.AnalysisException: cannot resolve '`T.ID`' given input columns: [T.BATCH, T.ID, T.NAME, T.TEAM]; line 8 pos 7;
      'Project ['T.ID, 'T.NAME, 'T.BATCH, 'T.TEAM]
      +- SubqueryAlias T
         +- SubqueryAlias VB
            +- Project [ID#0, NAME#1, BATCH#2, TEAM#3]
               +- SubqueryAlias T
                  +- SubqueryAlias VA
                     +- Project [ID#0, NAME#1, BATCH#2, TEAM#3]
                        +- Filter (TEAM#3 = A)
                           +- SubqueryAlias T
                              +- SubqueryAlias spark_catalog.default.demo_data
                                 +- Relation[ID#0,NAME#1,BATCH#2,TEAM#3] parquet
      
        at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:143)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:140)
        at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$2(TreeNode.scala:333)
        at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:333)
        at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsUp$1(QueryPlan.scala:106)
        at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:118)
        at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72)
        at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:118)
        at org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:129)
        at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:134)
        at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
        at scala.collection.immutable.List.foreach(List.scala:392)
        at scala.collection.TraversableLike.map(TraversableLike.scala:238)
        at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
        at scala.collection.immutable.List.map(List.scala:298)
        at org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:134)
        at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:139)
        at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:237)
        at org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:139)
        at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:106)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:140)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:92)
        at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:177)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:92)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:89)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:130)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:156)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:201)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:153)
        at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:68)
        at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
        at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:133)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
        at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:133)
        at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:68)
        at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:66)
        at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:58)
        at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
        at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
        at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:606)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
        at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:601)
        ... 56 elided
      

      I think the analysis procedure should not be disturbed even though there was any hint could not be recognized.

      Attachments

        Issue Links

          Activity

            People

              cltlfcjin Lantao Jin
              kernelforce Kernel Force
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - 168h
                  168h
                  Remaining:
                  Remaining Estimate - 168h
                  168h
                  Logged:
                  Time Spent - Not Specified
                  Not Specified