Spark / SPARK-32131

Fix AnalysisException messages at UNION/INTERSECT/EXCEPT/MINUS operations


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Versions: 2.1.3, 2.2.3, 2.3.4, 2.4.6, 3.0.0
    • Fix Versions: 2.4.7, 3.0.1, 3.1.0
    • Component: SQL
    • Labels: None

    Description

      Union and other set operations can only be performed on tables with compatible column types. However, when the tables have more than two columns, the error message can report the wrong column index. Steps to reproduce:

      Step 1: prepare test data

      drop table if exists test1; 
      drop table if exists test2; 
      drop table if exists test3;
      create table if not exists test1(id int, age int, name timestamp);
      create table if not exists test2(id int, age timestamp, name timestamp);
      create table if not exists test3(id int, age int, name int);
      insert into test1 select 1,2,'2020-01-01 01:01:01';
      insert into test2 select 1,'2020-01-01 01:01:01','2020-01-01 01:01:01'; 
      insert into test3 select 1,3,4;
      

      Step 2: run the queries:

      Query1:
      select * from test1 except select * from test2;
      Result1:
      Error: org.apache.spark.sql.AnalysisException: Except can only be performed on tables with the compatible column types. timestamp <> int at the second column of the second table;; 'Except false :- Project [id#620, age#621, name#622] : +- SubqueryAlias `default`.`test1` : +- HiveTableRelation `default`.`test1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [id#620, age#621, name#622] +- Project [id#623, age#624, name#625] +- SubqueryAlias `default`.`test2` +- HiveTableRelation `default`.`test2`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [id#623, age#624, name#625] (state=,code=0)
      Query2:
      select * from test1 except select * from test3;
      Result2:
      Error: org.apache.spark.sql.AnalysisException: Except can only be performed on tables with the compatible column types. int <> timestamp at the 2th column of the second table;; 'Except false :- Project [id#632, age#633, name#634] : +- SubqueryAlias `default`.`test1` : +- HiveTableRelation `default`.`test1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [id#632, age#633, name#634] +- Project [id#635, age#636, name#637] +- SubqueryAlias `default`.`test3` +- HiveTableRelation `default`.`test3`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [id#635, age#636, name#637] (state=,code=0)
      

      The message for Query 1 is correct, but Query 2 reports the wrong column: the mismatch is at the third column, yet the message points at the second one and uses the malformed ordinal "2th".

      The message with the wrong column index:

      Error: org.apache.spark.sql.AnalysisException: Except can only be performed on tables with the compatible column types. int <> timestamp at the 2th column of the second table

      It should instead read:

      Error: org.apache.spark.sql.AnalysisException: Except can only be performed on tables with the compatible column types. int <> timestamp at the third column of the second table
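      The fix amounts to mapping the zero-based column index to a proper English ordinal word, as the message already does for the table position ("second table"). A minimal sketch of that conversion in Python (the helper name and word list are illustrative, not Spark's actual code):

      ```python
      def ordinal_word(index):
          """Return an English ordinal word for a zero-based index (sketch)."""
          words = ["first", "second", "third", "fourth", "fifth",
                   "sixth", "seventh", "eighth", "ninth", "tenth"]
          # Fall back to a numeric ordinal beyond the word list.
          return words[index] if index < len(words) else f"{index + 1}th"

      # The mismatched `name` column in Query 2 is at zero-based index 2,
      # so the message should say "third column", not "2th column".
      print(ordinal_word(2))  # third
      ```

      With this, Query 2's message would correctly end in "at the third column of the second table".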

      Attachments

        Activity


          People

            Assignee: 小郭飞飞刀 philipse
            Reporter: 小郭飞飞刀 philipse
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated:
              Resolved:
