Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-41991

Interpreted mode subexpression elimination can throw exception during insert

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 3.3.1, 3.4.0
    • 3.4.0
    • SQL
    • None

    Description

      Example:

      drop table if exists tbl1;
      create table tbl1 (a int, b int) using parquet;
      
      set spark.sql.codegen.wholeStage=false;
      set spark.sql.codegen.factoryMode=NO_CODEGEN;
      
      insert into tbl1
      select id as a, id as b
      from range(1, 5);
      

      This results in the following exception:

      java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.ExpressionProxy cannot be cast to org.apache.spark.sql.catalyst.expressions.Cast
      	at org.apache.spark.sql.catalyst.expressions.CheckOverflowInTableInsert.withNewChildInternal(Cast.scala:2514)
      	at org.apache.spark.sql.catalyst.expressions.CheckOverflowInTableInsert.withNewChildInternal(Cast.scala:2512)
      

      The query produces 2 bigint values, but the table's schema expects 2 int values, so Spark wraps each output field with a Cast.

      Later, in InterpretedUnsafeProjection, prepareExpressions tries to wrap the two Cast expressions with an ExpressionProxy. However, the parent expression of each Cast is a CheckOverflowInTableInsert expression, which does not accept ExpressionProxy as a child.

      Attachments

        Activity

          People

            bersprockets Bruce Robbins
            bersprockets Bruce Robbins
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: