Spark / SPARK-25816

Functions does not resolve Columns correctly

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.3.0, 2.3.1, 2.3.2, 2.4.0
    • Fix Version/s: 2.3.3, 2.4.0
    • Component/s: SQL
    • Labels:
      None

      Description

      When the current DataFrame and the original DataFrame it was selected from both contain a column with the same name, Spark 2.3.0 and 2.3.1 do not resolve that column correctly when it is used in an expression, which causes a data type mismatch error. The same code works in Spark 2.2.1.

      The code below reproduces the issue:

      import org.apache.spark._
      import org.apache.spark.rdd._
      import org.apache.spark.storage.StorageLevel._
      import org.apache.spark.sql._
      import org.apache.spark.sql.DataFrame
      import org.apache.spark.sql.types._
      import org.apache.spark.sql.functions._
      import org.apache.spark.sql.catalyst.expressions._
      import org.apache.spark.sql.Column

      val v0 = spark.read.parquet("/data/home/bzinfa/bz/source.snappy.parquet")
      val v00 = v0.toDF(v0.schema.fields.indices.view.map("" + _): _*)
      val v5 = v00.select($"13".as("0"),$"14".as("1"),$"15".as("2"))
      val v5_2 = $"2"
      v5.where(lit(500).<(v5_2(new Column(new MapKeys(v5_2.expr))(lit(0)))))

      // v00's 3rd column (named "2") is binary and its 16th column (named "15") is map<string, double>
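      The reproduction above depends on the attached parquet file. The following is a minimal sketch of the same query shape on made-up data, assuming a local SparkSession. The object name, the two-column input, and the sample values are assumptions for illustration and are not part of the report, and `map_keys` from `functions` is swapped in for the hand-built `MapKeys` expression. On an affected version the analysis would be expected to fail in the same way, although this has not been checked against the attached data.

```scala
import org.apache.spark.sql.{AnalysisException, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{lit, map_keys}

object Spark25816Repro {
  // Builds the same shape of query as the report, on made-up data.
  def query(spark: SparkSession): DataFrame = {
    import spark.implicits._
    // Stand-in for v00: column "2" is binary, column "15" is map<string, double>.
    val v00 = Seq(
      (Array[Byte](1, 2), Map("a" -> 600.0)),
      (Array[Byte](3), Map("b" -> 100.0))
    ).toDF("2", "15")
    // v5 reuses the name "2" for the map column, shadowing the binary one.
    val v5 = v00.select($"15".as("2"))
    val v5_2 = $"2"
    // Same nested extraction as the report: `2`[map_keys(`2`)[0]].
    v5.where(lit(500) < v5_2(map_keys(v5_2)(0)))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-25816")
      .getOrCreate()
    try {
      query(spark).show()
    } catch {
      // Analysis errors surface when the filtered DataFrame is created.
      case e: AnalysisException => println(s"Analysis failed: ${e.getMessage}")
    } finally {
      spark.stop()
    }
  }
}
```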

      Error:
      org.apache.spark.sql.AnalysisException: cannot resolve 'map_keys(`2`)' due to data type mismatch: argument 1 requires map type, however, '`2`' is of binary type.;

      'Project [0#1591, 1#1592, 2#1593]
      +- 'Filter (500 < 2#1593[map_keys(2#1561)[0]])
         +- Project [13#1572 AS 0#1591, 14#1573 AS 1#1592, 15#1574 AS 2#1593, 2#1561]
            +- Project [c_bytes#1527 AS 0#1559, c_union#1528 AS 1#1560, c_fixed#1529 AS 2#1561, c_boolean#1530 AS 3#1562, c_float#1531 AS 4#1563, c_double#1532 AS 5#1564, c_int#1533 AS 6#1565, c_long#1534L AS 7#1566L, c_string#1535 AS 8#1567, c_decimal_18_2#1536 AS 9#1568, c_decimal_28_2#1537 AS 10#1569, c_decimal_38_2#1538 AS 11#1570, c_date#1539 AS 12#1571, simple_struct#1540 AS 13#1572, simple_array#1541 AS 14#1573, simple_map#1542 AS 15#1574]
               +- Relation[c_bytes#1527,c_union#1528,c_fixed#1529,c_boolean#1530,c_float#1531,c_double#1532,c_int#1533,c_long#1534L,c_string#1535,c_decimal_18_2#1536,c_decimal_28_2#1537,c_decimal_38_2#1538,c_date#1539,simple_struct#1540,simple_array#1541,simple_map#1542] parquet
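      In the plan, the outer reference is bound to 2#1593 (the map column of v5), while the one inside map_keys is bound to 2#1561 (the binary column of v00). One possible way to avoid the name lookup, which is not taken from the ticket, is to obtain the column through the DataFrame itself, since `df("2")` returns a column already bound to that DataFrame's own attribute. The sketch below assumes a local SparkSession and made-up data. The object and the helper `firstKeyValueAbove` are hypothetical names.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{lit, map_keys}

object ResolveAgainstDataFrame {
  // Hypothetical helper: keeps rows whose map column `name` holds a value
  // above `threshold` at its first key. The column is looked up via df(name),
  // so both uses refer to df's own output attribute rather than a bare name.
  def firstKeyValueAbove(df: DataFrame, name: String, threshold: Int): DataFrame = {
    val col = df(name)
    df.where(lit(threshold) < col(map_keys(col)(0)))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-25816-workaround")
      .getOrCreate()
    import spark.implicits._
    try {
      // Made-up stand-in for v00 and v5 from the report.
      val v00 = Seq(
        (Array[Byte](1, 2), Map("a" -> 600.0)),
        (Array[Byte](3), Map("b" -> 100.0))
      ).toDF("2", "15")
      val v5 = v00.select($"15".as("2"))
      firstKeyValueAbove(v5, "2", 500).show()
    } finally {
      spark.stop()
    }
  }
}
```

      Binding through the DataFrame trades the brevity of `$"2"` for an unambiguous reference. Whether it is an acceptable substitute depends on how the expression is generated in the calling code.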

        Attachments

        1. final_allDatatypes_Spark.avro
          0.6 kB
          Brian Zhang
        2. source.snappy.parquet
          5 kB
          Brian Zhang

            People

            • Assignee: Peter Toth (petertoth)
            • Reporter: Brian Zhang (bzhang)
            • Votes: 1
            • Watchers: 4
