  1. Spark
  2. SPARK-18489

Implicit type conversion during comparison between Integer type column and String type column


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      Suppose I have a dataframe with schema:

      root
       |-- _c0: integer (nullable = true)
       |-- _c1: double (nullable = true)
       |-- _c2: string (nullable = true)
      

      and data:

      +---+---+----+
      |_c0|_c1| _c2|
      +---+---+----+
      |  1|1.0|   1|
      |  2|1.0|   s|
      |  3|3.1|null|
      +---+---+----+
      

      If the following operations are carried out:

      df.where("_c1==_c2").show
      +---+---+---+
      |_c0|_c1|_c2|
      +---+---+---+
      |  1|1.0|  1|
      +---+---+---+
      
      df.where("_c1<>_c2").show   or   df.where("_c1!=_c2").show 
      +---+---+---+
      |_c0|_c1|_c2|
      +---+---+---+
      +---+---+---+
      

      So the results of these related operations are ambiguous: row 2 (_c2 = s) is returned by neither the equality filter nor the inequality filter.
      Here the strings that hold numeric values are implicitly cast, whereas the other strings are silently ignored instead of an exception being thrown.
      In my view this can lead to incorrect results if the dataset is not inspected carefully.
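
The behaviour reported above can be illustrated with a small model in plain Scala, with no Spark dependency. This is only a sketch under assumptions, not Spark's implementation: it assumes the string side is cast to double, that a failed cast yields null (modelled as `None`), and that a filter keeps a row only when the predicate is definitely true. All names (`ImplicitCastModel`, `castToDouble`, `sqlEquals`, `sqlNotEquals`, `where`) are hypothetical.

```scala
import scala.util.Try

// Hypothetical model of the reported behaviour; not Spark's actual code.
object ImplicitCastModel {
  // A missing or non-numeric string becomes None, standing in for a cast that yields null.
  def castToDouble(s: Option[String]): Option[Double] =
    s.flatMap(v => Try(v.trim.toDouble).toOption)

  // SQL-style equality: if either operand is null, the result is unknown (None).
  def sqlEquals(l: Option[Double], r: Option[Double]): Option[Boolean] =
    for (a <- l; b <- r) yield a == b

  // Negating an unknown result is still unknown.
  def sqlNotEquals(l: Option[Double], r: Option[Double]): Option[Boolean] =
    sqlEquals(l, r).map(!_)

  // Rows from the report as (_c0, _c1, _c2).
  val rows: Seq[(Int, Double, Option[String])] = Seq(
    (1, 1.0, Some("1")),
    (2, 1.0, Some("s")),
    (3, 3.1, None)
  )

  // A filter keeps a row only when the predicate evaluates to true, not unknown.
  def where(pred: (Double, Option[String]) => Option[Boolean]): Seq[Int] =
    rows.collect { case (id, c1, c2) if pred(c1, c2).contains(true) => id }

  // _c0 values kept by the model of "_c1 == _c2" and "_c1 <> _c2".
  val equalIds: Seq[Int] = where((c1, c2) => sqlEquals(Some(c1), castToDouble(c2)))
  val notEqualIds: Seq[Int] = where((c1, c2) => sqlNotEquals(Some(c1), castToDouble(c2)))
}
```

In this model the row with `_c2 = s` fails the cast, so both comparisons are unknown for it and it drops out of both results, which matches the two outputs shown in the report.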

      Also, the SQL-99 standard discourages implicit casting in order to avoid such problems.
      https://users.dcc.uchile.cl/~cgutierr/cursos/BD/standards.pdf

      The same implicit casting also occurs for UDFs and aggregation functions.
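
One possible way to surface the affected rows before relying on a comparison or an aggregate is sketched below in spark-shell style. It rebuilds the example data, explicitly turns ANSI mode off (an assumption made here to keep the cast semantics shown in the report; `spark.sql.ansi.enabled` is only recognised from Spark 3.0 onward), lists the non-null strings that an explicit cast to double turns into null, and then runs `sum` directly over the string column. The app name and the `uncastable` variable are illustrative, and the `Int` field built from a tuple is non-nullable, unlike the schema in the report.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, sum}

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("SPARK-18489-check") // illustrative name
  // Assumption: keep the non-ANSI cast semantics described in the report.
  .config("spark.sql.ansi.enabled", "false")
  .getOrCreate()
import spark.implicits._

// Same rows as in the report; Option models the nullable string column.
val df = Seq[(Int, Double, Option[String])](
  (1, 1.0, Some("1")),
  (2, 1.0, Some("s")),
  (3, 3.1, None)
).toDF("_c0", "_c1", "_c2")

// Rows whose string is present but does not survive an explicit cast to double.
val uncastable = df.where(col("_c2").isNotNull && col("_c2").cast("double").isNull)
uncastable.show()

// Aggregating the string column directly relies on the same kind of implicit cast.
df.agg(sum(col("_c2"))).show()
```

Under these assumptions the first query should list only the row with `_c2 = s`, which is the row missing from both filters in the report. Making the cast explicit in this way keeps the conversion visible in the query instead of leaving it to the analyzer.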

            People

              Assignee: Unassigned
              Reporter: Bipul Kumar (dasbipulkumar)
              Votes: 0
              Watchers: 2
