  Spark / SPARK-12754

Data type mismatch on two array<bigint> values when using filter/where

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 1.5.0, 1.6.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None
    • Environment: OSX 10.11.1, Scala 2.11.7, Spark 1.5.0+

    Description

      The following test produces the error org.apache.spark.sql.AnalysisException: cannot resolve '(point = array(0,9))' due to data type mismatch: differing types in '(point = array(0,9))' (array<bigint> and array<bigint>)

      This does not happen on 1.4.x; the error was introduced in 1.5. Is there a preferred way to compare arrays of arbitrary length like this?
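      The error message reports the same type name, array<bigint>, on both sides. One plausible reading, which this issue does not confirm, is that the two ArrayType values differ only in their containsNull flag, which simpleString does not print. DataTypes.createArrayType(LongType), as used in the schema of the test, sets containsNull = true, while an array built from non-nullable literals may be typed with containsNull = false; that second type is an assumption here, not something checked against the 1.5 source. The sketch below shows only the part that is certain: two such types compare unequal yet render identically. The object name ArrayTypeNullability is hypothetical.

```scala
import org.apache.spark.sql.types.{ArrayType, DataTypes, LongType}

object ArrayTypeNullability {
  // Column type declared in the issue's schema; containsNull defaults to true.
  val columnType: ArrayType = DataTypes.createArrayType(LongType)

  // Assumed type of array(lit(0L), lit(9L)): same element type, no nulls.
  val literalType: ArrayType = ArrayType(LongType, containsNull = false)

  def main(args: Array[String]): Unit = {
    println(columnType.simpleString)   // array<bigint>
    println(literalType.simpleString)  // array<bigint>
    println(columnType == literalType) // false: the containsNull flags differ
  }
}
```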

      test.scala
      import org.apache.spark.rdd.RDD
      import org.apache.spark.sql.{Row, SQLContext}
      import org.apache.spark.sql.functions.{array, lit}
      import org.apache.spark.sql.types.{DataTypes, LongType, StringType, StructField, StructType}

      // `sc` is assumed to be the SparkContext provided by the enclosing test suite.
      test("test array comparison") {

          val vectors: Vector[Row] = Vector(
            Row.fromTuple("id_1" -> Array(0L, 2L)),
            Row.fromTuple("id_2" -> Array(0L, 5L)),
            Row.fromTuple("id_3" -> Array(0L, 9L)),
            Row.fromTuple("id_4" -> Array(1L, 0L)),
            Row.fromTuple("id_5" -> Array(1L, 8L)),
            Row.fromTuple("id_6" -> Array(2L, 4L)),
            Row.fromTuple("id_7" -> Array(5L, 6L)),
            Row.fromTuple("id_8" -> Array(6L, 2L)),
            Row.fromTuple("id_9" -> Array(7L, 0L))
          )
          val data: RDD[Row] = sc.parallelize(vectors, 3)

          val schema = StructType(
            StructField("id", StringType, false) ::
              StructField("point", DataTypes.createArrayType(LongType), false) ::
              Nil
          )

          val sqlContext = new SQLContext(sc)
          val dataframe = sqlContext.createDataFrame(data, schema)

          val targetPoint: Array[Long] = Array(0L, 9L)

          // This is the line where it fails with:
          // org.apache.spark.sql.AnalysisException: cannot resolve
          // '(point = array(0,9))' due to data type mismatch:
          // differing types in '(point = array(0,9))'
          // (array<bigint> and array<bigint>).
          val targetRow = dataframe
            .where(dataframe("point") === array(targetPoint.map(value => lit(value)): _*))
            .first()

          assert(targetRow != null)
      }
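      The issue does not say which comparison method is preferred. One possible workaround, offered as an assumption rather than the project's recommendation, is to avoid whole-array equality and instead check the length with functions.size and each element with Column.getItem, both part of the Spark 1.5/1.6 DataFrame API. The names arrayEquals, matchingIds and ArrayEqualityWorkaround, the local[2] master, and the sample rows are hypothetical and chosen only for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Column, Row, SQLContext}
import org.apache.spark.sql.functions.size
import org.apache.spark.sql.types.{ArrayType, LongType, StringType, StructField, StructType}

object ArrayEqualityWorkaround {

  // Hypothetical helper: a predicate that holds only when `col` has exactly
  // target.length elements and each element equals the matching target value.
  // It never compares two whole arrays with ===.
  def arrayEquals(col: Column, target: Seq[Long]): Column =
    target.zipWithIndex.foldLeft(size(col) === target.length) {
      case (acc, (value, i)) => acc && (col.getItem(i) === value)
    }

  // Applies the predicate to a few rows shaped like the issue's test data
  // and returns the sorted ids of the rows that match.
  def matchingIds(target: Seq[Long]): Seq[String] = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("array-equality")
    val sc = new SparkContext(conf)
    try {
      val sqlContext = new SQLContext(sc)
      val schema = StructType(
        StructField("id", StringType, nullable = false) ::
          StructField("point", ArrayType(LongType), nullable = false) :: Nil)
      val rows = sc.parallelize(Seq(
        Row("id_1", Seq(0L, 2L)),
        Row("id_3", Seq(0L, 9L)),
        Row("id_9", Seq(7L, 0L)),
        Row("id_x", Seq(0L, 9L, 1L)))) // same prefix, different length
      val df = sqlContext.createDataFrame(rows, schema)
      df.where(arrayEquals(df("point"), target))
        .collect()
        .map(_.getString(0))
        .toSeq
        .sorted
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit =
    println(matchingIds(Seq(0L, 9L)))
}
```

      The length check matters: without it, a longer array such as id_x would also satisfy the per-element comparisons for indices 0 and 1.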
      

          People

            Assignee: Unassigned
            Reporter: Jesse English (jesse.english)
            Votes: 0
            Watchers: 4
