Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Cannot Reproduce
Affects Version/s: 1.5.0, 1.6.0
None
None
Environment: OSX 10.11.1, Scala 2.11.7, Spark 1.5.0+
Description
The following test produces the error org.apache.spark.sql.AnalysisException: cannot resolve '(point = array(0,9))' due to data type mismatch: differing types in '(point = array(0,9))' (array<bigint> and array<bigint>)
The same test passes on 1.4.x; the failure first appears in 1.5+. Is there a preferred way to compare an array column against a literal array of arbitrary size?
test.scala
test("test array comparison") {
  val vectors: Vector[Row] = Vector(
    Row.fromTuple("id_1" -> Array(0L, 2L)),
    Row.fromTuple("id_2" -> Array(0L, 5L)),
    Row.fromTuple("id_3" -> Array(0L, 9L)),
    Row.fromTuple("id_4" -> Array(1L, 0L)),
    Row.fromTuple("id_5" -> Array(1L, 8L)),
    Row.fromTuple("id_6" -> Array(2L, 4L)),
    Row.fromTuple("id_7" -> Array(5L, 6L)),
    Row.fromTuple("id_8" -> Array(6L, 2L)),
    Row.fromTuple("id_9" -> Array(7L, 0L))
  )
  val data: RDD[Row] = sc.parallelize(vectors, 3)

  val schema = StructType(
    StructField("id", StringType, false) ::
    StructField("point", DataTypes.createArrayType(LongType), false) ::
    Nil
  )

  val sqlContext = new SQLContext(sc)
  var dataframe = sqlContext.createDataFrame(data, schema)

  val targetPoint: Array[Long] = Array(0L, 9L)

  // This is the line where it fails:
  // org.apache.spark.sql.AnalysisException: cannot resolve
  // '(point = array(0,9))' due to data type mismatch:
  // differing types in '(point = array(0,9))'
  // (array<bigint> and array<bigint>).
  val targetRow = dataframe.where(dataframe("point") === array(targetPoint.map(value => lit(value)): _*)).first()

  assert(targetRow != null)
}
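One possible explanation, not confirmed anywhere in this ticket, is that the two sides differ only in element nullability. DataTypes.createArrayType(LongType) builds an array type with containsNull = true, and if the array built from literals has containsNull = false, the two types are unequal even though both print as array<bigint>. The sketch below shows that mismatch and a UDF-based comparison that avoids the type check. The object name ArrayComparisonSketch and the helper matchesPoint are hypothetical names introduced for illustration, and the sketch assumes array columns reach a Scala UDF as Seq[Long].

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.types._

object ArrayComparisonSketch {
  // Hypothetical helper (not part of Spark): builds a UDF that is true
  // when the column value has exactly the same elements as `target`.
  def matchesPoint(target: Seq[Long]) =
    udf((point: Seq[Long]) => point == target)

  def main(args: Array[String]): Unit = {
    // Two array types that render identically but are not equal.
    val nullableElems = DataTypes.createArrayType(LongType)        // containsNull = true
    val nonNullElems  = DataTypes.createArrayType(LongType, false) // containsNull = false
    println(nullableElems.simpleString)    // array<bigint>
    println(nonNullElems.simpleString)     // array<bigint>
    println(nullableElems == nonNullElems) // false

    val conf = new SparkConf().setMaster("local[2]").setAppName("array-comparison-sketch")
    val sc = new SparkContext(conf)
    try {
      val sqlContext = new SQLContext(sc)
      val rows = sc.parallelize(Seq(
        Row("id_3", Seq(0L, 9L)),
        Row("id_4", Seq(1L, 0L))
      ))
      val schema = StructType(
        StructField("id", StringType, false) ::
        StructField("point", nullableElems, false) ::
        Nil
      )
      val dataframe = sqlContext.createDataFrame(rows, schema)

      // Compare inside a UDF instead of with ===, so the analyzer never
      // has to reconcile the two array types.
      val targetRow = dataframe.where(matchesPoint(Seq(0L, 9L))(dataframe("point"))).first()
      println(targetRow.getString(0))
    } finally {
      sc.stop()
    }
  }
}
```

If nullability is indeed the cause, declaring the column as DataTypes.createArrayType(LongType, false) in the schema might also let the original === comparison resolve, but that is an assumption that has not been checked against 1.5.0 or 1.6.0.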