Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-21837

UserDefinedTypeSuite local UDFs not actually testing what it intends

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.3.0
    • Component/s: SQL, Tests
    • Labels:
      None

      Description

      Consider this test in UserDefinedTypeSuite:

        test("Local UDTs") {
          val df = Seq((1, new UDT.MyDenseVector(Array(0.1, 1.0)))).toDF("int", "vec")
          df.collect()(0).getAs[UDT.MyDenseVector](1)
          df.take(1)(0).getAs[UDT.MyDenseVector](1)
          df.limit(1).groupBy('int).agg(first('vec)).collect()(0).getAs[UDT.MyDenseVector](0)
          df.orderBy('int).limit(1).groupBy('int).agg(first('vec)).collect()(0)
            .getAs[UDT.MyDenseVector](0)
        }
      

      I claim the last two lines can't be right, because they say that the first column in the aggregation is the vector, when it is the grouping key (int). But it passes!

      But it started failing when I made seemingly unrelated changes in https://github.com/apache/spark/pull/18645 like:

      [info] - Local UDTs *** FAILED *** (144 milliseconds)
      [info]   java.lang.ClassCastException: java.lang.Integer cannot be cast to org.apache.spark.sql.UDT$MyDenseVector
      [info]   at org.apache.spark.sql.UserDefinedTypeSuite$$anonfun$10.apply(UserDefinedTypeSuite.scala:211)
      [info]   at org.apache.spark.sql.UserDefinedTypeSuite$$anonfun$10.apply(UserDefinedTypeSuite.scala:205)
      

      I modified the test to actually assert that the vector that results in each case is the expected one, and it began failing with the same error, in master. Therefore I am pretty sure the test is not quite doing what it seems to want to, and the result of these expressions just happened to not be fully evaluated or checked.

      CC Michael Armbrust for the discussion at https://github.com/apache/spark/commit/3ae25f244bd471ef77002c703f2cc7ed6b524f11##commitcomment-23320234 and apologies if I'm still really missing something here. I'll open a PR to show you what I mean.

        Attachments

          Activity

            People

            • Assignee:
              srowen Sean Owen
              Reporter:
              srowen Sean Owen
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: