Spark / SPARK-38485

Non-deterministic UDF executed multiple times when combined with withField

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.3.0
    • Fix Version/s: None
    • Component/s: SQL

    Description

      When fields are added (via withField) to the result of a non-deterministic UDF that returns a struct, the UDF is executed multiple times for each row (once per field).

      In the unit test below, the assertions on df1 pass, but those on df2 fail with a message such as:
      "279751724 did not equal -1023188908"

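        // Assumes the enclosing suite imports scala.util.Random and
        // org.apache.spark.sql.functions.{lit, udf}, has the session implicits
        // in scope (for $"id"), and that GroupByKey is a case class with two Int fields.
        test("SPARK-XXXXX: non-deterministic UDF should be called once when adding fields") {
          val nondeterministicUDF = udf((s: Int) => {
            val r = Random.nextInt()
            // Both values should be the same
            GroupByKey(r, r)
          }).asNondeterministic()

          // df1: the struct is selected as returned, so both fields match.
          val df1 = spark.range(5).select(nondeterministicUDF($"id"))
          df1.collect().foreach {
            row => assert(row.getStruct(0).getInt(0) == row.getStruct(0).getInt(1))
          }

          // df2: after withField the two original fields differ, so this assertion fails.
          val df2 = spark.range(5).select(nondeterministicUDF($"id").withField("new", lit(7)))
          df2.collect().foreach {
            row => assert(row.getStruct(0).getInt(0) == row.getStruct(0).getInt(1))
          }
        }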
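
      The symptom can be illustrated without Spark. The sketch below is plain Scala and is not Spark's implementation: Pair, CountingUdf, and WithFieldSketch are hypothetical names, and the "per field" variant only assumes that the struct is rebuilt by re-evaluating the UDF once for each existing field, which is what the failing assertion suggests. It counts invocations to show the difference between evaluating the UDF once and reading its fields, and re-evaluating it for every field.

```scala
import scala.util.Random

// Hypothetical stand-in for the two-field struct returned by the UDF.
final case class Pair(a: Int, b: Int)

// Hypothetical stand-in for the non-deterministic UDF; it counts its own invocations.
final class CountingUdf(rng: Random) {
  var calls: Int = 0

  def apply(): Pair = {
    calls += 1
    val r = rng.nextInt()
    Pair(r, r) // both fields come from the same random draw
  }
}

object WithFieldSketch {
  // Expected behaviour: call the UDF once, copy its fields, append the new one.
  def evaluateOnce(udf: CountingUdf): (Int, Int, Int) = {
    val s = udf()
    (s.a, s.b, 7)
  }

  // Assumed model of the reported behaviour: one UDF call per existing field.
  def evaluatePerField(udf: CountingUdf): (Int, Int, Int) =
    (udf().a, udf().b, 7)

  def main(args: Array[String]): Unit = {
    val once = new CountingUdf(new Random())
    val (a1, b1, _) = evaluateOnce(once)
    println(s"evaluateOnce: calls=${once.calls}, fields equal=${a1 == b1}")

    val perField = new CountingUdf(new Random())
    val (a2, b2, _) = evaluatePerField(perField)
    println(s"evaluatePerField: calls=${perField.calls}, fields equal=${a2 == b2}")
  }
}
```

      With evaluateOnce the two original fields always match because they come from a single call. With evaluatePerField each field comes from a separate call, so the values will almost always differ, which matches the "did not equal" failure seen for df2.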
      

          People

            Assignee: Unassigned
            Reporter: Tanel Kiis (tanelk)
            Votes: 0
            Watchers: 3
