  Spark / SPARK-17093

Roundtrip encoding of array<struct<>> fields is wrong when whole-stage codegen is disabled

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.0.1, 2.1.0
    • Component/s: SQL

    Description

      The following failing test demonstrates a bug where Spark mis-encodes array-of-struct fields if whole-stage codegen is disabled:

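      import org.apache.spark.sql.internal.SQLConf
      import spark.implicits._ // provides toDS() for the RDD below

      // withSQLConf is provided by Spark's SQLTestUtils test helper trait
      withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false") {
        val data = Array(Array((1, 2), (3, 4)))
        val ds = spark.sparkContext.parallelize(data).toDS()
        assert(ds.collect() === data) // fails: returns Array(Array((3,4), (3,4)))
      }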
      

      When whole-stage codegen is enabled (the default), this works fine. When it is disabled, as in the test above, Spark returns Array(Array((3,4), (3,4))). Because the last element of the array appears to be repeated, my best guess is that the interpreted evaluation code path forgot to copy() somewhere.
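      The suspected failure mode can be reproduced in miniature without Spark. The sketch below is a hypothetical model, not Spark's actual interpreted code path. `MutablePair`, `CopyBugSketch`, and `decode` are illustrative names. It reuses a single mutable buffer while decoding array elements and stores a reference to that buffer instead of a copy. Every slot then aliases the same object, so every element ends up with the last values written. This is the same "last element repeated" symptom described above.

```scala
// Hypothetical model of a missing-copy bug; not Spark code.
// A mutable buffer that the decoder reuses for every element.
final class MutablePair(var a: Int, var b: Int) {
  def copy(): MutablePair = new MutablePair(a, b)
  def toTuple: (Int, Int) = (a, b)
}

object CopyBugSketch {
  // Decodes each tuple into one shared buffer. When copyEach is false,
  // every output slot holds a reference to that same buffer.
  def decode(input: Array[(Int, Int)], copyEach: Boolean): Array[(Int, Int)] = {
    val buffer = new MutablePair(0, 0)
    val out = new Array[MutablePair](input.length)
    for (i <- input.indices) {
      buffer.a = input(i)._1
      buffer.b = input(i)._2
      out(i) = if (copyEach) buffer.copy() else buffer
    }
    out.map(_.toTuple)
  }

  def main(args: Array[String]): Unit = {
    val data = Array((1, 2), (3, 4))
    println(decode(data, copyEach = false).mkString(", ")) // (3,4), (3,4)
    println(decode(data, copyEach = true).mkString(", "))  // (1,2), (3,4)
  }
}
```

      The fix in this model is to copy the buffer before storing it. Whether Spark's actual fix took exactly this form is not stated in the issue text above.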

            People

              Assignee: Liwei Lin (proflin, Inactive)
              Reporter: Josh Rosen (joshrosen)
              Votes: 0
              Watchers: 6