Spark / SPARK-39292

Make Dataset.melt work with struct fields

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      In SPARK-38864, the melt function was added to Dataset.

      It would be nice if nested fields of struct columns could be used as id and value columns. This would allow for the following:

      Given a Dataset with the following schema:

      root
       |-- an: struct (nullable = false)
       |    |-- id: integer (nullable = false)
       |-- str: struct (nullable = false)
       |    |-- one: string (nullable = true)
       |    |-- two: string (nullable = true)
      

      with data such as:

      +---+-------------+
      | an|          str|
      +---+-------------+
      |{1}|   {one, One}|
      |{2}|  {two, null}|
      |{3}|{null, three}|
      |{4}| {null, null}|
      +---+-------------+
      

      Melting with value columns Seq("str.one", "str.two") on id columns Seq("an.id") would result in:

      +--+--------+-----+
      |an|variable|value|
      +--+--------+-----+
      | 1| str.one|  one|
      | 1| str.two|  One|
      | 2| str.one|  two|
      | 2| str.two| null|
      | 3| str.one| null|
      | 3| str.two|three|
      | 4| str.one| null|
      | 4| str.two| null|
      +--+--------+-----+
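
      The expected result above can be reproduced today without melt. The following is a minimal sketch, not the implementation proposed in this issue: it emulates the melt by exploding an array of (variable, value) structs, using only long-standing functions from org.apache.spark.sql.functions. The object name MeltStructFieldsSketch and its methods exampleData and meltStructFields are hypothetical names chosen for illustration, and a local SparkSession is assumed.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, col, explode, lit, struct}

// Illustrative sketch only: object and method names are hypothetical,
// and this is an emulation, not the melt implementation itself.
object MeltStructFieldsSketch {

  // Rebuilds the example Dataset: flat columns wrapped into the structs `an` and `str`.
  def exampleData(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val flat = Seq[(Int, Option[String], Option[String])](
      (1, Some("one"), Some("One")),
      (2, Some("two"), None),
      (3, None, Some("three")),
      (4, None, None)
    ).toDF("id", "str1", "str2")

    flat.select(
      struct(col("id")).as("an"),
      struct(col("str1").as("one"), col("str2").as("two")).as("str")
    )
  }

  // Emulates melting value columns str.one and str.two on id column an.id:
  // builds one (variable, value) struct per value column, then explodes
  // the array so that each struct becomes its own row.
  def meltStructFields(df: DataFrame): DataFrame =
    df.select(
        col("an.id").as("an"),
        explode(array(
          struct(lit("str.one").as("variable"), col("str.one").as("value")),
          struct(lit("str.two").as("variable"), col("str.two").as("value"))
        )).as("kv")
      )
      .select(col("an"), col("kv.variable"), col("kv.value"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("melt-struct-sketch")
      .getOrCreate()
    try meltStructFields(exampleData(spark)).show()
    finally spark.stop()
  }
}
```

      The workaround requires spelling out every value column by hand and only works when all value columns share a type, which is the boilerplate that melt with struct field support would remove.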
      

      See the test in org.apache.spark.sql.MeltSuite:

        test("SPARK-39292: melt with struct fields") {
          val df = meltWideDataDs.select(
            struct($"id").as("an"),
            struct(
              $"str1".as("one"),
              $"str2".as("two")
            ).as("str")
          )
      
          checkAnswer(
            Melt.of(df, Seq("an.id"), Seq("str.one", "str.two"), false, "variable", "value"),
            meltedWideDataRows.map(row => Row(
              row.getInt(0),
              row.getString(1) match {
                case "str1" => "str.one"
                case "str2" => "str.two"
              },
              row.getString(2)
            ))
          )
        }
      

    People

      Assignee: Unassigned
      Reporter: Enrico Minack (enricomi)