[SPARK-39292] Make Dataset.melt work with struct fields - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.4.0
Fix Version/s: None
Component/s: SQL
Labels:
None

Description

In SPARK-38864, the melt function was added to Dataset.

It would be nice if fields of struct columns could be used as id and value columns. This would allow for the following:

Given a Dataset with the following schema:

root
 |-- an: struct (nullable = false)
 |    |-- id: integer (nullable = false)
 |-- str: struct (nullable = false)
 |    |-- one: string (nullable = true)
 |    |-- two: string (nullable = true)

For example:

+---+-------------+
| an|          str|
+---+-------------+
|{1}|   {one, One}|
|{2}|  {two, null}|
|{3}|{null, three}|
|{4}| {null, null}|
+---+-------------+

Melting with value columns Seq("str.one", "str.two") on id columns Seq("an.id") would result in:

+--+--------+-----+
|an|variable|value|
+--+--------+-----+
| 1| str.one|  one|
| 1| str.two|  One|
| 2| str.one|  two|
| 2| str.two| null|
| 3| str.one| null|
| 3| str.two|three|
| 4| str.one| null|
| 4| str.two| null|
+--+--------+-----+
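
Until melt accepts struct fields directly, a similar result can probably be obtained with functions that already exist in Spark. The following is a minimal sketch, not the implementation proposed in this issue: `MeltStructFieldsSketch`, `exampleData` and `meltFields` are hypothetical names, the id column is called `id` here rather than `an`, and Spark SQL is assumed to be on the classpath.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, col, explode, lit, struct}

// Hypothetical sketch: emulates melting struct fields with existing functions.
object MeltStructFieldsSketch {

  // Builds the example Dataset from the description (schema: an.id, str.one, str.two).
  def exampleData(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq[(Int, Option[String], Option[String])](
      (1, Some("one"), Some("One")),
      (2, Some("two"), None),
      (3, None, Some("three")),
      (4, None, None)
    ).toDF("id", "str1", "str2").select(
      struct($"id").as("an"),
      struct($"str1".as("one"), $"str2".as("two")).as("str")
    )
  }

  // Hypothetical helper, not part of Spark: pairs each nested value field with
  // its name, then explodes the pairs into one output row per field.
  def meltFields(df: DataFrame, idField: String, valueFields: Seq[String]): DataFrame = {
    val pairs = valueFields.map { field =>
      struct(lit(field).as("variable"), col(field).as("value"))
    }
    df.select(col(idField).as("id"), explode(array(pairs: _*)).as("kv"))
      .select(col("id"), col("kv.variable"), col("kv.value"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("melt-struct-fields-sketch")
      .getOrCreate()

    meltFields(exampleData(spark), "an.id", Seq("str.one", "str.two")).show()

    spark.stop()
  }
}
```

All value fields must share one data type for `array` to accept them, which holds here because both nested fields are strings.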

See test in org.apache.spark.sql.MeltSuite:

  test("SPARK-39292: melt with struct fields") {
    val df = meltWideDataDs.select(
      struct($"id").as("an"),
      struct(
        $"str1".as("one"),
        $"str2".as("two")
      ).as("str")
    )

    checkAnswer(
      Melt.of(df, Seq("an.id"), Seq("str.one", "str.two"), false, "variable", "value"),
      meltedWideDataRows.map(row => Row(
        row.getInt(0),
        row.getString(1) match {
          case "str1" => "str.one"
          case "str2" => "str.two"
        },
        row.getString(2)
      ))
    )
  }

Issue Links

is fixed by

SPARK-38864 Unpivot / melt function for Dataset API

Resolved

links to

Pull Request #36150

People

Assignee: Unassigned

Reporter: Enrico Minack

Votes: 0

Watchers: 1

Dates

Created: 25/May/22 14:26

Updated: 16/Jul/22 20:26

Resolved: 16/Jul/22 20:26