Spark / SPARK-20534

Outer generators skip missing records if used alone

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.2.0
    • Component/s: SQL
    • Labels: None
    • Environment: master 814a61a867ded965433c944c90961df529ac83ab

    Description

      Example data:

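      // Generator functions used below; toDF and $ assume spark-shell implicits.
      import org.apache.spark.sql.functions.{explode_outer, posexplode_outer}

      // The row with k = 2 has no array (None), so its vs value is null.
      val df = Seq(
        (1, Some("a" :: "b" :: "c" :: Nil)),
        (2, None),
        (3, Some("a" :: Nil))
      ).toDF("k", "vs")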
      

      Correct behavior when the generator is selected together with other expressions:

      df.select($"k", explode_outer($"vs")).show
      // +---+----+
      // |  k| col|
      // +---+----+
      // |  1|   a|
      // |  1|   b|
      // |  1|   c|
      // |  2|null|
      // |  3|   a|
      // +---+----+
      
      
      df.select($"k", posexplode_outer($"vs")).show
      // +---+----+----+
      // |  k| pos| col|
      // +---+----+----+
      // |  1|   0|   a|
      // |  1|   1|   b|
      // |  1|   2|   c|
      // |  2|null|null|
      // |  3|   0|   a|
      // +---+----+----+
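
      The two outputs above follow the outer-generator rule: a null or empty collection still produces one row, filled with nulls. The block below is a minimal plain-Scala sketch of that rule, without Spark. The helpers explodeOuter and posExplodeOuter are hypothetical names introduced for illustration and are not part of the Spark API.

```scala
// Plain-Scala model of outer-generator semantics (no Spark involved).
// explodeOuter and posExplodeOuter are hypothetical helpers, not Spark functions.
object OuterExplodeModel {
  // A missing or empty collection yields a single None; otherwise one entry per element.
  def explodeOuter[A](vs: Option[Seq[A]]): Seq[Option[A]] =
    vs.filter(_.nonEmpty)
      .map(seq => seq.map(v => Option(v)))
      .getOrElse(Seq(None))

  // Same rule with positions: a missing collection yields one (None, None) pair.
  def posExplodeOuter[A](vs: Option[Seq[A]]): Seq[(Option[Int], Option[A])] =
    vs.filter(_.nonEmpty)
      .map(seq => seq.zipWithIndex.map { case (v, i) => (Option(i), Option(v)) })
      .getOrElse(Seq((None, None)))

  // Same data as the example DataFrame.
  val data: Seq[(Int, Option[Seq[String]])] = Seq(
    (1, Some(Seq("a", "b", "c"))),
    (2, None),
    (3, Some(Seq("a")))
  )

  // Five rows: the key 2 is kept and paired with None.
  val exploded: Seq[(Int, Option[String])] =
    data.flatMap { case (k, vs) => explodeOuter(vs).map(v => (k, v)) }
}
```

      Under this model, dropping the key column does not change the number of rows, so a generator selected on its own would also be expected to return five rows.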
      

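      Incorrect behavior when the generator is the only expression in the select: the row for k = 2, whose vs is null, is dropped instead of being emitted as a null row: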

      df.select(explode_outer($"vs")).show
      // +---+
      // |col|
      // +---+
      // |  a|
      // |  b|
      // |  c|
      // |  a|
      // +---+
      
      
      df.select(posexplode_outer($"vs")).show
      // +---+---+
      // |pos|col|
      // +---+---+
      // |  0|  a|
      // |  1|  b|
      // |  2|  c|
      // |  0|  a|
      // +---+---+
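
      A regression check for this report could compare row counts between the standalone select and the select that also carries k. The block below is a minimal sketch, assuming Spark is on the classpath and a local SparkSession is acceptable. The object name OuterGeneratorCheck and the app name are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{explode_outer, posexplode_outer}

// Hypothetical check object; not taken from the Spark test suite.
object OuterGeneratorCheck {
  def main(args: Array[String]): Unit = {
    // Local session for illustration; in spark-shell, `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("outer-generator-check")
      .getOrCreate()
    import spark.implicits._

    try {
      val df = Seq(
        (1, Some("a" :: "b" :: "c" :: Nil)),
        (2, None),
        (3, Some("a" :: Nil))
      ).toDF("k", "vs")

      // Reference count: generator selected together with another expression.
      val withKey = df.select($"k", explode_outer($"vs")).count()

      // Standalone generators should keep the null row and therefore match.
      val alone = df.select(explode_outer($"vs")).count()
      val posAlone = df.select(posexplode_outer($"vs")).count()

      assert(alone == withKey, s"explode_outer alone: $alone rows, expected $withKey")
      assert(posAlone == withKey, s"posexplode_outer alone: $posAlone rows, expected $withKey")
    } finally {
      spark.stop()
    }
  }
}
```

      On a build showing the behavior reported above, the standalone selects return one row fewer than the select with k, so the assertions would fail.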
      

          People

            Assignee: zero323 Maciej Szymkiewicz
            Reporter: zero323 Maciej Szymkiewicz