Spark / SPARK-41279 Feature parity: DataFrame API in Spark Connect / SPARK-41902

Parity in String representation of higher_order_function's output


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.4.0
    • Component/s: Connect
    • Labels: None

    Description

      from pyspark.sql.functions import flatten, struct, transform
      
      df = self.spark.sql("SELECT array(1, 2, 3) as numbers, array('a', 'b', 'c') as letters")
      
      actual = df.select(
          flatten(
              transform(
                  "numbers",
                  lambda number: transform(
                      "letters", lambda letter: struct(number.alias("n"), letter.alias("l"))
                  ),
              )
          )
      ).first()[0]
      
      expected = [
          (1, "a"),
          (1, "b"),
          (1, "c"),
          (2, "a"),
          (2, "b"),
          (2, "c"),
          (3, "a"),
          (3, "b"),
          (3, "c"),
      ]
      
      self.assertEquals(actual, expected)

      Traceback (most recent call last):
        File "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_functions.py", line 809, in test_nested_higher_order_function
          self.assertEquals(actual, expected)
      AssertionError: Lists differ: [{'n': 'a', 'l': 'a'}, {'n': 'b', 'l': 'b'[151 chars]'c'}] != [(1, 'a'), (1, 'b'), (1, 'c'), (2, 'a'), ([43 chars]'c')]
      
      First differing element 0:
      {'n': 'a', 'l': 'a'}
      (1, 'a')
      
      - [{'l': 'a', 'n': 'a'},
      -  {'l': 'b', 'n': 'b'},
      -  {'l': 'c', 'n': 'c'},
      -  {'l': 'a', 'n': 'a'},
      -  {'l': 'b', 'n': 'b'},
      -  {'l': 'c', 'n': 'c'},
      -  {'l': 'a', 'n': 'a'},
      -  {'l': 'b', 'n': 'b'},
      -  {'l': 'c', 'n': 'c'}]
      + [(1, 'a'),
      +  (1, 'b'),
      +  (1, 'c'),
      +  (2, 'a'),
      +  (2, 'b'),
      +  (2, 'c'),
      +  (3, 'a'),
      +  (3, 'b'),
      +  (3, 'c')]
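The traceback shows every struct carrying the letter in both `n` and `l`, so the outer lambda variable (`number`) appears to resolve to the inner one (`letter`). One explanation consistent with that output, assumed here rather than confirmed by this issue, is that both lambda variables were registered under the same name, so the inner binding shadows the outer one. The pure-Python toy model below resolves lambda variables by name and compares distinct names against a shared name. The helpers `transform`, `flatten` and `run` and the variable names `x`, `x_0` and `x_1` are all hypothetical and are not the PySpark API.

```python
from typing import Any, Callable, Dict, List

# Toy model only (hypothetical helpers, not the PySpark API): a lambda
# variable is resolved by looking up its name in an environment dict.
Env = Dict[str, Any]


def transform(env: Env, values: List[Any], var: str,
              body: Callable[[Env], Any]) -> List[Any]:
    """Evaluate `body` once per element, binding the element to `var`."""
    return [body({**env, var: value}) for value in values]


def flatten(nested: List[List[Any]]) -> List[Any]:
    """Concatenate the inner lists, like SQL flatten(array<array<T>>)."""
    return [item for inner in nested for item in inner]


def run(outer_var: str, inner_var: str) -> List[Dict[str, Any]]:
    """Model flatten(transform(numbers, n -> transform(letters, l -> struct(n, l))))."""
    numbers, letters = [1, 2, 3], ["a", "b", "c"]
    return flatten(
        transform({}, numbers, outer_var,
                  lambda outer_env: transform(
                      outer_env, letters, inner_var,
                      # Both fields are looked up by name; if the names
                      # collide, the inner binding has replaced the outer one.
                      lambda inner_env: {"n": inner_env[outer_var],
                                         "l": inner_env[inner_var]}))
    )


# Distinct names: every (number, letter) pair, as the test expects.
distinct = run("x_0", "x_1")
print([(d["n"], d["l"]) for d in distinct][:3])  # [(1, 'a'), (1, 'b'), (1, 'c')]

# Shared name: the inner variable shadows the outer one, so n == l.
shared = run("x", "x")
print(shared[:3])  # [{'n': 'a', 'l': 'a'}, {'n': 'b', 'l': 'b'}, {'n': 'c', 'l': 'c'}]
```

With distinct names the model yields the nine (number, letter) pairs listed in `expected`. With a shared name it reproduces the `{'n': 'a', 'l': 'a'}` pattern from the traceback.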
      


          People

            Assignee: Ruifeng Zheng (podongfeng)
            Reporter: Sandeep Singh (techaddict)
            Votes: 0
            Watchers: 3
