Spark / SPARK-41282 Feature parity: Column API in Spark Connect / SPARK-41773

Window.partitionBy is not respected with row_number


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.4.0
    • Component/s: Connect
    • Labels: None

    Description

      File "/.../spark/python/pyspark/sql/connect/window.py", line 292, in pyspark.sql.connect.window.Window.orderBy
      Failed example:
          df.withColumn("row_number", row_number().over(window)).show()
      Expected:
          +---+--------+----------+
          | id|category|row_number|
          +---+--------+----------+
          |  1|       a|         1|
          |  1|       a|         2|
          |  1|       b|         3|
          |  2|       a|         1|
          |  2|       b|         2|
          |  3|       b|         1|
          +---+--------+----------+
      Got:
          +---+--------+----------+
          | id|category|row_number|
          +---+--------+----------+
          |  1|       b|         1|
          |  1|       a|         2|
          |  1|       a|         3|
          |  2|       b|         1|
          |  2|       a|         2|
          |  3|       b|         1|
          +---+--------+----------+
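
For reference, here is a plain-Python sketch (no Spark session needed) of the numbering the doctest expects. It assumes the window is partitioned by `id` and ordered by `category`, which is inferred from the "Expected" table rather than stated in this report. `row_number_over` is a hypothetical helper written for illustration, not a PySpark API.

```python
from itertools import groupby
from operator import itemgetter

# Input rows (id, category), inferred from the "Expected" table above.
rows = [(1, "a"), (1, "a"), (2, "a"), (1, "b"), (2, "b"), (3, "b")]


def row_number_over(rows, partition_key, order_key):
    """Number rows 1..n inside each partition, following the order key.

    Hypothetical helper: mimics row_number() over a window that is
    partitioned by `partition_key` and ordered by `order_key`.
    """
    # Sort by partition first, then by the ordering column within it,
    # so that groupby sees each partition as one contiguous run.
    ordered = sorted(rows, key=lambda r: (partition_key(r), order_key(r)))
    result = []
    for _, group in groupby(ordered, key=partition_key):
        # Numbering restarts at 1 for every partition.
        for n, row in enumerate(group, start=1):
            result.append((*row, n))
    return result


# Assumed window: partition by id (column 0), order by category (column 1).
numbered = row_number_over(rows, partition_key=itemgetter(0), order_key=itemgetter(1))
for row in numbered:
    print(row)
```

Under that assumption, the printed tuples follow the "Expected" table: within each `id`, category `a` rows are numbered before category `b` rows. In the "Got" table the numbering still restarts per `id`, but `b` comes before `a`.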
      

          People

            Assignee: podongfeng Ruifeng Zheng
            Reporter: gurwls223 Hyukjin Kwon