Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-11452

Window functions give invalid values

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.5.0
    • Fix Version/s: 1.5.2
    • Component/s: SQL
    • Labels:
      None
    • Environment:

      yarn-client on Amazon EMR

      Description

      I'm getting incorrect results from several window functions:

      • negative values from ROW_NUMBER()
      • incorrect floats from SUM(...) OVER (...)
      • huge integers from COUNT(...) OVER (...)

      Here's an example pyspark session that shows the incorrect behaviors:

      >>> logout_acc.printSchema()
      root
       |-- account: long (nullable = true)
       |-- avatar: long (nullable = true)
       |-- timestamp: timestamp (nullable = true)
       |-- session_dur: double (nullable = true)
      >>> session_numbers = sqlContext.sql("""
      ...   SELECT account, avatar, timestamp,
      ...     COUNT(*) OVER (
      ...       PARTITION BY account ORDER BY timestamp) AS session,
      ...     ROW_NUMBER() OVER (
      ...       PARTITION BY account ORDER BY timestamp) AS row_num,
      ...     SUM(session_dur) OVER (
      ...       PARTITION BY account ORDER BY timestamp) AS playtime
      ...   FROM logout_acc
      ... """)
      >>> session_numbers.cache().registerTempTable('session_numbers')
      >>> session_numbers.show()
      +-------+------+--------------------+-------+-----------+--------------+
      |account|avatar|           timestamp|session|    row_num|      playtime|
      +-------+------+--------------------+-------+-----------+--------------+
      |   ...1|  ...1|2015-09-01 00:04:...|      1| 1261342896| 4.991264E-317|
      |   ...2|  ...2|2015-09-01 00:40:...|      1| -900190400| 4.991363E-317|
      |   ...2|  ...2|2015-09-01 00:43:...|      2| -695150400| 4.991363E-317|
      |   ...2|  ...2|2015-09-01 00:50:...|      3| -278701400| 4.991363E-317|
      |   ...2|  ...2|2015-09-01 01:25:...|      4| 1824111600| 4.991363E-317|
      |   ...2|  ...2|2015-09-01 01:28:...|      5| 1992217600| 4.991363E-317|
      |   ...2|  ...2|2015-09-01 01:31:...|      6|-2127806696| 4.991363E-317|
      |   ...2|  ...2|2015-09-03 00:47:...|      7|  574647760| 4.991363E-317|
      |   ...3|  ...3|2015-09-01 01:16:...|      1| 1269805600|4.9914617E-317|
      |   ...3|  ...3|2015-09-01 01:22:...|      2| 1621270600|4.9914617E-317|
      |   ...3|  ...3|2015-09-01 14:52:...|      3|-1269231952|4.9914617E-317|
      |   ...3|  ...3|2015-09-01 14:59:...|      4| -865949952|4.9914617E-317|
      |   ...3|  ...3|2015-09-04 05:05:...|      5| -678665344|4.9914617E-317|
      |   ...3|  ...3|2015-09-05 16:19:...|      6| 1645213072|4.9914617E-317|
      |   ...3|  ...3|2015-09-05 17:09:...|      7|  339651776|4.9914617E-317|
      |   ...3|  ...3|2015-09-05 20:27:...|      8| -669300112|4.9914617E-317|
      |   ...3|  ...3|2015-09-05 21:12:...|      9| 2032531888|4.9914617E-317|
      |   ...3|  ...3|2015-09-09 02:45:...|     10| 2046885648|4.9914617E-317|
      |   ...3|  ...3|2015-09-09 03:47:...|     11| 1493942352|4.9914617E-317|
      |   ...3|  ...3|2015-09-09 21:09:...|     12| -434263088|4.9914617E-317|
      +-------+------+--------------------+-------+-----------+--------------+
      >>> sqlContext.sql('select max(session) from session_numbers').show()
      +-------------------+                                                           
      |                _c0|
      +-------------------+
      |4687099421149855744|
      +-------------------+
      

      In fact, this was a transient problem, because after I re-ran the above commands, the results came out correct. Unfortunately I also cannot share the data to try and reproduce the problem, but I hope this report might be useful nevertheless. Thanks.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                davies Davies Liu
                Reporter:
                jseppanen Jarno Seppanen
              • Votes:
                0 Vote for this issue
                Watchers:
                2 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: