[SPARK-11009] RowNumber in HiveContext returns negative values in cluster mode - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Affects Version/s: 1.5.1
Fix Version/s: 1.5.2, 1.6.0
Component/s: Spark Core
Labels: None
Environment: Standalone cluster mode. Neither Hadoop nor Hive is present in the environment (there is no hive-site.xml); only HiveContext is used. Spark was built with Hadoop 2.6.0 and runs with the default Spark configuration. The cluster has 4 nodes, but the issue also occurs with other node counts.
Target Version/s: 1.5.2, 1.6.0

Description

This issue occurs when the job is submitted to a standalone cluster; YARN and Mesos have not been tried. Repartitioning df into a single partition, or setting the default parallelism to 1, does not fix it. Running the cluster with only one node gives the same result, and other shuffle configuration changes do not alter the results either.

The issue does NOT occur with --master local[*].

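import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.rowNumber

// Number the rows of each client, ordered by date
val ws = Window.partitionBy("client_id").orderBy("date")

val nm = "repeatMe"
// select returns a new DataFrame, so keep the result
val numbered = df.select(df.col("*"), rowNumber().over(ws).as(nm))

numbered.filter(numbered(nm).isNotNull).orderBy(nm).take(50).foreach(println(_))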

Output (column types: Long, DateType, Int):
[219483904822,2006-06-01,-1863462909]
[219483904822,2006-09-01,-1863462909]
[219483904822,2007-01-01,-1863462909]
[219483904822,2007-08-01,-1863462909]
[219483904822,2007-07-01,-1863462909]
[192489238423,2007-07-01,-1863462774]
[192489238423,2007-02-01,-1863462774]
[192489238423,2006-11-01,-1863462774]
[192489238423,2006-08-01,-1863462774]
[192489238423,2007-08-01,-1863462774]
[192489238423,2006-09-01,-1863462774]
[192489238423,2007-03-01,-1863462774]
[192489238423,2006-10-01,-1863462774]
[192489238423,2007-05-01,-1863462774]
[192489238423,2006-06-01,-1863462774]
[192489238423,2006-12-01,-1863462774]
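For comparison, the following is a minimal plain-Scala sketch of what a row number over a window partitioned by client_id and ordered by date is expected to produce: values starting at 1 within each client, never negative. It does not use Spark and is not the code from the fix; the `Row` case class, the `withRowNumber` helper, and the sample rows (a few client/date pairs copied from the output above) are illustrative assumptions.

```scala
// Plain-Scala model of ROW_NUMBER() OVER (PARTITION BY client_id ORDER BY date).
// No Spark dependency; Row, withRowNumber and the sample data are hypothetical.
object RowNumberSketch {
  final case class Row(clientId: Long, date: String)

  // ISO-8601 date strings sort chronologically, so string ordering is enough here.
  def withRowNumber(rows: Seq[Row]): Seq[(Row, Int)] =
    rows
      .groupBy(_.clientId)          // one group per partition key
      .toSeq
      .flatMap { case (_, group) =>
        // Order within the partition, then number from 1
        group.sortBy(_.date).zipWithIndex.map { case (row, idx) => (row, idx + 1) }
      }

  val sample: Seq[Row] = Seq(
    Row(219483904822L, "2006-06-01"),
    Row(219483904822L, "2006-09-01"),
    Row(219483904822L, "2007-01-01"),
    Row(192489238423L, "2007-07-01"),
    Row(192489238423L, "2007-02-01")
  )

  val numbered: Seq[(Row, Int)] = withRowNumber(sample)

  def main(args: Array[String]): Unit =
    numbered.sortBy { case (row, n) => (row.clientId, n) }.foreach(println)
}
```

Under this model every row number is at least 1 and rows of the same client get distinct numbers, whereas the cluster output above shows one identical negative value for every row of a client.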

Issue Links

duplicates
SPARK-11481 orderBy with multiple columns in WindowSpec does not work properly (Resolved)

is duplicated by
SPARK-10893 Lag Analytic function broken (Resolved)
SPARK-11452 Window functions give invalid values (Resolved)

is related to
SPARK-11036 AttributeReference should not be created outside driver (Resolved)

links to
[GitHub] Pull Request #9050 (davies)

People

Assignee: Davies Liu
Reporter: Saif Addin Ellafi
Votes: 0
Watchers: 3

Dates

Created: 08/Oct/15 18:46
Updated: 18/Nov/15 23:20
Resolved: 13/Oct/15 16:48