Details
- Bug
- Status: Resolved
- Blocker
- Resolution: Fixed
- 1.5.1
- None
- Standalone cluster mode. No Hadoop/Hive is present in the environment (no hive-site.xml); only HiveContext is used. Spark was built with Hadoop 2.6.0. Default Spark configuration variables. The cluster has 4 nodes, but the issue occurs with any number of nodes.
Description
This issue occurs when the job is submitted to a standalone cluster; YARN and Mesos have not been tried. Repartitioning df into a single partition or setting the default parallelism to 1 does not fix it. Running with only one node in the cluster gives the same result, and other shuffle configuration changes do not alter the results either.
The issue does NOT happen with --master local[*].
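import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.rowNumber

// Number each client's rows in date order
val ws = Window.
partitionBy("client_id").
orderBy("date")
val nm = "repeatMe"
// Keep the result: select returns a new DataFrame and leaves df unchanged
val numbered = df.select(df.col("*"), rowNumber().over(ws).as(nm))
numbered.filter(numbered(nm).isNotNull).orderBy(nm).take(50).foreach(println(_))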
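---> Output (column types: Long, DateType, Int). The row number is negative and identical within each client_id instead of counting 1, 2, 3, ...: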
[219483904822,2006-06-01,-1863462909]
[219483904822,2006-09-01,-1863462909]
[219483904822,2007-01-01,-1863462909]
[219483904822,2007-08-01,-1863462909]
[219483904822,2007-07-01,-1863462909]
[192489238423,2007-07-01,-1863462774]
[192489238423,2007-02-01,-1863462774]
[192489238423,2006-11-01,-1863462774]
[192489238423,2006-08-01,-1863462774]
[192489238423,2007-08-01,-1863462774]
[192489238423,2006-09-01,-1863462774]
[192489238423,2007-03-01,-1863462774]
[192489238423,2006-10-01,-1863462774]
[192489238423,2007-05-01,-1863462774]
[192489238423,2006-06-01,-1863462774]
[192489238423,2006-12-01,-1863462774]
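For comparison, here is a minimal sketch of the numbering that rowNumber over this window is expected to produce, emulated with plain Scala collections and no Spark. The Visit case class, the RowNumberSketch object, and withRowNumber are hypothetical names introduced only for illustration; the sample rows are a subset of the output above.

```scala
import java.time.LocalDate

// Hypothetical row type mirroring the (Long, DateType) columns in the report.
final case class Visit(clientId: Long, date: LocalDate)

object RowNumberSketch {
  // Emulates rowNumber().over(Window.partitionBy("client_id").orderBy("date")):
  // within each client, rows sorted by date are numbered 1, 2, 3, ...
  def withRowNumber(rows: Seq[Visit]): Seq[(Visit, Int)] =
    rows
      .groupBy(_.clientId)
      .values
      .flatMap { group =>
        group.sortBy(_.date.toEpochDay).zipWithIndex.map { case (v, i) => (v, i + 1) }
      }
      .toSeq

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      Visit(219483904822L, LocalDate.parse("2007-01-01")),
      Visit(219483904822L, LocalDate.parse("2006-06-01")),
      Visit(219483904822L, LocalDate.parse("2006-09-01")),
      Visit(192489238423L, LocalDate.parse("2006-08-01")),
      Visit(192489238423L, LocalDate.parse("2006-06-01"))
    )
    // Print in the same bracketed style as the report, ordered by client and row number.
    withRowNumber(rows)
      .sortBy { case (v, n) => (v.clientId, n) }
      .foreach { case (v, n) => println(s"[${v.clientId},${v.date},$n]") }
  }
}
```

In this sketch every client's numbering starts at 1 and increases with the date, which is the behaviour the reported output fails to show.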
Issue Links
- duplicates
  - SPARK-11481 orderBy with multiple columns in WindowSpec does not work properly (Resolved)
- is duplicated by
  - SPARK-10893 Lag Analytic function broken (Resolved)
  - SPARK-11452 Window functions give invalid values (Resolved)
- is related to
  - SPARK-11036 AttributeReference should not be created outside driver (Resolved)
- links to