Details
- Type: Bug
- Status: Resolved
- Priority: Blocker
- Resolution: Fixed
- Versions: 1.6.0, 2.0.0, 2.0.1
Description
When monotonicallyIncreasingId is used together with coalesce, every partition appears to use the same offset (0), so the generated IDs are no longer monotonically increasing.
See the examples below.
>>> sqlContext.range(10).select(monotonicallyIncreasingId()).show()
+---------------------------+
|monotonicallyincreasingid()|
+---------------------------+
|                25769803776|
|                51539607552|
|                77309411328|
|               103079215104|
|               128849018880|
|               163208757248|
|               188978561024|
|               214748364800|
|               240518168576|
|               266287972352|
+---------------------------+

>>> sqlContext.range(10).select(monotonicallyIncreasingId()).coalesce(1).show()
+---------------------------+
|monotonicallyincreasingid()|
+---------------------------+
|                          0|
|                          0|
|                          0|
|                          0|
|                          0|
|                          0|
|                          0|
|                          0|
|                          0|
|                          0|
+---------------------------+

>>> sqlContext.range(10).repartition(5).select(monotonicallyIncreasingId()).coalesce(1).show()
+---------------------------+
|monotonicallyincreasingid()|
+---------------------------+
|                          0|
|                          1|
|                          0|
|                          0|
|                          1|
|                          2|
|                          3|
|                          0|
|                          1|
|                          2|
+---------------------------+
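
To make the difference between the first output and the coalesced outputs concrete, it may help to decode the IDs. The following is a minimal sketch in plain Python (no Spark required), assuming the layout described in Spark's API documentation: partition ID in the upper 31 bits, per-partition record number in the lower 33 bits. The helpers `decode_id` and `is_strictly_increasing` are hypothetical names used for illustration, not Spark APIs.

```python
# Assumption: IDs follow the documented layout of monotonically_increasing_id,
# i.e. (partition_id << 33) + record_number_within_partition.
RECORD_BITS = 33
RECORD_MASK = (1 << RECORD_BITS) - 1


def decode_id(value):
    """Split a generated ID into (partition_id, record_number)."""
    return value >> RECORD_BITS, value & RECORD_MASK


def is_strictly_increasing(ids):
    """Return True if every ID is larger than the one before it."""
    return all(a < b for a, b in zip(ids, ids[1:]))


# IDs copied from the first example (no coalesce).
first_run = [
    25769803776, 51539607552, 77309411328, 103079215104, 128849018880,
    163208757248, 188978561024, 214748364800, 240518168576, 266287972352,
]
# IDs copied from the third example (repartition(5) followed by coalesce(1)).
third_run = [0, 1, 0, 0, 1, 2, 3, 0, 1, 2]

print([decode_id(v) for v in first_run[:3]])   # [(3, 0), (6, 0), (9, 0)]
print(is_strictly_increasing(first_run))       # True
print(is_strictly_increasing(third_run))       # False
```

Under that assumption, each ID in the first run decodes to a distinct partition with record number 0, whereas the coalesced runs look like record numbers with the partition offset fixed at 0, which matches the behaviour described in this report.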
Issue Links
- is related to SPARK-14241: Output of monotonically_increasing_id lacks stable relation with rows of DataFrame (Resolved)