[SPARK-6522] Standardize Random Number Generation - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: 1.3.0
Fix Version/s: 1.1.0
Component/s: Spark Core
Labels: None

Description

Random number generation in Spark has to be handled carefully: an RNG referenced inside a closure is copied, state included, to the workers, so each task starts from the same state. A separate RNG therefore needs to be seeded for each partition. At present, every place in Spark's libraries that uses random numbers re-implements the RNG seeding, which leaves room for mistakes.

It would be useful if RNG seeding were standardized, either through utility functions or through random number generation functions that can be called in Spark pipelines.
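
As a rough illustration of the kind of utility this issue asks for, the sketch below derives one seed per partition from a single base seed and builds a separate RNG from it. It is not taken from Spark: `partition_seed` and `sample_partition` are hypothetical names, the SHA-256 based derivation is just one possible choice, and the partitions are simulated with plain Python lists so the example runs without a cluster.

```python
import hashlib
import random


def partition_seed(base_seed: int, partition_index: int) -> int:
    """Derive a 64-bit seed from a base seed and a partition index.

    Hypothetical helper: hashing the pair avoids handing the same seed
    (and therefore the same random stream) to every partition.
    """
    digest = hashlib.sha256(f"{base_seed}:{partition_index}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def sample_partition(base_seed, partition_index, rows, fraction):
    """Bernoulli-sample one partition using an RNG private to that partition."""
    rng = random.Random(partition_seed(base_seed, partition_index))
    return [row for row in rows if rng.random() < fraction]


# Simulated partitions (assumption: stand-ins for the partitions of an RDD).
partitions = [list(range(0, 100)), list(range(100, 200)), list(range(200, 300))]

# Two runs with the same base seed give the same sample for every partition.
first_run = [sample_partition(42, i, p, 0.1) for i, p in enumerate(partitions)]
second_run = [sample_partition(42, i, p, 0.1) for i, p in enumerate(partitions)]

# Each partition gets its own seed.
seeds = [partition_seed(42, i) for i in range(len(partitions))]
```

In a PySpark job, a function shaped like `sample_partition` could presumably be applied through `RDD.mapPartitionsWithIndex`, which passes the partition index alongside the partition's iterator. Centralizing the seed derivation in one helper is the point: callers supply only a base seed, and the per-partition logic is written once.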

People

Assignee: Unassigned

Reporter: R J Nowling

Votes: 0

Watchers: 2

Dates

Created: 25/Mar/15 03:21

Updated: 25/Nov/16 14:48

Resolved: 25/Nov/16 14:48