Description
It would be handy to have easy ways to construct random columns for DataFrames. Proposed API:
class SQLContext {
  // Return a DataFrame with a single column named "id" that has consecutive values from 0 to n.
  def range(n: Long): DataFrame
  def range(n: Long, numPartitions: Int): DataFrame
}
Usage:
// uniform distribution
ctx.range(1000).select(rand())

// normal distribution
ctx.range(1000).select(randn())
We should add a RangeIterator that supports Long start/stop positions, and then use it to create an RDD as the basis for this DataFrame.
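A minimal sketch of what such an iterator could look like, assuming a half-open range [start, stop) with a step of 1. The class name RangeIterator comes from the description above; its constructor signature, the RangeSlices helper, and the way partitions are cut are all assumptions made for illustration, not the implementation that was merged into Spark.

```scala
// Hypothetical sketch: iterates over the Long values in [start, stop).
// An Int-based Range cannot cover this, because its length is limited to Int.MaxValue.
final class RangeIterator(start: Long, stop: Long) extends Iterator[Long] {
  private var current: Long = start

  override def hasNext: Boolean = current < stop

  override def next(): Long = {
    if (!hasNext) throw new NoSuchElementException("next on exhausted RangeIterator")
    val value = current
    current += 1 // cannot overflow: current < stop <= Long.MaxValue at this point
    value
  }
}

// Hypothetical helper: cuts [0, n) into numPartitions contiguous slices,
// one per partition of the RDD that would back the "id" column.
object RangeSlices {
  def slice(n: Long, numPartitions: Int, index: Int): RangeIterator = {
    require(n >= 0, "n must be non-negative")
    require(numPartitions > 0, "numPartitions must be positive")
    require(index >= 0 && index < numPartitions, "index out of bounds")
    // BigInt avoids overflow when index * n exceeds the Long range.
    def bound(i: Int): Long = (BigInt(i) * n / numPartitions).toLong
    new RangeIterator(bound(index), bound(index + 1))
  }
}
```

With this sketch, each partition index would get its own iterator over a disjoint slice, and concatenating the slices in index order yields every id exactly once. For example, RangeSlices.slice(10L, 3, 2) covers the ids 6 through 9.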
Issue Links
- is blocked by SPARK-7248 Random number generators for DataFrames (Resolved)
- links to