[SPARK-7150] SQLContext.range() - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.4.0
Component/s: ML, SQL
Labels:
- starter

Target Version/s: 1.4.0
Sprint: Spark 1.5 doc/QA sprint

Description

It would be handy to have easy ways to construct random columns for DataFrames. Proposed API:

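class SQLContext {
  // Returns a DataFrame with a single Long column named "id" containing
  // consecutive values from 0 (inclusive) to n (exclusive).
  def range(n: Long): DataFrame

  // Same as range(n), but with the rows split across numPartitions partitions.
  def range(n: Long, numPartitions: Int): DataFrame
}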
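The range(n, numPartitions) overload implies that the id range [0, n) has to be divided among partitions. The following is a minimal sketch of one way to do that, assuming contiguous, near-equal slices; `RangePartitioning` and `partitionBounds` are hypothetical names for illustration, not Spark API.

```scala
// Hypothetical helper (not part of the proposed API): splits the id range
// [0, n) into numPartitions contiguous, non-overlapping slices.
object RangePartitioning {
  /** Returns (start, end) pairs, end exclusive, covering [0, n) exactly once. */
  def partitionBounds(n: Long, numPartitions: Int): Seq[(Long, Long)] = {
    require(n >= 0L, "n must be non-negative")
    require(numPartitions > 0, "numPartitions must be positive")
    (0 until numPartitions).map { i =>
      // BigInt avoids overflow in i * n when n is close to Long.MaxValue.
      val start = (BigInt(i) * n / numPartitions).toLong
      val end = (BigInt(i + 1) * n / numPartitions).toLong
      (start, end)
    }
  }
}
```

For example, `RangePartitioning.partitionBounds(10L, 3)` returns the slices (0,3), (3,6), (6,10): each slice's end is the next slice's start, so every id appears in exactly one partition.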

Usage:

import org.apache.spark.sql.functions.{rand, randn}

// ctx is an existing SQLContext
// uniform distribution
ctx.range(1000).select(rand())

// standard normal distribution
ctx.range(1000).select(randn())
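The usage lines rely on rand() and randn() producing one value per row of the range. Outside Spark, the idea can be sketched in plain Scala: each row in a partition draws one uniform value and one standard-normal value from a generator seeded per partition. The seeding scheme (seed + partitionIndex) and the names `RandomColumnSketch`, `Row`, and `partitionRows` are assumptions made for illustration; this is not a description of how Spark implements rand()/randn().

```scala
import scala.util.Random

// Hypothetical illustration (not Spark's implementation): builds the rows of
// one "partition" with an id plus uniform and normal random values, seeding a
// fresh generator from (seed + partitionIndex) so the output is reproducible.
object RandomColumnSketch {
  final case class Row(id: Long, uniform: Double, normal: Double)

  def partitionRows(start: Long, end: Long, seed: Long, partitionIndex: Int): Vector[Row] = {
    val rng = new Random(seed + partitionIndex)
    (start until end).map { id =>
      // nextDouble() is uniform on [0, 1); nextGaussian() is standard normal.
      Row(id, rng.nextDouble(), rng.nextGaussian())
    }.toVector
  }
}
```

Seeding per partition keeps results independent of how many rows other partitions hold, so recomputing a single partition yields the same values.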

We should add a RangeIterator that supports Long start/stop positions, and then use it to create an RDD as the basis for this DataFrame.
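A minimal sketch of such a RangeIterator follows, assuming a half-open range [start, end) and an optional non-zero step. The class name comes from the ticket, but the constructor, the step parameter, and the overflow handling are assumptions, not Spark's implementation. The main subtlety with Long bounds is overflow: near Long.MaxValue, current + step can wrap around to a negative value and the loop would never terminate, so the sketch uses java.lang.Math.addExact and stops when it throws.

```scala
// Hypothetical sketch of an iterator over Long values in [start, end),
// stepping by a positive or negative non-zero step.
final class RangeIterator(start: Long, end: Long, step: Long = 1L) extends Iterator[Long] {
  require(step != 0L, "step must be non-zero")

  private var current: Long = start
  // Becomes true once current + step would overflow; iteration then stops
  // instead of wrapping around past Long.MaxValue or Long.MinValue.
  private var exhausted: Boolean = false

  override def hasNext: Boolean =
    !exhausted && (if (step > 0) current < end else current > end)

  override def next(): Long = {
    if (!hasNext) throw new NoSuchElementException("RangeIterator is exhausted")
    val value = current
    try {
      current = java.lang.Math.addExact(current, step)
    } catch {
      case _: ArithmeticException => exhausted = true
    }
    value
  }
}
```

For example, `new RangeIterator(10L, 0L, -3L).toList` yields 10, 7, 4, 1. Running one such iterator per partition over that partition's slice of [0, n) would give the RDD rows for the "id" column.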

Attachments

- Network Management Downloads.zip (2.41 MB, uploaded 23/Mar/19 22:35 by Simon poortman)

Issue Links

is blocked by:
- SPARK-7248 Random number generators for DataFrames (Resolved)

links to:
- [Github] Pull Request #6081 (adrian-wang)
- [Github] Pull Request #6230 (davies)
- [Github] Pull Request #6233 (adrian-wang)

People

Assignee: Adrian Wang

Reporter: Joseph K. Bradley

Votes: 0

Watchers: 5

Dates

Created: 26/Apr/15 06:54

Updated: 23/Mar/19 22:35

Resolved: 19/May/15 04:43