
    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0
    • Component/s: ML, SQL
    • Labels:
    • Target Version/s:
    • Sprint:
      Spark 1.5 doc/QA sprint

      Description

      It would be handy to have easy ways to construct random columns for DataFrames. Proposed API:

      class SQLContext {
        // Returns a DataFrame with a single column named "id" that holds consecutive values from 0 to n.
        def range(n: Long): DataFrame
      
        // Same as above, but with an explicit number of partitions.
        def range(n: Long, numPartitions: Int): DataFrame
      }
      

      Usage:

      // uniform distribution
      ctx.range(1000).select(rand())
      
      // normal distribution
      ctx.range(1000).select(randn())
      

      We should add a RangeIterator that supports Long start and stop positions, and then use it to create an RDD that serves as the basis for this DataFrame.
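
      A minimal sketch of what such an iterator could look like, in plain Scala with no Spark dependency. The names RangeIterator (as a concrete class), RangeSketch, and partitionBounds are hypothetical, and the sketch assumes the range is half-open, i.e. [start, stop); the description above does not settle whether n itself is included. It is only meant to illustrate Long-based iteration and one possible way to split [0, n) across numPartitions slices, not the implementation that was merged.

```scala
// Hypothetical sketch: these names are illustrative, not Spark's actual code.
// Iterates over the half-open range [start, stop) using Long positions,
// so it is not limited to Int.MaxValue elements.
final class RangeIterator(start: Long, stop: Long) extends Iterator[Long] {
  private var current: Long = start

  override def hasNext: Boolean = current < stop

  override def next(): Long = {
    if (!hasNext) throw new NoSuchElementException("next on exhausted RangeIterator")
    val value = current
    current += 1
    value
  }
}

object RangeSketch {
  // Splits [0, n) into numPartitions contiguous half-open slices.
  // BigInt is used for the intermediate product so that i * n cannot overflow a Long.
  def partitionBounds(n: Long, numPartitions: Int): Seq[(Long, Long)] = {
    require(n >= 0, "n must be non-negative")
    require(numPartitions > 0, "numPartitions must be positive")
    (0 until numPartitions).map { i =>
      val sliceStart = (BigInt(i) * BigInt(n) / BigInt(numPartitions)).toLong
      val sliceStop = (BigInt(i + 1) * BigInt(n) / BigInt(numPartitions)).toLong
      (sliceStart, sliceStop)
    }
  }
}
```

      Under these assumptions, each partition of the backing RDD would wrap one (sliceStart, sliceStop) pair in its own RangeIterator, so no partition ever has to materialize the full sequence.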

            People

            • Assignee: Adrian Wang (adrian-wang)
            • Reporter: Joseph K. Bradley (josephkb)
