Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0
    • Component/s: ML, SQL
    • Sprint: Spark 1.5 doc/QA sprint

    Description

      It would be handy to have easy ways to construct random columns for DataFrames. Proposed API:

      class SQLContext {
        // Return a DataFrame with a single column named "id" that has consecutive values from 0 to n.
        def range(n: Long): DataFrame
      
        // Same as above, but with the resulting DataFrame split into numPartitions partitions.
        def range(n: Long, numPartitions: Int): DataFrame
      }
      

      Usage:

      import org.apache.spark.sql.functions.{rand, randn}

      // uniform distribution
      ctx.range(1000).select(rand())
      
      // normal distribution
      ctx.range(1000).select(randn())
      

      We should add a RangeIterator that supports Long start/stop positions, and then use it to create an RDD as the basis for this DataFrame.
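
      The RangeIterator idea could look roughly like the sketch below. This is only an illustration in plain Scala, not the implementation that was merged: the constructor shape, the exclusive end bound, the `RangeSketch` object, and the `partitionBounds` helper are all assumptions made for this example. Each partition of the backing RDD would iterate over its own (start, end) slice.

```scala
// Hypothetical sketch; names and semantics are assumptions, not Spark's actual code.
// Iterates over the half-open interval [start, end) using Long positions,
// so ranges larger than Int.MaxValue are supported.
class RangeIterator(start: Long, end: Long) extends Iterator[Long] {
  private var current: Long = start

  override def hasNext: Boolean = current < end

  override def next(): Long = {
    if (!hasNext) throw new NoSuchElementException("next on empty iterator")
    val value = current
    current += 1
    value
  }
}

object RangeSketch {
  // Splits [0, n) into numPartitions contiguous, non-overlapping slices.
  // Assumes n * numPartitions fits in a Long (no overflow handling here).
  def partitionBounds(n: Long, numPartitions: Int): Seq[(Long, Long)] =
    (0 until numPartitions).map { i =>
      val lo = i * n / numPartitions
      val hi = (i + 1) * n / numPartitions
      (lo, hi)
    }
}
```

      With slices computed this way, every id in [0, n) is produced by exactly one partition, and no partition needs to materialize its values up front.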

      Attachments

        1. Network Management Downloads.zip
          2.41 MB
          Simon poortman

            People

              adrian-wang Adrian Wang
              josephkb Joseph K. Bradley
              Votes: 0
              Watchers: 5
