Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0
    • Component/s: ML, SQL
    • Sprint: Spark 1.5 doc/QA sprint

      Description

      It would be handy to have easy ways to construct random columns for DataFrames. Proposed API:

      class SQLContext {
        // Return a DataFrame with a single column named "id" that has consecutive values from 0 to n.
        def range(n: Long): DataFrame
      
        // Same as above, with the rows split across numPartitions partitions.
        def range(n: Long, numPartitions: Int): DataFrame
      }
      

      Usage:

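      // ctx is an SQLContext; rand and randn come from the SQL functions object
      import org.apache.spark.sql.functions.{rand, randn}
      
      // uniform distribution
      ctx.range(1000).select(rand())
      
      // normal distribution
      ctx.range(1000).select(randn())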
      

      We should add a RangeIterator that supports Long start/stop positions, and then use it to create an RDD as the basis for this DataFrame.
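
      As a rough illustration of the RangeIterator idea, here is a minimal sketch in plain Scala with no Spark dependency. The names RangeSketch and slice are hypothetical, and the half-open interval [start, stop) and the even per-partition split are assumptions; none of this is taken from the actual patch.

```scala
object RangeSketch {
  // Hypothetical iterator over [start, stop) with Long positions,
  // so ranges larger than Int.MaxValue can be expressed.
  final class RangeIterator(start: Long, stop: Long) extends Iterator[Long] {
    private var current: Long = start

    override def hasNext: Boolean = current < stop

    override def next(): Long = {
      if (!hasNext) throw new NoSuchElementException("next on exhausted RangeIterator")
      val value = current
      current += 1
      value
    }
  }

  // Hypothetical helper: the slice of [0, n) owned by partition `index`
  // when the range is split into numPartitions contiguous pieces.
  def slice(n: Long, numPartitions: Int, index: Int): RangeIterator = {
    require(numPartitions > 0, "numPartitions must be positive")
    require(index >= 0 && index < numPartitions, "index out of bounds")
    // BigInt keeps index * n from overflowing Long for very large n.
    val start = (BigInt(index) * n / numPartitions).toLong
    val stop = (BigInt(index + 1) * n / numPartitions).toLong
    new RangeIterator(start, stop)
  }
}
```

      Each partition of the backing RDD could then be produced from one such slice, for example RangeSketch.slice(n, numPartitions, partitionIndex), so no partition needs to materialize the full range in memory.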

        Attachments

        1. Network Management Downloads.zip
          2.41 MB
          Simon poortman

              People

              • Assignee:
                adrian-wang Adrian Wang
              • Reporter:
                josephkb Joseph K. Bradley
              • Votes:
                0
              • Watchers:
                5
