  1. Spark
  2. SPARK-10047

Improve the implementation of collect() on DataFrame in SparkR

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Implemented
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: SparkR
    • Labels: None

    Description

      Currently in SparkR, collect() on a DataFrame gathers the DataFrame's data into a local data.frame, the structure R users are most familiar with.

      However, collect() cannot yet collect data of nested types from a DataFrame, for two reasons:
      1. The serializer in the JVM backend does not support nested types.
      2. collect() on the R side assumes that each column is of a simple atomic type whose values can be combined into an atomic vector.
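
      The second point can be illustrated in plain R. The sketch below is not SparkR's implementation; the `rows` structure and the column names are hypothetical stand-ins for deserialized rows. It shows why flattening values into an atomic vector works for scalar columns but loses row boundaries for nested values, and how a list column keeps one element per row.

```r
# Hypothetical rows, standing in for what the R side might receive:
# each row is a list of field values. This is an assumption for
# illustration, not SparkR's actual wire format.
rows <- list(
  list(name = "a", scores = c(1, 2)),
  list(name = "b", scores = c(3, 4, 5))
)

# Atomic column: one scalar per row, so the values flatten cleanly
# into a character vector of length 2.
name_col <- unlist(lapply(rows, function(r) r$name))

# Nested column: flattening merges the per-row vectors, so the
# result has 5 values for 2 rows and the row boundaries are lost.
flattened <- unlist(lapply(rows, function(r) r$scores))

# Keeping the values in a list preserves one element per row.
scores_col <- lapply(rows, function(r) r$scores)

# Build the local data.frame; the nested column becomes a list column.
df <- data.frame(name = name_col, stringsAsFactors = FALSE)
df$scores <- scores_col
```

      Under these assumptions, `df` has two rows, and `df$scores[[2]]` returns the full vector for the second row rather than a single flattened value.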

            People

              Assignee: Unassigned
              Reporter: Sun Rui (sunrui)
              Votes: 0
              Watchers: 4