[SPARK-8277] SparkR createDataFrame is slow

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4.0
    • Fix Version/s: 1.6.0
    • Component/s: SparkR
    • Labels: None

    Description

      For example, calling `createDataFrame` on the data from http://s3-us-west-2.amazonaws.com/sparkr-data/flights.csv takes a really long time.

      This is mainly because we convert the local R data.frame to a list of rows in order to parallelize it by row, and that data.frame-to-list conversion is very slow for large data frames.
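
      The cost described above can be illustrated with a minimal sketch in base R. It does not reproduce SparkR's internal code or the fix that was merged; the helper names `rows_by_index` and `rows_by_column` and the small synthetic data.frame are assumptions made for illustration. It contrasts subsetting the data.frame once per row with unpacking the columns once and assembling rows from plain vectors.

```r
# Hypothetical stand-in for a local data.frame; not the flights data itself.
n <- 2000
df <- data.frame(
  id = seq_len(n),
  delay = seq_len(n) / 10,
  carrier = rep(c("AA", "UA"), length.out = n),
  stringsAsFactors = FALSE
)

# Row-wise conversion: subset the data.frame once per row.
# Every `d[i, ]` call goes through data.frame indexing, which is costly.
rows_by_index <- function(d) {
  lapply(seq_len(nrow(d)), function(i) as.list(d[i, , drop = FALSE]))
}

# Column-wise alternative: unpack the columns into plain vectors once,
# then build each row by picking element i from every column.
rows_by_column <- function(d) {
  cols <- as.list(d)
  lapply(seq_len(nrow(d)), function(i) lapply(cols, `[[`, i))
}

# Both approaches should yield the same list of named rows.
stopifnot(identical(rows_by_index(df), rows_by_column(df)))

# Timings vary by machine and R version, so none are quoted here.
print(system.time(rows_by_index(df)))
print(system.time(rows_by_column(df)))
```

      Either list of rows could then be handed to a parallelize step; the point is only that per-row data.frame indexing is the part that tends to dominate for large inputs.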

          People

            Assignee: Maciej Szymkiewicz (zero323)
            Reporter: Shivaram Venkataraman (shivaram)
            Votes: 1
            Watchers: 7