Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.1
    • Fix Version/s: 2.0.2, 2.1.0
    • Component/s: SparkR
    • Labels:
      None

      Description

      This issue is a more specific version of SPARK-17762.
      Supporting arguments larger than 2 GB is a more general and arguably harder problem, because the limit exists in both R and the JVM (we receive the data as a ByteArray). However, to support parallelizing R data.frames that are larger than 2 GB, we can do what PySpark does.

      PySpark uses files to transfer bulk data between Python and JVM. It has worked well for the large community of Spark Python users.
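
      As a rough illustration of the file-based idea, the sketch below writes rows to a temporary file as length-prefixed batches and reads them back one batch at a time, so that no single in-memory byte buffer has to hold the whole dataset. This is not the PySpark or SparkR implementation; the function names (`write_batches`, `read_batches`, `transfer_via_file`), the use of pickle, and the framing format are all assumptions made for this example.

```python
import os
import pickle
import struct
import tempfile


def write_batches(rows, path, batch_size=1000):
    """Write rows to `path` as length-prefixed pickled batches.

    Hypothetical framing: an 8-byte big-endian length, then the payload.
    Returns the number of batches written.
    """
    count = 0
    with open(path, "wb") as f:
        for start in range(0, len(rows), batch_size):
            payload = pickle.dumps(rows[start:start + batch_size])
            f.write(struct.pack(">q", len(payload)))
            f.write(payload)
            count += 1
    return count


def read_batches(path):
    """Yield one batch at a time, so each read buffer stays small."""
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:  # end of file
                break
            (length,) = struct.unpack(">q", header)
            yield pickle.loads(f.read(length))


def transfer_via_file(rows, batch_size=1000):
    """Round-trip rows through a temporary file (producer and consumer in one process)."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    os.close(fd)
    try:
        write_batches(rows, path, batch_size)
        received = []
        for batch in read_batches(path):
            received.extend(batch)
        return received
    finally:
        os.remove(path)
```

      In a real setup the writer and reader would be separate processes (for example the R or Python driver and the JVM), with only the file path passed between them; here both sides run in one process to keep the sketch self-contained.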

            People

            • Assignee:
              falaki Hossein Falaki
              Reporter:
              falaki Hossein Falaki
            • Votes:
              0
              Watchers:
              4