Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.1
    • Fix Version/s: 2.0.2, 2.1.0
    • Component/s: SparkR
    • Labels:
      None

      Description

      This issue is a more specific version of SPARK-17762.
      Supporting arguments larger than 2GB is more general and arguably harder, because the limit exists in both R and the JVM (we receive data as a ByteArray). However, to support parallelizing R data.frames larger than 2GB, we can do what PySpark does.

      PySpark uses files to transfer bulk data between Python and JVM. It has worked well for the large community of Spark Python users.
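
      The file-based approach can be illustrated with a minimal sketch. This is not PySpark's or SparkR's actual implementation: the function names, the use of pickle, and the 8-byte length prefix are all assumptions chosen for illustration. The point is that the data is written to a temporary file as several length-prefixed chunks, so the reader never has to hold the whole payload in one byte array, only one chunk at a time.

```python
import os
import pickle
import tempfile


def write_partitions(rows, num_slices, path):
    """Serialize rows into at most num_slices length-prefixed chunks in a file.

    Hypothetical helper: the chunk layout (8-byte big-endian length, then the
    pickled payload) is an assumption for this sketch, not a Spark wire format.
    """
    size = max(1, -(-len(rows) // num_slices))  # ceiling division
    with open(path, "wb") as f:
        for start in range(0, len(rows), size):
            payload = pickle.dumps(rows[start:start + size])
            f.write(len(payload).to_bytes(8, "big"))
            f.write(payload)


def read_partitions(path):
    """Yield one deserialized chunk at a time, so only one chunk is in memory."""
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:
                break
            length = int.from_bytes(header, "big")
            yield pickle.loads(f.read(length))


def roundtrip(rows, num_slices):
    """Write rows to a temporary file, read them back, and clean up."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    os.close(fd)
    try:
        write_partitions(rows, num_slices, path)
        return list(read_partitions(path))
    finally:
        os.remove(path)
```

      For example, `roundtrip(list(range(10)), 3)` splits ten rows into three chunks and reads them back one by one. In this sketch each individual chunk would still need to stay under any per-array limit, which is why the data is split before it is written.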

          Activity

          falaki Hossein Falaki added a comment -

          Shivaram Venkataraman and Xiangrui Meng, just double-checking: in all supported SparkR deployment modes, are the driver R process and the JVM on the same machine?

          srowen Sean Owen added a comment -

          Does this duplicate https://issues.apache.org/jira/browse/SPARK-6235, or is it a subset?

          falaki Hossein Falaki added a comment -

          Thanks for pointing it out. SPARK-6235 seems to be an umbrella ticket. This one can be a subtask of it.

          felixcheung Felix Cheung added a comment - - edited

          Yes. Driver R and Driver JVM should be on the same machine.
          I have not checked recently, but there might be projects changing how the backend is connected, and those could be affected by this.

          felixcheung Felix Cheung added a comment -

          More discussion on https://issues.apache.org/jira/browse/SPARK-16578. A PR was opened but not merged.

          apachespark Apache Spark added a comment -

          User 'falaki' has created a pull request for this issue:
          https://github.com/apache/spark/pull/15375


            People

            • Assignee:
              falaki Hossein Falaki
              Reporter:
              falaki Hossein Falaki
            • Votes:
              0
              Watchers:
              4
