  1. SystemDS
  2. SYSTEMDS-1370

Py4JError: An error occurred while calling z:org.apache.sysml.runtime.instructions.spark.utils.RDDConverterUtilsExt.convertPy4JArrayToMB.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Not Applicable
    • Fix Version/s: SystemML 0.14
    • Component/s: APIs
    • Labels:
      None
    • Environment:
      pyspark with local Spark 2.1

      Description

      Do we have undocumented limits for RDDConverterUtilsExt.convertPy4JArrayToMB?

      The simple script below works for 23100 rows but fails for 46900. The following steps reproduce the failure easily and consistently.
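When investigating, it may help to know how much data each row count pushes through the Py4J gateway. The sketch below is only an estimate under stated assumptions: values are sent as 8-byte floats (the script uses dtype=float), and the byte array is Base64-encoded with standard padding (the trace shows py4j.Base64.decode). The helper name payload_estimate is hypothetical. Whether payload size is the actual cause is not established by this report.

```python
def payload_estimate(rows, cols=784, bytes_per_value=8):
    """Return (raw_bytes, base64_chars) for a dense rows x cols matrix.

    Assumption: each value is an 8-byte float and the whole array is sent
    as a single Base64 string (4 output characters per 3 input bytes,
    rounded up to a full group for padding).
    """
    raw_bytes = rows * cols * bytes_per_value
    base64_chars = 4 * ((raw_bytes + 2) // 3)
    return raw_bytes, base64_chars


for nr in (23100, 46900):
    raw, b64 = payload_estimate(nr)
    print("rows=%d raw_bytes=%d base64_chars=%d" % (nr, raw, b64))
```

Under these assumptions the working case is about 145 MB raw (about 193 MB encoded) and the failing case about 294 MB raw (about 392 MB encoded).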

      START:
      $ pyspark --master local --jars $SYSTEMML_HOME/SystemML.jar --driver-memory 8G --executor-memory 2G

      PYTHON SCRIPT:
      from systemml import MLContext, dml
      import pandas as pd

      sc.version
      ml = MLContext(sc)
      print "Spark Version:", sc.version
      print "SystemML Version:", ml.version()
      print "SystemML Built-Time:", ml.buildTime()

      # !! number of rows 23100 works, while 46900 fails
      nr = 46900

      X_pd = pd.DataFrame(range(1, (nr*784)+1,1),dtype=float).values.reshape(nr,784)

      script ="""
      write(X, $Xfile, format="csv")
      """
      prog = dml(script).input(X=X_pd).input(**{"$Xfile":"/tmp/X_pd.csv"})
      ml.execute(prog)
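
The gap between the known-good row count (23100) and the known-bad one (46900) could be narrowed by bisection. This is a minimal sketch, not part of the original report: first_failing and the fails callback are hypothetical names, and the approach assumes the failure is monotonic in the row count (every size above the threshold fails). In practice, fails(nr) would wrap the script above and return True when ml.execute raises Py4JError.

```python
def first_failing(lo, hi, fails):
    """Return the smallest row count in (lo, hi] for which fails(n) is True.

    Preconditions (assumed, not checked): fails(lo) is False, fails(hi) is
    True, and failures are monotonic in n.
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(mid):
            hi = mid   # threshold is at mid or below
        else:
            lo = mid   # threshold is above mid
    return hi


# Stand-in predicate for illustration only; the real threshold is unknown.
example_threshold = 30000
print(first_failing(23100, 46900, lambda n: n >= example_threshold))
```

With the two endpoints from this report, the loop needs at most 15 calls to fails.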

      OUTPUT:
      Spark Version: 2.1.0
      SystemML Version: 0.14.0-incubating-SNAPSHOT
      SystemML Built-Time: 2017-03-03 07:33:40 UTC
      ---------------------------------------------------------------------------
      Py4JError Traceback (most recent call last)
      .......

      Py4JError: An error occurred while calling z:org.apache.sysml.runtime.instructions.spark.utils.RDDConverterUtilsExt.convertPy4JArrayToMB. Trace:
      java.lang.NegativeArraySizeException
      at py4j.Base64.decode(Base64.java:321)
      at py4j.Protocol.getBytes(Protocol.java:173)
      at py4j.Protocol.getObject(Protocol.java:294)
      at py4j.commands.AbstractCommand.getArguments(AbstractCommand.java:82)
      at py4j.commands.CallCommand.execute(CallCommand.java:77)
      at py4j.GatewayConnection.run(GatewayConnection.java:214)
      at java.lang.Thread.run(Thread.java:745)

            People

            • Assignee:
              niketanpansare Niketan Pansare
            • Reporter:
              reinwald@us.ibm.com Berthold Reinwald