SystemDS / SYSTEMDS-1370

Py4JError: An error occurred while calling z:org.apache.sysml.runtime.instructions.spark.utils.RDDConverterUtilsExt.convertPy4JArrayToMB.

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Not Applicable
    • Fix Version/s: SystemML 0.14
    • Component/s: APIs
    • Labels: None
    • Environment: pyspark with local Spark 2.1

    Description

      Do we have undocumented limits for RDDConverterUtilsExt.convertPy4JArrayToMB?

      The simple script below works for 23100 rows but fails for 46900. It reproduces the failure easily and consistently.

      START:
      $pyspark --master local --jars $SYSTEMML_HOME/SystemML.jar --driver-memory 8G --executor-memory 2G

      PYTHON SCRIPT:
      from systemml import MLContext, dml
      import pandas as pd

      sc.version
      ml = MLContext(sc)
      print "Spark Version:", sc.version
      print "SystemML Version:", ml.version()
      print "SystemML Built-Time:", ml.buildTime()

      # !! number of rows 23100 works, while 46900 fails
      nr = 46900

      X_pd = pd.DataFrame(range(1, (nr*784)+1,1),dtype=float).values.reshape(nr,784)

      script ="""
      write(X, $Xfile, format="csv")
      """
      prog = dml(script).input(X=X_pd).input(**{"$Xfile":"/tmp/X_pd.csv"})
      ml.execute(prog)

      OUTPUT:
      Spark Version: 2.1.0
      SystemML Version: 0.14.0-incubating-SNAPSHOT
      SystemML Built-Time: 2017-03-03 07:33:40 UTC
      ---------------------------------------------------------------------------
      Py4JError Traceback (most recent call last)
      .......

      Py4JError: An error occurred while calling z:org.apache.sysml.runtime.instructions.spark.utils.RDDConverterUtilsExt.convertPy4JArrayToMB. Trace:
      java.lang.NegativeArraySizeException
      at py4j.Base64.decode(Base64.java:321)
      at py4j.Protocol.getBytes(Protocol.java:173)
      at py4j.Protocol.getObject(Protocol.java:294)
      at py4j.commands.AbstractCommand.getArguments(AbstractCommand.java:82)
      at py4j.commands.CallCommand.execute(CallCommand.java:77)
      at py4j.GatewayConnection.run(GatewayConnection.java:214)
      at java.lang.Thread.run(Thread.java:745)
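
      For a sense of the sizes involved, the sketch below estimates how large the payload is for the working and failing row counts. It assumes the matrix is shipped as raw float64 bytes (8 bytes per value) and that Py4J Base64-encodes the byte argument without line breaks, which the py4j.Base64.decode frame suggests but this report does not confirm. The helper names and the 64 MB cap are hypothetical and are not part of SystemML or Py4J. Both payloads stay below the Java int maximum, so these numbers alone do not pin down the failing threshold; an overflow in an intermediate size computation is one possible cause, not an established one. The row-block helper only illustrates a generic way to keep each transfer small and is not the fix applied for this issue.

```python
BYTES_PER_DOUBLE = 8          # assumption: values are sent as float64
JAVA_INT_MAX = 2**31 - 1      # upper bound on a Java array length


def raw_bytes(rows, cols):
    """Size in bytes of a dense float64 matrix serialized as raw bytes."""
    return rows * cols * BYTES_PER_DOUBLE


def base64_chars(n_bytes):
    """Length of the padded Base64 encoding of n_bytes, without line breaks."""
    return 4 * ((n_bytes + 2) // 3)


def row_blocks(rows, cols, max_block_bytes):
    """Yield (start, stop) row ranges whose raw size is at most max_block_bytes.

    Hypothetical helper: always yields at least one row per block.
    """
    rows_per_block = max(1, max_block_bytes // (cols * BYTES_PER_DOUBLE))
    for start in range(0, rows, rows_per_block):
        yield start, min(start + rows_per_block, rows)


for nr in (23100, 46900):
    n = raw_bytes(nr, 784)
    print("%d rows: %d raw bytes, %d Base64 chars (Java int max %d)"
          % (nr, n, base64_chars(n), JAVA_INT_MAX))

# Example: split the failing case into blocks of at most 64 MB (arbitrary cap).
blocks = list(row_blocks(46900, 784, 64 * 1024 * 1024))
print("%d blocks, first %s, last %s" % (len(blocks), blocks[0], blocks[-1]))
```

      Each (start, stop) pair could be used to slice the array, for example X_pd[start:stop], so that every block is converted separately; how the blocks would be reassembled on the JVM side is outside the scope of this sketch.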

          People

            Assignee: niketanpansare Niketan Pansare
            Reporter: reinwald@us.ibm.com Berthold Reinwald