Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-25801

pandas_udf grouped_map fails with input dataframe with more than 255 columns

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.4.0
    • Component/s: PySpark
    • Labels:
      None
    • Environment:

      python 2.7

      pyspark 2.3.0

      Description

      Hi,

      I'm using a pandas_udf to deploy a model to predict all samples in a spark dataframe,

      for this I use a udf as follows:
      @pandas_udf("scores double", PandasUDFType.GROUPED_MAP) def predict_scores(pdf): score_values = model.predict_proba(pdf)[:,1] return pd.DataFrame(

      {'scores': score_values}

      )
      So it takes a dataframe and predicts the probability of being positive according to an sklearn model for each row and returns this as single column. This works great on a random groupBy, e.g.:
      sdf_to_score.groupBy(sf.col('age')).apply(predict_scores)
      as long as the dataframe has <255 columns. When the input dataframe has more than 255 columns (thus features in my model), I get:
      org.apache.spark.api.python.PythonException: Traceback (most recent call last):
      File "path/to/lib/spark2/python/lib/pyspark.zip/pyspark/worker.py", line 219, in main
      func, profiler, deserializer, serializer = read_udfs(pickleSer, infile, eval_type)
      File "path/to/lib/spark2/python/lib/pyspark.zip/pyspark/worker.py", line 148, in read_udfs
      mapper = eval(mapper_str, udfs)
      File "<string>", line 1
      SyntaxError: more than 255 arguments
      Which seems to be related with Python's general limitation of having not allowing more than 255 arguments for a function?

       

      Is this a bug or is there a straightforward way around this problem?

       

      Regards,

      Frederik

        Attachments

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              Toekan Frederik
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: