Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-26067

Pandas GROUPED_MAP udf breaks if DF has >255 columns

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Not A Problem
    • 2.3.2, 2.4.0
    • None
    • PySpark
    • None

    Description

      When I run spark's Pandas GROUPED_MAP udfs to apply a UDAF i wrote in pythohn/pandas on a grouped dataframe in spark - it fails if the number of columns is greater than 255 in Pytohn 3.6 and lower.

      import pyspark
      from pyspark.sql import types as T, functions as F
      
      spark = pyspark.sql.SparkSession.builder.getOrCreate()
      df = spark.createDataFrame(
          [[i for i in range(256)], [i+1 for i in range(256)]], schema=["a" + str(i) for i in range(256)])
      
      new_schema = T.StructType([
          field for field in df.schema] + [T.StructField("new_row", T.DoubleType())])
      
      def myfunc(df):
          df['new_row'] = 1
          return df
      
      myfunc_udf = F.pandas_udf(new_schema, F.PandasUDFType.GROUPED_MAP)(myfunc)
      
      df2 = df.groupBy(["a1"]).apply(myfunc_udf)
      
      print(df2.count())  # This FAILS
      # ERROR:
      # Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last):
      #   File "/usr/local/hadoop/spark2.3.1/python/lib/pyspark.zip/pyspark/worker.py", line 219, in main
      #     func, profiler, deserializer, serializer = read_udfs(pickleSer, infile, eval_type)
      #   File "/usr/local/hadoop/spark2.3.1/python/lib/pyspark.zip/pyspark/worker.py", line 148, in read_udfs
      #     mapper = eval(mapper_str, udfs)
      #   File "<string>", line 1
      # SyntaxError: more than 255 arguments
      

      Note: In Python 3.7 the 255 limit was raised, but I have not tried with Pytohn 3.7 ...https://docs.python.org/3.7/whatsnew/3.7.html#other-language-changes

      I was using Python 3.5 (from anaconda), Spark 2.3.1 to reproduce thihs on my Hadoop Linux cluster and also on my Mac standalone spark installation.

      Attachments

        Activity

          People

            Unassigned Unassigned
            AbdealiJK Abdeali Kothari
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: