  Spark / SPARK-42883 Implement Pandas API Missing Parameters / SPARK-38763

Pandas API on Spark can't apply lambda to columns


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.3.0, 3.4.0
    • Fix Version/s: 3.3.0
    • Component/s: PySpark
    • Labels: None

    Description

      With a Spark master build from 8 November 2021, I could use this code to rename columns:

      import re

      # pf05 is a pandas-on-Spark DataFrame; strip the namespace prefixes from its column names.
      pf05 = pf05.rename(columns=lambda x: re.sub('DOFFIN_ESENDERS:', '', x))
      pf05 = pf05.rename(columns=lambda x: re.sub('FORM_SECTION:', '', x))
      pf05 = pf05.rename(columns=lambda x: re.sub('F05_2014:', '', x))
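The three chained renames strip namespace prefixes from column names. A minimal plain-Python sketch of the same transformation, assuming hypothetical column names (the report does not list the real ones), folds the three `re.sub` calls into one function and builds a dict that maps each old name to its new name:

```python
import re

# The three namespace prefixes stripped in the report.
PREFIXES = ('DOFFIN_ESENDERS:', 'FORM_SECTION:', 'F05_2014:')


def strip_prefixes(name: str) -> str:
    """Remove each prefix in turn, like the three chained re.sub calls."""
    for prefix in PREFIXES:
        # re.escape keeps the pattern literal; ':' and '_' are not special anyway.
        name = re.sub(re.escape(prefix), '', name)
    return name


# Hypothetical column names, for illustration only.
columns = ['DOFFIN_ESENDERS:FORM_SECTION:F05_2014:TITLE', 'F05_2014:LOT_NO']
mapping = {c: strip_prefixes(c) for c in columns}
print(mapping)
```

`DataFrame.rename` also accepts a dict-like mapper for `columns`, so `pf05.rename(columns={c: strip_prefixes(c) for c in pf05.columns})` is one possible way to avoid passing a lambda at all. Whether this sidesteps the error on the affected builds is an assumption that this issue does not confirm.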

      But now I get this error when I run the same code:

      ---------------------------------------------------------------------------
      ValueError                                Traceback (most recent call last)
      Input In [5], in <cell line: 1>()
      ----> 1 pf05 = pf05.rename(columns=lambda x: re.sub('DOFFIN_ESENDERS:', '', x))
            2 pf05 = pf05.rename(columns=lambda x: re.sub('FORM_SECTION:', '', x))
            3 pf05 = pf05.rename(columns=lambda x: re.sub('F05_2014:', '', x))

      File /opt/spark/python/pyspark/pandas/frame.py:10636, in DataFrame.rename(self, mapper, index, columns, axis, inplace, level, errors)
        10632     index_mapper_fn, index_mapper_ret_dtype, index_mapper_ret_stype = gen_mapper_fn(
        10633         index
        10634     )
        10635 if columns:
      > 10636     columns_mapper_fn, _, _ = gen_mapper_fn(columns)
        10638 if not index and not columns:
        10639     raise ValueError("Either `index` or `columns` should be provided.")

      File /opt/spark/python/pyspark/pandas/frame.py:10603, in DataFrame.rename.<locals>.gen_mapper_fn(mapper)
        10601 elif callable(mapper):
        10602     mapper_callable = cast(Callable, mapper)
      > 10603     return_type = cast(ScalarType, infer_return_type(mapper))
        10604     dtype = return_type.dtype
        10605     spark_return_type = return_type.spark_type

      File /opt/spark/python/pyspark/pandas/typedef/typehints.py:563, in infer_return_type(f)
          560 tpe = get_type_hints(f).get("return", None)
          562 if tpe is None:
      --> 563     raise ValueError("A return value is required for the input function")
          565 if hasattr(tpe, "__origin__") and issubclass(tpe.__origin__, SeriesType):
          566     tpe = tpe.__args__[0]

      ValueError: A return value is required for the input function
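The last frame shows where the error comes from: `infer_return_type` reads the function's return annotation with `typing.get_type_hints`, and a lambda cannot declare annotations, so the lookup yields `None` and the `ValueError` is raised. The sketch below mirrors that check in plain Python; `check_return_hint` is a hypothetical name, not a PySpark API. It shows that a `def` annotated with `-> str` passes the check while the equivalent lambda does not:

```python
import re
from typing import get_type_hints


def check_return_hint(f):
    """Hypothetical mirror of the check shown in the last traceback frame."""
    tpe = get_type_hints(f).get("return", None)
    if tpe is None:
        raise ValueError("A return value is required for the input function")
    return tpe


# A lambda has no syntax for a return annotation.
untyped = lambda x: re.sub('DOFFIN_ESENDERS:', '', x)


# The same logic as a def, annotated to return str.
def typed(x: str) -> str:
    return re.sub('DOFFIN_ESENDERS:', '', x)


try:
    check_return_hint(untyped)
except ValueError as err:
    print("lambda rejected:", err)

print("def accepted, return hint:", check_return_hint(typed))
```

This only demonstrates getting past this particular check; whether an annotated function made `rename(columns=...)` work end to end on the affected builds is not confirmed in this issue.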


          People

            Assignee: Xinrong Meng (XinrongM)
            Reporter: Bjørn Jørgensen (bjornjorgensen)
            Votes: 0
            Watchers: 3
