[SPARK-40063] pyspark.pandas .apply() changing rows ordering - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 3.3.0
Fix Version/s: None
Component/s: Pandas API on Spark
Labels:
- Pandas
- PySpark
Environment: Databricks Runtime 11.1

Language:
- Python

Description

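When apply is used with axis=1 to transform a single DataFrame column and the result is assigned back to that same column, the values end up in a different row order, so they no longer match the rows they were computed from.

The following command reproduces the problem: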

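# df is assumed to be an existing pyspark.pandas DataFrame with a numeric column 'col_to_apply_function'.
def example_func(df_col):
    # Receives a single cell value (not a whole column) and squares it.
    return df_col ** 2

# Row-wise apply, with the result assigned back to the same column.
df['col_to_apply_function'] = df.apply(lambda row: example_func(row['col_to_apply_function']), axis=1)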

A workaround is to assign the result to a new column instead of overwriting the original one. However, if the original column is then dropped, the same row-ordering problem appears again.

Setting one of the columns as the index did not help either.
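
One way to confirm that values really moved between rows, rather than the display order merely changing, is to compare results by an explicit key column. The sketch below is only an illustration and not part of the original report: the helper find_misaligned_keys and the row_id column are hypothetical names, and plain pandas is used so the check runs without a Spark cluster. A pyspark.pandas DataFrame could be converted with to_pandas() before being passed in. The "shuffled" frame simulates the reported symptom by hand; it is not output captured from Spark.

```python
import pandas as pd


def example_func(value):
    return value ** 2


def find_misaligned_keys(before, after, key, col, func):
    """Return the keys whose value in `after` differs from func(value in `before`)."""
    # Join on the key so the comparison does not depend on row order.
    merged = before[[key, col]].merge(
        after[[key, col]], on=key, suffixes=("_before", "_after")
    )
    expected = merged[col + "_before"].map(func)
    mismatched = merged[col + "_after"] != expected
    return sorted(merged.loc[mismatched, key].tolist())


# Hypothetical input with an explicit key column (row_id).
before = pd.DataFrame(
    {"row_id": [0, 1, 2, 3], "col_to_apply_function": [1, 2, 3, 4]}
)

# Expected behaviour: each squared value stays on its own row.
aligned = before.copy()
aligned["col_to_apply_function"] = aligned.apply(
    lambda row: example_func(row["col_to_apply_function"]), axis=1
)

# Simulated symptom: the squared values for rows 1 and 2 are swapped.
shuffled = before.copy()
shuffled["col_to_apply_function"] = [1, 9, 4, 16]

print(find_misaligned_keys(before, aligned, "row_id", "col_to_apply_function", example_func))   # []
print(find_misaligned_keys(before, shuffled, "row_id", "col_to_apply_function", example_func))  # [1, 2]
```

An empty list means every value still sits on the row it was computed from; any keys returned identify rows whose values were moved.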

People

Assignee: Unassigned

Reporter: Marcelo Rossini Castro

Votes: 0

Watchers: 1

Dates

Created: 12/Aug/22 21:10

Updated: 12/Dec/22 18:11