SPARK-40063

pyspark.pandas .apply() changing rows ordering

Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.3.0
    • Fix Version/s: None
    • Component/s: Pandas API on Spark
    • Environment: Databricks Runtime 11.1

    Description

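      When apply is used to run a function over a DataFrame column and the result is assigned back to that same column, the column's values come back in a different row order, so they no longer line up with the rest of the row.

      A command like this reproduces the problem: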

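      # df is assumed to be a pyspark.pandas DataFrame with a numeric
      # column named 'col_to_apply_function'.
      def example_func(df_col):
          return df_col ** 2

      # Row-wise apply, with the result written back over the source column.
      df['col_to_apply_function'] = df.apply(lambda row: example_func(row['col_to_apply_function']), axis=1)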

      Assigning the result to a new column instead of overwriting the original one works around the problem, but as soon as the old column is dropped, the rows are misordered again.

      Setting one of the columns as the index did not help either.
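
One thing that may be worth trying, assuming the function can be written as a column-wise operation (as example_func can), is to skip the row-wise apply and compute the new values directly from the column. The sketch below is not from the report and has not been verified on Databricks Runtime 11.1. The helper names apply_in_place, rows_still_aligned and demo_on_spark, and the 'id' key column, are hypothetical and only there for illustration. The helpers are duck-typed, so they accept a plain pandas DataFrame as well as a pyspark.pandas one.

```python
def example_func(df_col):
    return df_col ** 2


def apply_in_place(df, column, func):
    """Overwrite `column` with func(column) using one column-wise call.

    The new values are derived from a column of the same frame, so no
    separate result frame has to be matched back to `df` by index.
    """
    df[column] = func(df[column])
    return df


def rows_still_aligned(df, key_column, value_column, func):
    """Return True if every row satisfies value == func(key)."""
    return bool((df[value_column] == func(df[key_column])).all())


def demo_on_spark():
    """Hypothetical demo: needs pyspark and a Spark session, so it is
    defined here but not called."""
    import pyspark.pandas as ps

    psdf = ps.DataFrame({
        "id": [1, 2, 3, 4, 5],                      # untouched key column
        "col_to_apply_function": [1, 2, 3, 4, 5],   # column to overwrite
    })
    psdf = apply_in_place(psdf, "col_to_apply_function", example_func)
    return rows_still_aligned(psdf, "id", "col_to_apply_function", example_func)
```

Keeping an untouched key column next to the modified one makes it easy to check, with rows_still_aligned, whether the values still sit on the rows they were computed from.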

          People

            Assignee: Unassigned
            Reporter: Marcelo Rossini Castro (marcelorossini)
            Votes: 0
            Watchers: 1

            Dates

              Created:
              Updated: