  1. Spark
  2. SPARK-42883

Implement Pandas API Missing Parameters

Details

    • Type: Umbrella
    • Status: Resolved
    • Priority: Major
    • Resolution: Resolved
    • Affects Version/s: 3.4.0
    • Fix Version/s: None
    • Component/s: Pandas API on Spark
    • Labels: None

    Description

      pandas API on Spark aims to make pandas code work on Spark clusters without any changes, so full API coverage has been one of our major goals. Most pandas functions are already implemented, but some of them support only a subset of their parameters.

      Several common categories of missing parameters have been resolved:
       * How to handle NAs
       * Filter data types
       * Control result length
       * Reindex result
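
      As a rough illustration of these four categories, the sketch below uses plain pandas, since that is the behavior pandas API on Spark aims to match. The mapping of each category to a specific parameter (`skipna`, `bool_only`, `keep`, `ignore_index`) is an assumption drawn from the sub-task titles, not from the design doc, and the sample data is made up. With Spark available, replacing the import with `import pyspark.pandas as pd` is the intended usage, though that variant is not exercised here.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame(
    {
        "a": [3.0, None, 1.0, 3.0],
        "b": [True, False, True, True],
        "c": ["x", "y", "z", "x"],
    }
)

# How to handle NAs: skipna controls whether missing values are ignored.
sum_skip = df["a"].sum(skipna=True)    # missing value ignored
sum_keep = df["a"].sum(skipna=False)   # missing value propagates to NaN

# Filter data types: bool_only restricts the reduction to boolean columns.
all_bool = df.all(bool_only=True)

# Control result length: keep="all" retains every row tied for the n-th place,
# so the result may contain more than n rows.
top = df.nlargest(1, "a", keep="all")

# Reindex result: ignore_index relabels the result 0..n-1.
sorted_a = df.sort_values("a", ignore_index=True)
```

      Here `top` holds two rows even though `n=1`, because both rows share the largest value of `a`.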

      There are remaining missing parameters to implement (see doc below).

      See the design and the current status at https://docs.google.com/document/d/1H6RXL6oc-v8qLJbwKl6OEqBjRuMZaXcTYmrZb9yNm5I/edit?usp=sharing.

      Sub-Tasks

        1. EWM support ignore_na (Sub-task, Resolved, Ruifeng Zheng)
        2. Allow `columns` parameter when creating DataFrame with Series. (Sub-task, Resolved, Haejoon Lee)
        3. Support `ignore_index` of `Series.sort_values` (Sub-task, Resolved, Xinrong Meng)
        4. Enable Series.rename to change index labels (Sub-task, Resolved, Xinrong Meng)
        5. Implement `skipna` parameter of `DataFrame.all` (Sub-task, Resolved, Xinrong Meng)
        6. Implement `ignore_index` of `DataFrame.sort_index`. (Sub-task, Resolved, Xinrong Meng)
        7. Implement `keep` parameter of `(Index/MultiIndex).drop_duplicates` (Sub-task, Resolved, Xinrong Meng)
        8. Implement `dropna` parameter of `SeriesGroupBy.value_counts` (Sub-task, Resolved, Xinrong Meng)
        9. interpolate supports limit_area (Sub-task, Resolved, Ruifeng Zheng)
        10. Support `return_indexer` parameter of `Index/MultiIndex.sort_values` (Sub-task, Resolved, Xinrong Meng)
        11. Pandas API on Spark can't apply lambda to columns. (Sub-task, Resolved, Xinrong Meng)
        12. Implement `ignore_index` of `DataFrame/Series.sample` (Sub-task, Resolved, Xinrong Meng)
        13. Implement `ignore_index` of `DataFrame.explode` and `DataFrame.drop_duplicates` (Sub-task, Resolved, Xinrong Meng)
        14. Implement `bool_only` parameter of `DataFrame.all` and `DataFrame.any` (Sub-task, Resolved, Xinrong Meng)
        15. Implement `numeric_only` parameter for `DataFrame/Series.rank` to rank numeric columns only (Sub-task, Resolved, Xinrong Meng)
        16. Support `na_action` and Series input correspondence in `Series.map` (Sub-task, Resolved, Xinrong Meng)
        17. Support string `inclusive` parameter of `Series.between` (Sub-task, Resolved, Xinrong Meng)
        18. Implement axis and skipna of Series.argmin (Sub-task, Resolved, Ruifeng Zheng)
        19. Implement `inplace` and `columns` parameters of `Series.drop` (Sub-task, Resolved, Xinrong Meng)
        20. Support `how` parameter of `MultiIndex.dropna` (Sub-task, Resolved, Xinrong Meng)
        21. Implement `inplace` parameter of `Series.clip` (Sub-task, Resolved, Xinrong Meng)
        22. Implement `skipna` of `Series.all/Index.all` to exclude NA/null values (Sub-task, Resolved, Xinrong Meng)
        23. Add `Series.duplicated` to indicate duplicate Series values. (Sub-task, Resolved, Xinrong Meng)
        24. interpolate support param `limit_direction` (Sub-task, Resolved, Ruifeng Zheng)
        25. Implement `ignore_index` of `Series.sort_values` and `Series.sort_index` (Sub-task, Resolved, Xinrong Meng)
        26. Implement `skipna` of `Series.argmax` (Sub-task, Resolved, Xinrong Meng)
        27. Support string and bool `regex` in `Series.replace` (Sub-task, Resolved, Xinrong Meng)
        28. Implement `keep` parameter of `frame.nlargest/nsmallest` to decide how to resolve ties (Sub-task, Resolved, Xinrong Meng)
        29. Adjust `GroupBy.std` to match pandas 1.4 (Sub-task, Resolved, Xinrong Meng)
        30. Convert bools to ints in basic statistical functions of GroupBy objects (Sub-task, Resolved, Xinrong Meng)
        31. Implement `numeric_only` of `GroupBy.first` and `GroupBy.last` (Sub-task, Resolved, Xinrong Meng)
        32. Implement `skipna` parameter of `GroupBy.all` (Sub-task, Resolved, Xinrong Meng)
        33. Refactor `GroupBy._reduce_for_stat_function` on accepted data types (Sub-task, Resolved, Xinrong Meng)
        34. Implement `skipna` of basic statistical functions of DataFrame and Series (Sub-task, Resolved, Xinrong Meng)
        35. Adjust `GroupBy.mean/median` to match pandas 1.4 (Sub-task, Resolved, Apache Spark)
        36. Implement `numeric_only` parameter of `GroupBy.max/min` (Sub-task, Resolved, Xinrong Meng)
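
        For a sense of what a few of these sub-tasks cover, the sketch below shows the corresponding pandas behavior for three of the listed parameters (`inclusive` of `Series.between`, `na_action` of `Series.map`, and `keep` of `Index.drop_duplicates`). It runs on plain pandas with made-up data; the pandas API on Spark versions are expected to mirror these results, but that is an assumption and is not verified here.

```python
import pandas as pd

# String `inclusive` of Series.between: "both", "neither", "left" or "right".
s = pd.Series([1, 2, 3, 4])
inside = s.between(1, 4, inclusive="neither")  # excludes both endpoints

# `na_action="ignore"` of Series.map: missing values are passed through
# instead of being handed to the mapping function.
names = pd.Series(["a", None, "b"])
upper = names.map(str.upper, na_action="ignore")

# `keep` of Index.drop_duplicates: "last" keeps the final occurrence
# of each duplicated label.
idx = pd.Index([1, 2, 1, 3])
last_kept = idx.drop_duplicates(keep="last")
```

        Without `na_action="ignore"`, the `str.upper` call would be applied to the missing value and raise a `TypeError`.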

          People

            Assignee: Xinrong Meng
            Reporter: Xinrong Meng
            Votes: 0
            Watchers: 1
