Spark / SPARK-36394

Increase pandas API coverage in PySpark

Details

    • Type: Umbrella
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.3.0
    • Fix Version/s: None
    • Component/s: PySpark
    • Labels: None

    Description

      Increase pandas API coverage in PySpark.

      In particular, the pending PRs at https://github.com/databricks/koalas/pulls should be ported. Tickets have already been created for porting them; please avoid working on those.
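
      One rough way to make "API coverage" concrete is to compare the public attribute names of a reference class against those of a candidate class. The sketch below is only an illustration under stated assumptions: `ReferenceFrame` and `CandidateFrame` are hypothetical stand-ins (not real pandas or PySpark classes), and the helper names `public_names` and `api_coverage` are invented for this example. It checks name presence only, not parameters or behaviour.

```python
# Sketch: estimate how much of a reference class's public API a candidate covers.
# ReferenceFrame and CandidateFrame are hypothetical stand-ins, not real
# pandas or PySpark classes; the helper names are invented for illustration.


def public_names(cls):
    """Return the public attribute names of a class (no leading underscore)."""
    return {name for name in dir(cls) if not name.startswith("_")}


def api_coverage(reference, candidate):
    """Return (coverage ratio, sorted names present on reference but not candidate)."""
    ref = public_names(reference)
    missing = sorted(ref - public_names(candidate))
    covered = len(ref) - len(missing)
    ratio = covered / len(ref) if ref else 1.0
    return ratio, missing


class ReferenceFrame:
    """Stand-in for the library whose API is being mirrored."""

    def cov(self): ...
    def mode(self): ...
    def combine_first(self, other): ...
    def lookup(self, row_labels, col_labels): ...


class CandidateFrame:
    """Stand-in for the port; `mode` is deliberately left out."""

    def cov(self): ...
    def combine_first(self, other): ...
    def lookup(self, row_labels, col_labels): ...


ratio, missing = api_coverage(ReferenceFrame, CandidateFrame)
print(f"coverage: {ratio:.0%}, missing: {missing}")
# prints: coverage: 75%, missing: ['mode']
```

      With pandas and PySpark installed, the same helper could presumably be pointed at `pandas.DataFrame` and `pyspark.pandas.DataFrame`; the result would be a coarse, name-level estimate rather than a statement about behavioural parity.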

        Issue Links

          No. Summary | Type | Status | Assignee
          1. Implement DataFrame.cov | Sub-task | Resolved | Xinrong Meng
          2. Implement DataFrame.mode | Sub-task | In Progress | Unassigned
          3. Implement DataFrame.combine_first | Sub-task | Resolved | Xinrong Meng
          4. Implement Series.cov | Sub-task | Resolved | dgd_contributor
          5. Implement Series.combine | Sub-task | In Progress | Unassigned
          6. Implement Index.putmask | Sub-task | In Progress | Unassigned
          7. Implement DataFrame.lookup | Sub-task | Resolved | Unassigned
          8. Implement MultiIndex.equal_levels | Sub-task | Resolved | Haejoon Lee
          9. Implement 'weights' and 'axis' in sample at DataFrame and Series | Sub-task | Open | Unassigned
          10. Enable binary operations with list-like Python objects | Sub-task | In Progress | Unassigned
          11. Support list-like Python objects for Series comparison | Sub-task | Resolved | Haejoon Lee
          12. Implement DataFrame.join on key column | Sub-task | Open | Unassigned
          13. Support errors='coerce' for ps.to_numeric | Sub-task | Resolved | Unassigned
          14. Investigate native support for raw data containing commas | Sub-task | Open | Unassigned
          15. Support dropping rows of a single-indexed DataFrame | Sub-task | Resolved | Xinrong Meng
          16. Implement __getitem__ of label-based MultiIndex | Sub-task | In Progress | Unassigned
          17. Add `errors` argument for `ps.to_numeric`. | Sub-task | Resolved | Haejoon Lee
          18. Add `thousands` argument to `ps.read_csv`. | Sub-task | In Progress | Unassigned
          19. Implement __setitem__ of label-based MultiIndex | Sub-task | Open | Unassigned
          20. Implement Series.__xor__ | Sub-task | Resolved | dgd_contributor
          21. Add `versionadded` for API added in Spark 3.3.0 | Sub-task | Resolved | Xinrong Meng
          22. Support Series.__and__ for Integral | Sub-task | In Progress | Unassigned
          23. Fix dropping all columns of a DataFrame | Sub-task | Resolved | Xinrong Meng
          24. Fix ps.to_datetime with plurals of keys like years, months, days | Sub-task | Resolved | dch nguyen
          25. Refactor _select_rows_by_iterable in iLocIndexer to use Column.isin | Sub-task | Resolved | Xinrong Meng
          26. Introduce the 'compute.isin_limit' option | Sub-task | Resolved | Xinrong Meng
          27. Fix Series.isin when Series has NaN values | Sub-task | Resolved | dgd_contributor
          28. Improve `filter` of single-indexed DataFrame | Sub-task | Resolved | Xinrong Meng
          29. Fix `pop` of Categorical Series | Sub-task | Resolved | Xinrong Meng
          30. Fix ps.DataFrame.isin | Sub-task | Resolved | dgd_contributor
          31. Implement ps.merge_asof | Sub-task | Resolved | Takuya Ueshin
          32. Fix filtering a Series by a boolean Series | Sub-task | Resolved | Xinrong Meng
          33. Support ps.MultiIndex.dtypes | Sub-task | Resolved | dch nguyen
          34. Support time for ps.to_datetime | Sub-task | Resolved | dgd_contributor
          35. ps.Series.dot raise "matrices are not aligned" if index is not same | Sub-task | Resolved | dch nguyen
          36. Missing functionality in spark.pandas | Sub-task | Open | Unassigned
          37. impl Series.autocorr | Sub-task | Resolved | Ruifeng Zheng
          38. impl Series.ewm and DataFrame.ewm | Sub-task | Resolved | Ruifeng Zheng
          39. impl Series.interpolate and DataFrame.interpolate | Sub-task | Resolved | Ruifeng Zheng
          40. Impl DataFrame.corrwith | Sub-task | Resolved | Ruifeng Zheng
          41. Impl DataFrame.boxplot and DataFrame.plot.box | Sub-task | Resolved | Ruifeng Zheng
          42. Impl DataFrame.resample and Series.resample | Sub-task | Resolved | Ruifeng Zheng
          43. impl Groupby.ewm | Sub-task | Resolved | Ruifeng Zheng
          44. implement skew and kurt in Rolling/RollingGroupby/Expanding/ExpandingGroupby | Sub-task | Resolved | Ruifeng Zheng
          45. Implement Groupby.skew | Sub-task | Resolved | Ruifeng Zheng
          46. Implement Groupby.mad | Sub-task | Resolved | Ruifeng Zheng

            People

              Assignee: Xinrong Meng
              Reporter: Xinrong Meng
              Shepherd: Takuya Ueshin
              Votes: 0
              Watchers: 5
