Details
- Type: Umbrella
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 3.3.0
Description
Increase pandas API coverage in PySpark.
In particular, the pending PRs at https://github.com/databricks/koalas/pulls should be ported. Tickets have already been created for this porting work, so please avoid duplicating it.
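To make the goal concrete, here is a minimal sketch in plain pandas (no Spark cluster assumed). pyspark.pandas aims to let code like this run unchanged after swapping the import, and "coverage" measures how much of that API surface is supported; the DataFrame contents are invented for illustration.

```python
import pandas as pd

# pyspark.pandas mirrors the pandas API, so ideally this exact code would
# also run after replacing the import with:  import pyspark.pandas as pd
df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

total = df["a"].sum()    # plain aggregation: 1 + 2 + 3
mean_b = df["b"].mean()  # column mean: (4.0 + 5.0 + 6.0) / 3
print(total, mean_b)     # 6 5.0
```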
Issue Links
- contains
  - SPARK-36395 Implement map for indexes (Open)
- relates to
  - SPARK-40327 Increase pandas API coverage for pandas API on Spark (Reopened)
  - SPARK-34849 SPIP: Support pandas API layer on PySpark (Resolved)
Sub-tasks
1. Implement DataFrame.mode | In Progress | Unassigned
2. Implement Series.combine | In Progress | Unassigned
3. Implement Index.putmask | In Progress | Unassigned
4. Implement 'weights' and 'axis' in sample at DataFrame and Series | Open | Unassigned
5. Enable binary operations with list-like Python objects | In Progress | Unassigned
6. Implement DataFrame.join on key column | Open | Unassigned
7. Investigate native support for raw data containing commas | Open | Unassigned
8. Implement __getitem__ of label-based MultiIndex | In Progress | Unassigned
9. Add `thousands` argument to `ps.read_csv` | In Progress | Unassigned
10. Implement __setitem__ of label-based MultiIndex | Open | Unassigned
11. Support Series.__and__ for Integral | In Progress | Unassigned
12. Missing functionality in spark.pandas | Open | Unassigned
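Several of the sub-tasks above target behaviors that already exist in pandas itself. A short sketch of three of them (DataFrame.mode, Series.combine, and the `thousands` argument of read_csv) in plain pandas shows the semantics the pyspark.pandas ports would need to match; the data is invented for illustration.

```python
import io
import pandas as pd

# Sub-task 1: DataFrame.mode returns the most frequent value(s) per column.
df = pd.DataFrame({"x": [1, 2, 2, 3]})
mode_x = df.mode()["x"].iloc[0]    # 2 occurs twice, so the mode is 2

# Sub-task 2: Series.combine applies a binary function elementwise.
s1 = pd.Series([1, 5])
s2 = pd.Series([4, 2])
combined = s1.combine(s2, max)     # elementwise maximum: [4, 5]

# Sub-task 9: `thousands` strips the separator while parsing numbers.
csv = io.StringIO('n\n"1,234"\n"5,678"\n')
parsed = pd.read_csv(csv, thousands=",")
print(mode_x, combined.tolist(), parsed["n"].tolist())
```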