Details
- Type: Improvement
- Status: Open
- Priority: P3
- Resolution: Unresolved
Description
Several operations are currently disallowed because the set of columns in their output depends on the data (non-deferred columns). For some dtypes (categorical, boolean), however, every possible value is known before execution time, so we can predict the columns that will appear in the output.
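As a rough illustration of what "enumerating all the possible values" could look like, here is a minimal sketch in plain pandas. The helper name `enumerable_values` is hypothetical and is not part of pandas or Beam; it only shows that the value set can be read off the dtype without looking at the data.

```python
import pandas as pd


def enumerable_values(dtype):
    """Return every value a column of this dtype can hold, or None if the
    set cannot be known from the dtype alone.

    Hypothetical helper for illustration only; missing values are ignored.
    """
    # Categorical dtypes carry their full category list, observed or not.
    if isinstance(dtype, pd.CategoricalDtype):
        return list(dtype.categories)
    # Boolean columns can only ever hold False or True.
    if pd.api.types.is_bool_dtype(dtype):
        return [False, True]
    # Anything else (ints, strings, ...) depends on the data.
    return None


size = pd.Series(["low", "high"], dtype=pd.CategoricalDtype(["low", "mid", "high"]))
print(enumerable_values(size.dtype))                        # categories, including the unobserved "mid"
print(enumerable_values(pd.Series([True, False]).dtype))    # both boolean values
print(enumerable_values(pd.Series([1, 2]).dtype))           # None: not enumerable
```

The categorical check comes first because a categorical with boolean categories would otherwise be treated as a plain boolean column.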
Note that we still can't implement these operations 100% correctly: pandas typically creates columns only for the values that are actually observed, while we would have to create a column for every possible value.
We should allow these operations in these special cases.
Operations in this category:
- DataFrame.unstack, Series.unstack (can work if unstacked level is a categorical or boolean column)
- Series.str.get_dummies
- Series.str.split
- Series.str.rsplit
- DataFrame.pivot
- DataFrame.pivot_table
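To make the unstack case from the list above concrete, the sketch below uses plain pandas (not the Beam DataFrame API) to contrast the columns pandas produces from observed data with the fixed set of columns a deferred implementation would have to declare up front. The sample data and the `reindex` step are assumptions chosen for illustration, not the proposed implementation.

```python
import pandas as pd

# A Series whose inner index level is boolean, but only True is observed.
index = pd.MultiIndex.from_tuples(
    [("x", True), ("y", True)], names=["key", "flag"]
)
series = pd.Series([1, 2], index=index)

# pandas only creates a column for the value that actually occurs.
observed = series.unstack("flag")
print(list(observed.columns))

# A data-independent result needs one column per possible value, so the
# unobserved False column is added and filled with NaN.
all_values = observed.reindex(columns=[False, True])
print(list(all_values.columns))
```

The second frame has the same shape no matter what the data contains, which is the property that makes the column set predictable, at the cost of differing from what pandas returns.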
Issue Links
- is a child of BEAM-12133 Tracking: DataFrame API future enhancements (Open)
- links to
1. Implement Series.str.get_dummies() for DataFrame API | Resolved | Andy Ye
2. Implement Series.str.split() and Series.str.rsplit() for DataFrame API | Resolved | Andy Ye
3. Implement DataFrame.unstack() and Series.unstack() for DataFrame API | Triage Needed | Andy Ye
4. Implement DataFrame.pivot() for DataFrame API | Triage Needed | Andy Ye
5. Implement DataFrame.pivot_table() for DataFrame API | Open | Unassigned
6. Implement len(GroupBy) and ngroups for DataFrame API | Open | Unassigned
7. Documentation for non-deferred-column operations | Open | Unassigned
8. Update DataFrame rsplit() api once pandas rsplit() supports regex | Open | Unassigned