[SPARK-36707] Support to specify index type and name in pandas API on Spark - ASF JIRA

Details

Type: Umbrella
Status: Resolved
Priority: Major
Resolution: Done
Affects Version/s: 3.3.0
Fix Version/s: 3.3.0
Component/s: PySpark
Labels:
- release-notes

Description

See https://koalas.readthedocs.io/en/latest/user_guide/typehints.html.

In pandas API on Spark, there is currently no way to specify the index type and name of the output when applying an arbitrary function, which forces the default index to be created:

>>> def transform(pdf) -> pd.DataFrame["id": int, "A": int]:
...     pdf['A'] = pdf.id + 1
...     return pdf
...
>>> ps.range(5).koalas.apply_batch(transform)

We should have a way to specify the index.
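
To make the gap concrete, here is a minimal sketch in plain pandas (no Spark session) of what a column-only return hint can and cannot describe. It is only a local stand-in, not the pandas-on-Spark implementation: the index name "idx", its values, and the use of `reset_index(drop=True)` to mimic falling back to a default index are assumptions made for illustration.

```python
import pandas as pd


def transform(pdf: pd.DataFrame) -> pd.DataFrame:
    # Same logic as the function in the description, applied to a copy of the batch.
    pdf = pdf.copy()
    pdf["A"] = pdf["id"] + 1
    return pdf


# Hypothetical input: the index name "idx" and its values are made up for illustration.
pdf = pd.DataFrame(
    {"id": range(5)},
    index=pd.Index([10, 20, 30, 40, 50], name="idx"),
)
out = transform(pdf)

# What a column-only return hint can describe: column names and dtypes.
column_spec = [(name, str(dtype)) for name, dtype in out.dtypes.items()]

# What it cannot describe: the name and dtype of the index.
index_spec = (out.index.name, str(out.index.dtype))

# Stand-in for falling back to a default index: the original index is discarded.
with_default_index = out.reset_index(drop=True)

print(column_spec)
print(index_spec)
print(with_default_index.index)
```

The point of the sketch is that `index_spec` carries information that has no place in a hint listing only columns, so the output cannot keep the index the function returned.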

Sub-Tasks

1.	Support numpy.typing for annotating ArrayType	Resolved	Hyukjin Kwon
2.	Support new syntax for specifying index type and name	Resolved	Hyukjin Kwon
3.	Support new syntax in function apply APIs	Resolved	Hyukjin Kwon
4.	Support multi-index in new syntax	Resolved	dch nguyen
5.	Document new syntax for specifying index type	Resolved	Hyukjin Kwon
6.	Error when list of data type tuples has len = 1	Resolved	dgd_contributor

People

Assignee: Hyukjin Kwon

Reporter: Hyukjin Kwon

Votes: 0

Watchers: 1

Dates

Created: 10/Sep/21 06:34

Updated: 12/Dec/22 17:51

Resolved: 07/Oct/21 09:36