Details
- Type: Umbrella
- Status: Resolved
- Priority: Critical
- Resolution: Done
- Version: 3.4.0
Description
This JIRA aims to improve PySpark documentation in:
- pyspark
- pyspark.sql
- pyspark.sql.streaming
We should:
- Make the examples self-contained, e.g., https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html
- Document parameters, e.g., https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html#pandas.DataFrame.pivot. Many PySpark APIs are missing parameter documentation, e.g., DataFrame.union
If a file is large, e.g., dataframe.py, we should split the work on it into separate subtasks and improve the documentation in each.
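As a rough illustration of the two goals listed above (self-contained examples and documented parameters), here is a minimal sketch of a numpydoc-style docstring. Note that `union_rows` and `check_docstring_examples` are hypothetical names made up for this sketch. They are not PySpark APIs, and plain Python lists stand in for DataFrames so that the snippet runs without a Spark session.

```python
import doctest


def union_rows(left, other):
    """Return the rows of ``left`` followed by the rows of ``other``.

    Hypothetical stand-in used only to illustrate docstring layout;
    it is not a PySpark API.

    Parameters
    ----------
    left : list of tuple
        Rows of the first dataset.
    other : list of tuple
        Rows to append. Duplicate rows are kept.

    Returns
    -------
    list of tuple
        A new list; neither input is modified.

    Examples
    --------
    The example creates its own inputs, so it can be copied and run as is.

    >>> left = [(1, "a"), (2, "b")]
    >>> other = [(2, "b"), (3, "c")]
    >>> union_rows(left, other)
    [(1, 'a'), (2, 'b'), (2, 'b'), (3, 'c')]
    """
    return list(left) + list(other)


def check_docstring_examples(func):
    """Run the doctest examples in ``func``'s docstring.

    Returns a ``(failed, attempted)`` tuple.
    """
    parser = doctest.DocTestParser()
    # The only name the examples need is the function itself.
    test = parser.get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.failed, results.attempted
```

Calling `check_docstring_examples(union_rows)` runs the docstring example in isolation. An example that relied on a variable defined elsewhere would fail there, which is one simple way to check that an example really is self-contained.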
Issue Links
- is duplicated by: SPARK-33247 Improve examples and scenarios in docstrings (Resolved)