Spark / SPARK-44728

Improve PySpark documentation

Details

    • Type: Umbrella
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.5.0, 4.0.0
    • Fix Version/s: 4.0.0
    • Component/s: PySpark

    Description

      An umbrella Jira ticket to improve the PySpark documentation.
       
       

        Issue Links

          1.
          Add canonical links to the PySpark docs page Sub-task Resolved BingKun Pan  
          2.
          Add Spark version drop down to the PySpark doc site Sub-task Resolved BingKun Pan  
          3.
          Switch languages consistently across docs for all code snippets Sub-task Resolved BingKun Pan (Time Spent: 1h)
          4.
          Some directories should be cleared when regenerating files Sub-task Resolved BingKun Pan  
          5.
          There should be a gap at the bottom of the HTML Sub-task Resolved BingKun Pan  
          6.
          Align example order (Python -> Scala/Java -> R) in all Spark Doc Content Sub-task Resolved BingKun Pan  
          7.
          Refine docstring of `DataFrame.drop` Sub-task Resolved BingKun Pan  
          8.
          Refine docstring of `groupBy/rollup/cube` Sub-task Resolved BingKun Pan  
          9.
          Refine docstring of `ceil/ceiling/floor/round/bround` Sub-task Resolved BingKun Pan  
          10.
          Refine docstring of `rand/randn` Sub-task Resolved BingKun Pan  
          11.
          Add Matomo analytics to all released docs pages Sub-task Resolved BingKun Pan  
          12.
          Supplementary exception class Sub-task Resolved BingKun Pan  
          13.
          Make code block copyable Sub-task Resolved BingKun Pan  
          14.
          Add user guide for type mappings between Spark and Python data types Sub-task Resolved Philip Dakin  
          15.
          Add documentation for type casting rules in Python UDFs/UDTFs Sub-task Open Unassigned  
          16.
          Make Python the first language in all Spark code snippets Sub-task Resolved Unassigned  
          17.
          Refine DocString of `Union*` Sub-task Resolved Ruifeng Zheng  
          18.
          Refine docstring of `DataFrame.columns` property Sub-task Resolved Allison Wang  
          19.
          Refine docstring of `DataFrame.isEmpty` Sub-task Resolved Allison Wang  
          20.
          Refine docstring of `createDataFrame` Sub-task Resolved Allison Wang  
          21.
          Refine the docstring of `DataFrame.collect` Sub-task Resolved Allison Wang  
          22.
          Fix wildcard import `from pyspark.sql.functions import *` in `Quick Start` Examples Sub-task Resolved Ruifeng Zheng  
          23.
          Fix docstring of `monotonically_increasing_id` Sub-task Resolved Ruifeng Zheng  
          24.
          Enable Doctests of `rand`, `randn` and `log` Sub-task Resolved Ruifeng Zheng  
          25.
          Refine docstring of `approx_count_distinct` Sub-task Resolved Yang Jie  
          26.
          Refine docstring for DataFrame.approxQuantile Sub-task Resolved Michael Zhang  
          27.
          Refine docstring of `DataFrame.filter` Sub-task Resolved Allison Wang  
          28.
          Refine docstring of `asc/desc` Sub-task Resolved Yang Jie  
          29.
          Refine docstring of `Column.between` Sub-task Resolved Allison Wang  
          30.
          Refine DocStrings of `try_{add, subtract, multiply, divide, avg, sum}` Sub-task Resolved Ruifeng Zheng  
          31.
          Refine docstring of `max` Sub-task Resolved Allison Wang  
          32.
          Refine docstring of `DataFrame.distinct` Sub-task Resolved Allison Wang  
          33.
          Refine docstrings of `coalesce/repartition/repartitionByRange` Sub-task Resolved Ruifeng Zheng  
          34.
          Refine docstrings of `min_by/max_by` Sub-task Resolved Yang Jie  
          35.
          Refine docstring of `min` Sub-task Resolved Allison Wang  
          36.
          Refine docstring of `explode` Sub-task Resolved Allison Wang  
          37.
          Refine docstrings of `collect_list/collect_set` Sub-task Resolved Yang Jie  
          38.
          Adjust the `versionadded` and `versionchanged` information to the parameters Sub-task Resolved Ruifeng Zheng  
          39.
          Refine docstring of `inline` Sub-task Resolved Allison Wang  
          40.
          Automate updating versions.json Sub-task Open Unassigned  
          41.
          XML: Refine docstring of schema_of_xml Sub-task Resolved Hyukjin Kwon  
          42.
          Update Example with docker official image Sub-task Resolved Ruifeng Zheng (Time Spent: 10m)
          43.
          Refine docstring of `array/array_contains/arrays_overlap` Sub-task Resolved Yang Jie  
          44.
          Refine docstring of `Column.isin` Sub-task Resolved Allison Wang  
          45.
          Refine docstring of `DataFrame.withColumnRenamed` Sub-task Resolved Allison Wang  
          46.
          Refine docstring of `DataFrame.join` Sub-task Resolved Allison Wang  
          47.
          Refine docstring of `DataFrameReader.parquet` Sub-task Resolved Allison Wang  
          48.
          Refine docstring of `DataFrameReader.json` Sub-task Resolved Hyukjin Kwon  
          49.
          Refine docstring of `Column.when` Sub-task Resolved Hyukjin Kwon  
          50.
          Refine DocString of `regr_*` functions Sub-task Resolved Ruifeng Zheng  
          51.
          Refine docstring of `sum` Sub-task Resolved Hyukjin Kwon  
          52.
          Refine docstring of `count` Sub-task Resolved Hyukjin Kwon  
          53.
          Refine docstring of count_distinct Sub-task Resolved Allison Wang  
          54.
          Configurable error when generating Python docs Sub-task Open Unassigned  
          55.
          Python function categories should be consistent with SQL function groups Sub-task Resolved Ruifeng Zheng  
          56.
          Refine docstring of `create_map/slice/array_join` Sub-task Resolved Yang Jie  
          57.
          Refine docstring of `DataFrame.show` Sub-task Resolved Allison Wang  
          58.
          Refine docstring of `options` for dataframe reader and writer Sub-task Resolved Hyukjin Kwon  
          59.
          Refine docstring of `SparkSession.builder.config` Sub-task Resolved Allison Wang  
          60.
          Refine docstring of `lit` Sub-task Resolved Hyukjin Kwon  
          61.
          XML: Refine docstring of from_xml Sub-task Resolved Hyukjin Kwon  
          62.
          Add user guide for dataframe creation Sub-task Open Unassigned  
          63.
          Add a self-contained example about creating dataframe from jdbc Sub-task Open Unassigned  
          64.
          Add user guide for basic dataframe operations Sub-task Open Unassigned  
          65.
          Add user guide for column selections Sub-task Open Unassigned  
          66.
          Add user guide for groupby and aggregate Sub-task Open Unassigned  
          67.
          Add user guide for window operations Sub-task Open Unassigned  
          68.
          Refine docstring of `mapInPandas` Sub-task Resolved Allison Wang  
          69.
          Use built-in math constant in math functions Sub-task Resolved Ruifeng Zheng  
          70.
          Refine docstring of `DataFrame.subtract` Sub-task Resolved Hyukjin Kwon  
          71.
          Refine docstring of `DataFrame.intersectAll` Sub-task Resolved Hyukjin Kwon  
          72.
          Refine docstring of `DataFrame.intersect` Sub-task Resolved Hyukjin Kwon  
          73.
          Refine docstring of `DataFrame.dropna/fillna/replace` Sub-task Resolved BingKun Pan  
          74.
          Improve basic datasource examples Sub-task Resolved Allison Wang  
          75.
          Document parameters and examples for RuntimeConf get, set and unset Sub-task Resolved Hyukjin Kwon  
          76.
          Refine docstring of UDTF Sub-task Resolved Hyukjin Kwon  
          77.
          Refine docstring of `concat/array_position/element_at/try_element_at` Sub-task Resolved Yang Jie  
          78.
          Add missing `toDegrees/toRadians/atan2/approxCountDistinct` function descriptions Sub-task Resolved Ruifeng Zheng  
          79.
          Correct the typing of schema_of_{csv, json, xml} Sub-task Resolved Ruifeng Zheng  
          80.
          Refine docstring of `array_prepend/array_append/array_insert` Sub-task Resolved Yang Jie  
          81.
          Refine docstring of `array_intersect/array_union/array_except` Sub-task Resolved Yang Jie  
          82.
          Refine docstring of `array_compact/array_distinct/array_remove` Sub-task Resolved Yang Jie  
          83.
          Refine docstring of `array_min/array_max/array_size/array_repeat` Sub-task Resolved Yang Jie  
          84.
          Refine docstring of `get/array_zip/sort_array` Sub-task Resolved Yang Jie  
          85.
          Refine docstring of `flatten/sequence/shuffle` Sub-task Resolved Yang Jie  
          86.
          Refine docstring for DataFrame.createTempView/createOrReplaceTempView Sub-task Resolved Hyukjin Kwon  
          87.
          Refine docstring for DataFrame.createGlobalTempView/createOrReplaceGlobalTempView Sub-task Resolved Hyukjin Kwon  
          88.
          Refine docstring for DataFrame.schema/explain/printSchema Sub-task Resolved Hyukjin Kwon  
          89.
          Refine docstring of `reverse/map_contains_key` Sub-task Resolved BingKun Pan  
          90.
          Refine docstring of `map_from_arrays/map_from_entries/map_concat` Sub-task Resolved Yang Jie  
          91.
          Refine docstring of `parse_url/url_encode/url_decode` Sub-task Resolved Yang Jie  
          92.
          Refine docstring of `convert_timezone/make_dt_interval/make_interval` Sub-task Resolved BingKun Pan  
          93.
          Refine docstring of `make_timestamp/make_timestamp_ltz/make_timestamp_ntz/make_ym_interval` Sub-task Resolved BingKun Pan  
          94.
          Refine docstring of `from_csv/schema_of_csv/to_csv` Sub-task Resolved Yang Jie  
          95.
          Refine docstring of `aes_encrypt/aes_decrypt/try_aes_decrypt` Sub-task Resolved BingKun Pan  
          96.
          Refine docstring of `map_keys/map_values/map_entries` Sub-task Resolved Yang Jie  
          97.
          Refine docstring of `str_to_map/map_filter/map_zip_with` Sub-task Resolved Yang Jie  
          98.
          Refine docstring of `abs/acos/acosh` Sub-task Resolved Yang Jie  
          99.
          Refine docstring of `bit_and/bit_or/bit_xor` Sub-task Resolved Yang Jie  
          100.
          Refine docstring of `sum_distinct/array_agg/count_if` Sub-task Resolved Yang Jie  
          101.
          Refine docstring of `asc_nulls_first/asc_nulls_last/desc_nulls_first/desc_nulls_last` Sub-task Resolved Yang Jie  
          102.
          Refine docstring of `to_json/from_json` Sub-task Resolved Hyukjin Kwon  
          103.
          Refine docstring of `try_sum`, `try_avg`, `avg`, `sum`, `mean` Sub-task Resolved Hyukjin Kwon  
          104.
          Refine docstrings of try_* Sub-task Resolved Hyukjin Kwon  
          105.
          Improve docstring of mapInPandas Sub-task Resolved Xinrong Meng  
          106.
          Update the table-valued function documentation Sub-task Resolved Allison Wang  
          107.
          Add examples section header to `format_number` docstring Sub-task Resolved Thomas Hart  
          108.
          Improve clarity in lag docstring Sub-task Resolved Thomas Hart  
          109.
          Unify the 'See Also' section formatting across PySpark docstrings Sub-task Resolved Allison Wang  
          110.
          Update documentation to add `column` as alias of `col` Sub-task Open Unassigned  
          111.
          Fix the incorrect namings and missing params in func docs in `builtin.py` Sub-task Resolved Wei Guo  
          112.
          Test the default column name of array functions Sub-task Resolved Ruifeng Zheng  
          113.
          Add doctest for `options` in json functions Sub-task Resolved Ruifeng Zheng  
          114.
          Refine the type hints in functions Sub-task Resolved Ruifeng Zheng  
          115.
          Document the NaN handling in df.na.drop Sub-task Resolved Ruifeng Zheng  
          116.
          Fix type hint for `accuracy` in `percentile_approx` and `approx_percentile` Sub-task Resolved Ruifeng Zheng  
          117.
          Remove experimental API notes for pandas related functions Sub-task Resolved Allison Wang  
          118.
          Refine docstring for trigonometric functions Sub-task Resolved Ruifeng Zheng  
          119.
          Refine docstring for basic functions Sub-task Resolved Ruifeng Zheng  
          120.
          Refine the docstring for more date functions Sub-task Resolved Ruifeng Zheng  
          121.
          Refine the docstring of multiple datetime functions Sub-task Resolved Ruifeng Zheng  
          122.
          Refine docstrings for aggregation functions - part 1 Sub-task Resolved Ruifeng Zheng  

            People

              Assignee: Unassigned
              Reporter: Allison Wang (allisonwang-db)
              Votes: 0
              Watchers: 3

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 10m