Spark / SPARK-44728

Improve PySpark documentation


Details

    • Type: Umbrella
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.5.0, 4.0.0
    • Fix Version/s: 4.0.0
    • Component/s: PySpark

    Description

      An umbrella Jira ticket to improve the PySpark documentation.
       
       

      Sub-Tasks

        1. Add canonical links to the PySpark docs page | Sub-task | Resolved | Pan Bingkun
        2. Add Spark version drop down to the PySpark doc site | Sub-task | Resolved | Pan Bingkun
        3. Switch languages consistently across docs for all code snippets | Sub-task | Resolved | Pan Bingkun | Time Spent: 1h
        4. Some directories should be cleared when regenerating files | Sub-task | Resolved | Pan Bingkun
        5. There should be a gap at the bottom of the HTML | Sub-task | Resolved | Pan Bingkun
        6. Align example order (Python -> Scala/Java -> R) in all Spark Doc Content | Sub-task | Resolved | Pan Bingkun
        7. Refine docstring of `DataFrame.drop` | Sub-task | Resolved | Pan Bingkun
        8. Refine docstring of `groupBy/rollup/cube` | Sub-task | Resolved | Pan Bingkun
        9. Refine docstring of `ceil/ceiling/floor/round/bround` | Sub-task | Resolved | Pan Bingkun
        10. Refine docstring of `rand/randn` | Sub-task | Resolved | Pan Bingkun
        11. Add Matomo analytics to all released docs pages | Sub-task | Resolved | Pan Bingkun
        12. Supplementary exception class | Sub-task | Resolved | Pan Bingkun
        13. Make code block copyable | Sub-task | Resolved | Pan Bingkun
        14. Add user guide for type mappings between Spark and Python data types | Sub-task | Resolved | Philip Dakin
        15. Add documentation for type casting rules in Python UDFs/UDTFs | Sub-task | Open | Unassigned
        16. Make Python the first language in all Spark code snippet | Sub-task | Resolved | Unassigned
        17. Refine DocString of `Union*` | Sub-task | Resolved | Ruifeng Zheng
        18. Refine docstring of `DataFrame.columns` property | Sub-task | Resolved | Allison Wang
        19. Refine docstring of `DataFrame.isEmpty` | Sub-task | Resolved | Allison Wang
        20. Refine docstring of `createDataFrame` | Sub-task | Resolved | Allison Wang
        21. Refine the docstring of `DataFrame.collect` | Sub-task | Resolved | Allison Wang
        22. Fix wildcard import `from pyspark.sql.functions import *` in `Quick Start` Examples | Sub-task | Resolved | Ruifeng Zheng
        23. Fix docstring of `monotonically_increasing_id` | Sub-task | Resolved | Ruifeng Zheng
        24. Enable Doctests of `rand`, `randn` and `log` | Sub-task | Resolved | Ruifeng Zheng
        25. Refine docstring of `approx_count_distinct` | Sub-task | Resolved | Yang Jie
        26. Refine docstring for DataFrame.approxQuantile | Sub-task | Resolved | Michael Zhang
        27. Refine docstring of `DataFrame.filter` | Sub-task | Resolved | Allison Wang
        28. Refine docstring of `asc/desc` | Sub-task | Resolved | Yang Jie
        29. Refine docstring of `Column.between` | Sub-task | Resolved | Allison Wang
        30. Refine DocStrings of `try_{add, subtract, multiply, divide, avg, sum}` | Sub-task | Resolved | Ruifeng Zheng
        31. Refine docstring of `max` | Sub-task | Resolved | Allison Wang
        32. Refine docstring of `DataFrame.distinct` | Sub-task | Resolved | Allison Wang
        33. Refine docstrings of `coalesce/repartition/repartitionByRange` | Sub-task | Resolved | Ruifeng Zheng
        34. Refine docstrings of `min_by/max_by` | Sub-task | Resolved | Yang Jie
        35. Refine docstring of `min` | Sub-task | Resolved | Allison Wang
        36. Refine docstring of `explode` | Sub-task | Resolved | Allison Wang
        37. Refine docstrings of `collect_list/collect_set` | Sub-task | Resolved | Yang Jie
        38. Adjust the `versionadded` and `versionchanged` information to the parameters | Sub-task | Resolved | Ruifeng Zheng
        39. Refine docstring of `inline` | Sub-task | Resolved | Allison Wang
        40. Automate updating versions.json | Sub-task | Open | Unassigned
        41. XML: Refine docstring of schema_of_xml | Sub-task | Resolved | Hyukjin Kwon
        42. Update Example with docker official image | Sub-task | Resolved | Ruifeng Zheng | Time Spent: 10m
        43. Refine docstring of `array/array_contains/arrays_overlap` | Sub-task | Resolved | Yang Jie
        44. Refine docstring of `Column.isin` | Sub-task | Resolved | Allison Wang
        45. Refine docstring of `DataFrame.withColumnRenamed` | Sub-task | Resolved | Allison Wang
        46. Refine docstring of `DataFrame.join` | Sub-task | Resolved | Allison Wang
        47. Refine docstring of `DataFrameReader.parquet` | Sub-task | Resolved | Allison Wang
        48. Refine docstring of `DataFrameReader.json` | Sub-task | Resolved | Hyukjin Kwon
        49. Refine docstring of `Column.when` | Sub-task | Resolved | Hyukjin Kwon
        50. Refine DocString of `regr_*` functions | Sub-task | Resolved | Ruifeng Zheng
        51. Refine docstring of `sum` | Sub-task | Resolved | Hyukjin Kwon
        52. Refine docstring of `count` | Sub-task | Resolved | Hyukjin Kwon
        53. Refine docstring of count_distinct | Sub-task | Resolved | Allison Wang
        54. Configurable error when generating Python docs | Sub-task | Open | Unassigned
        55. Python function categories should be consistent with SQL function groups | Sub-task | Resolved | Ruifeng Zheng
        56. Refine docstring of `create_map/slice/array_join` | Sub-task | Resolved | Yang Jie
        57. Refine docstring of `DataFrame.show` | Sub-task | Resolved | Allison Wang
        58. Refine docstring of `options` for dataframe reader and writer | Sub-task | Resolved | Hyukjin Kwon
        59. Refine docstring of `SparkSession.builder.config` | Sub-task | Resolved | Allison Wang
        60. Refine docstring of `lit` | Sub-task | Resolved | Hyukjin Kwon
        61. XML: Refine docstring of from_xml | Sub-task | Resolved | Hyukjin Kwon
        62. Add user guide for dataframe creation | Sub-task | Open | Unassigned
        63. Add a self-contained example about creating dataframe from jdbc | Sub-task | Open | Unassigned
        64. Add user guide for basic dataframe operations | Sub-task | Open | Unassigned
        65. Add user guide for column selections | Sub-task | Open | Unassigned
        66. Add user guide for groupby and aggregate | Sub-task | Open | Unassigned
        67. Add user guide for window operations | Sub-task | Open | Unassigned
        68. Refine docstring of `mapInPandas` | Sub-task | Resolved | Allison Wang
        69. Use built-in math constant in math functions | Sub-task | Resolved | Ruifeng Zheng
        70. Refine docstring of `DataFrame.subtract` | Sub-task | Resolved | Hyukjin Kwon
        71. Refine docstring of `DataFrame.intersectAll` | Sub-task | Resolved | Hyukjin Kwon
        72. Refine docstring of `DataFrame.intersect` | Sub-task | Resolved | Hyukjin Kwon
        73. Refine docstring of `DataFrame.dropna/fillna/replace` | Sub-task | Resolved | Pan Bingkun
        74. Improve basic datasource examples | Sub-task | Resolved | Allison Wang
        75. Document parameters and examples for RuntimeConf get, set and unset | Sub-task | Resolved | Hyukjin Kwon
        76. Refine docstring of UDTF | Sub-task | Resolved | Hyukjin Kwon
        77. Refine docstring of `concat/array_position/element_at/try_element_at` | Sub-task | Resolved | Yang Jie
        78. Add missing `toDegrees/toRadians/atan2/approxCountDistinct` function descriptions | Sub-task | Resolved | Ruifeng Zheng
        79. Correct the typing of schema_of_{csv, json, xml} | Sub-task | Resolved | Ruifeng Zheng
        80. Refine docstring of `array_prepend/array_append/array_insert` | Sub-task | Resolved | Yang Jie
        81. Refine docstring of `array_intersect/array_union/array_except` | Sub-task | Resolved | Yang Jie
        82. Refine docstring of `array_compact/array_distinct/array_remove` | Sub-task | Resolved | Yang Jie
        83. Refine docstring of `array_min/array_max/array_size/array_repeat` | Sub-task | Resolved | Yang Jie
        84. Refine docstring of `get/array_zip/sort_array` | Sub-task | Resolved | Yang Jie
        85. Refine docstring of `flatten/sequence/shuffle` | Sub-task | Resolved | Yang Jie
        86. Refine docstring for DataFrame.createTempView/createOrReplaceTempView | Sub-task | Resolved | Hyukjin Kwon
        87. Refine docstring for DataFrame.createGlobalTempView/createOrReplaceGlobalTempView | Sub-task | Resolved | Hyukjin Kwon
        88. Refine docstring for DataFrame.schema/explain/printSchema | Sub-task | Resolved | Hyukjin Kwon
        89. Refine docstring `reverse/map_contains_key` | Sub-task | Resolved | Pan Bingkun
        90. Refine docstring of `map_from_arrays/map_from_entries/map_concat` | Sub-task | Resolved | Yang Jie
        91. Refine docstring of `parse_url/url_encode/url_decode` | Sub-task | Resolved | Yang Jie
        92. Refine docstring of `convert_timezone/make_dt_interval/make_interval` | Sub-task | Resolved | Pan Bingkun
        93. Refine docstring `make_timestamp/make_timestamp_ltz/make_timestamp_ntz/make_ym_interval` | Sub-task | Resolved | Pan Bingkun
        94. Refine docstring of `from_csv/schema_of_csv/to_csv` | Sub-task | Resolved | Yang Jie
        95. Refine docstring `aes_encrypt/aes_decrypt/try_aes_decrypt` | Sub-task | Resolved | Pan Bingkun
        96. Refine docstring of `map_keys/map_values/map_entries` | Sub-task | Resolved | Yang Jie
        97. Refine docstring of `str_to_map/map_filter/map_zip_with` | Sub-task | Resolved | Yang Jie
        98. Refine docstring of `abs/acos/acosh` | Sub-task | Resolved | Yang Jie
        99. Refine docstring of `bit_and/bit_or/bit_xor` | Sub-task | Resolved | Yang Jie
        100. Refine docstring of `sum_distinct/array_agg/count_if` | Sub-task | Resolved | Yang Jie
        101. Refine docstring of `asc_nulls_first/asc_nulls_last/desc_nulls_first/desc_nulls_last` | Sub-task | Resolved | Yang Jie
        102. Refine docstring of `to_json/from_json` | Sub-task | Resolved | Hyukjin Kwon
        103. Refine docstring of `try_sum`, `try_avg`, `avg`, `sum`, `mean` | Sub-task | Resolved | Hyukjin Kwon
        104. Refine docstrings of try_* | Sub-task | Resolved | Hyukjin Kwon
        105. Improve docstring of mapInPandas | Sub-task | Resolved | Xinrong Meng
        106. Update the table-valued function documentation | Sub-task | Resolved | Allison Wang
        107. Add examples section header to `format_number` docstring | Sub-task | Resolved | Thomas Hart
        108. Improve clarity in lag docstring | Sub-task | Resolved | Thomas Hart
        109. Unify the 'See Also' section formatting across PySpark docstrings | Sub-task | Resolved | Allison Wang
        110. Update documentation to add `column` as alias of `col` | Sub-task | Open | Unassigned
        111. Fix the incorrect namings and missing params in func docs in `builtin.py` | Sub-task | Resolved | Wei Guo
        112. Test the default column name of array functions | Sub-task | Resolved | Ruifeng Zheng
        113. Add doctest for `options` in json functions | Sub-task | Resolved | Ruifeng Zheng
        114. Refine the type hints in functions | Sub-task | Resolved | Ruifeng Zheng
        115. Document the NaN handling in df.na.drop | Sub-task | Resolved | Ruifeng Zheng
        116. Fix type hint for `accuracy` in `percentile_approx` and `approx_percentile` | Sub-task | Resolved | Ruifeng Zheng
        117. Remove experimental API notes for pandas related functions | Sub-task | Resolved | Allison Wang
        118. Refine docstring for trigonometric functions | Sub-task | Resolved | Ruifeng Zheng
        119. Refine docstring for basic functions | Sub-task | Resolved | Ruifeng Zheng
        120. Refine the docstring for more date functions | Sub-task | Resolved | Ruifeng Zheng
        121. Refine the docstring of multiple datetime functions | Sub-task | Resolved | Ruifeng Zheng
        122. Refine docstrings for aggregation functions - part 1 | Sub-task | Resolved | Ruifeng Zheng
        123. Refine docstrings for aggregation functions - part 2 | Sub-task | Resolved | Ruifeng Zheng


          People

            Assignee: Unassigned
            Reporter: Allison Wang (allisonwang-db)
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 1h 10m
