[SPARK-36617] Inconsistencies in approxQuantile annotations - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Resolved
Affects Version/s: 3.1.0, 3.2.0, 3.3.0
Fix Version/s: 3.2.0, 3.1.3
Component/s: PySpark, SQL
Labels: None

Description

I've been reviewing a PR in the legacy repo (https://github.com/zero323/pyspark-stubs/pull/552), and it looks like we have two problems with the annotations for approxQuantile.

First of all, DataFrame.approxQuantile should have overloaded definitions that match its input arguments: if col is a str, the result is a list of floats, and if col is a sequence of column names, the result should be a list of lists:

    @overload
    def approxQuantile(
        self,
        col: str,
        probabilities: Union[List[float], Tuple[float]],
        relativeError: float
    ) -> List[float]: ...
    @overload
    def approxQuantile(
        self,
        col: Union[List[str], Tuple[str]],
        probabilities: Union[List[float], Tuple[float]],
        relativeError: float
    ) -> List[List[float]]: ...

Additionally, DataFrameStatFunctions.approxQuantile should have the same annotations as DataFrame.approxQuantile.
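To illustrate the two points above, here is a minimal, self-contained sketch. It does not use PySpark: ToyDataFrame and ToyStatFunctions are hypothetical stand-ins, and the quantile computation is a simple exact nearest-rank lookup that ignores relativeError. The sketch also assumes Tuple[float, ...] and Tuple[str, ...] for variable-length tuples, since Tuple[float] on its own describes a one-element tuple; the signatures proposed above are otherwise unchanged.

```python
from typing import Dict, List, Tuple, Union, overload


class ToyDataFrame:
    """Hypothetical stand-in for DataFrame; holds columns as plain lists."""

    def __init__(self, columns: Dict[str, List[float]]) -> None:
        self._columns = columns

    @overload
    def approxQuantile(
        self,
        col: str,
        probabilities: Union[List[float], Tuple[float, ...]],
        relativeError: float,
    ) -> List[float]: ...

    @overload
    def approxQuantile(
        self,
        col: Union[List[str], Tuple[str, ...]],
        probabilities: Union[List[float], Tuple[float, ...]],
        relativeError: float,
    ) -> List[List[float]]: ...

    def approxQuantile(self, col, probabilities, relativeError):
        # A single column name yields a flat list; a sequence of names
        # yields one list per column, which is what the overloads describe.
        if isinstance(col, str):
            return self._quantiles(col, probabilities)
        return [self._quantiles(name, probabilities) for name in col]

    def _quantiles(self, name: str, probabilities) -> List[float]:
        # Exact nearest-rank lookup; relativeError is not used in this toy.
        values = sorted(self._columns[name])
        last = len(values) - 1
        return [values[round(p * last)] for p in probabilities]


class ToyStatFunctions:
    """Hypothetical stand-in for DataFrameStatFunctions."""

    def __init__(self, df: ToyDataFrame) -> None:
        self.df = df

    @overload
    def approxQuantile(
        self,
        col: str,
        probabilities: Union[List[float], Tuple[float, ...]],
        relativeError: float,
    ) -> List[float]: ...

    @overload
    def approxQuantile(
        self,
        col: Union[List[str], Tuple[str, ...]],
        probabilities: Union[List[float], Tuple[float, ...]],
        relativeError: float,
    ) -> List[List[float]]: ...

    def approxQuantile(self, col, probabilities, relativeError):
        # Delegate so both entry points share one behavior and one signature.
        return self.df.approxQuantile(col, probabilities, relativeError)


df = ToyDataFrame({"a": [1.0, 2.0, 3.0, 4.0, 5.0], "b": [10.0, 20.0, 30.0]})
single = df.approxQuantile("a", [0.0, 0.5, 1.0], 0.0)      # List[float]
multi = df.approxQuantile(["a", "b"], (0.5,), 0.0)         # List[List[float]]
via_stat = ToyStatFunctions(df).approxQuantile("b", [0.0, 1.0], 0.0)
```

With overloads like these, a type checker can infer List[float] for the single-column call and List[List[float]] for the multi-column call, instead of a single return type that covers only one of the two cases.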

Issue Links

links to

[Github] Pull Request #33880 (carylee)

People

Assignee: Cary Lee

Reporter: Maciej Szymkiewicz

Votes: 0

Watchers: 2

Dates

Created: 30/Aug/21 21:36

Updated: 12/Dec/22 18:10

Resolved: 02/Sep/21 13:23