Spark / SPARK-36617

Inconsistencies in approxQuantile annotations

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Resolved
    • Affects Version/s: 3.1.0, 3.2.0, 3.3.0
    • Fix Version/s: 3.2.0, 3.1.3
    • Component/s: PySpark, SQL
    • Labels: None

    Description

      I've been reviewing a PR in the legacy repo (https://github.com/zero323/pyspark-stubs/pull/552), and it looks like we have two problems with the annotations for approxQuantile.

      First, DataFrame.approxQuantile should have overloaded definitions that match its input arguments: if col is a str, the result is a list of floats; if col is a sequence, the result should be a list of lists:

          @overload
          def approxQuantile(
              self,
              col: str,
              probabilities: Union[List[float], Tuple[float]],
              relativeError: float
          ) -> List[float]: ...
          @overload
          def approxQuantile(
              self,
              col: Union[List[str], Tuple[str]],
              probabilities: Union[List[float], Tuple[float]],
              relativeError: float
          ) -> List[List[float]]: ...
      

      Additionally, DataFrameStatFunctions.approxQuantile should match whatever we have in DataFrame.
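
The two requirements above can be tried out without a Spark installation. What follows is only a minimal sketch: the DataFrame and DataFrameStatFunctions classes are hypothetical stand-ins (not the PySpark classes), and the method body is a placeholder that returns zeros in the right shape rather than real quantiles. It also writes Tuple[float, ...] (a variable-length tuple) where the description has Tuple[float]; that is a choice made for this sketch, not something the issue states.

```python
from typing import List, Tuple, Union, overload


class DataFrame:
    """Hypothetical stand-in for pyspark.sql.DataFrame (no Spark needed)."""

    @overload
    def approxQuantile(
        self,
        col: str,
        probabilities: Union[List[float], Tuple[float, ...]],
        relativeError: float,
    ) -> List[float]: ...

    @overload
    def approxQuantile(
        self,
        col: Union[List[str], Tuple[str, ...]],
        probabilities: Union[List[float], Tuple[float, ...]],
        relativeError: float,
    ) -> List[List[float]]: ...

    def approxQuantile(self, col, probabilities, relativeError):
        # Placeholder body (assumption): zeros with the shape each overload promises.
        one_column = [0.0 for _ in probabilities]
        if isinstance(col, str):
            return one_column
        return [list(one_column) for _ in col]


class DataFrameStatFunctions:
    """Hypothetical stand-in that mirrors the DataFrame signatures."""

    def __init__(self, df: DataFrame) -> None:
        self.df = df

    @overload
    def approxQuantile(
        self,
        col: str,
        probabilities: Union[List[float], Tuple[float, ...]],
        relativeError: float,
    ) -> List[float]: ...

    @overload
    def approxQuantile(
        self,
        col: Union[List[str], Tuple[str, ...]],
        probabilities: Union[List[float], Tuple[float, ...]],
        relativeError: float,
    ) -> List[List[float]]: ...

    def approxQuantile(self, col, probabilities, relativeError):
        # Delegate so both entry points share one behaviour.
        return self.df.approxQuantile(col, probabilities, relativeError)


df = DataFrame()
single = df.approxQuantile("a", [0.25, 0.75], 0.1)        # typed as List[float]
multi = df.approxQuantile(["a", "b"], (0.25, 0.75), 0.1)  # typed as List[List[float]]
via_stat = DataFrameStatFunctions(df).approxQuantile(["a", "b"], [0.5], 0.1)
```

With annotations shaped like this, a type checker such as mypy should infer a flat list for a str column and a nested list for a list or tuple of columns, on both classes.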

          People

            Assignee: Cary Lee (carylee)
            Reporter: Maciej Szymkiewicz (zero323)