SPARK-36617: Inconsistencies in approxQuantile annotations

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Resolved
    • Affects Version/s: 3.1.0, 3.2.0, 3.3.0
    • Fix Version/s: 3.2.0, 3.1.3
    • Component/s: PySpark, SQL
    • Labels: None

    Description

      I've been reviewing a PR in the legacy repo (https://github.com/zero323/pyspark-stubs/pull/552), and it looks like we have two problems with the annotations for approxQuantile.

      First, DataFrame.approxQuantile should have overloaded definitions that match the input arguments: if col is a str, the result is a list of floats, and if col is a sequence, the result should be a list of lists:

          @overload
          def approxQuantile(
              self,
              col: str,
              probabilities: Union[List[float], Tuple[float]],
              relativeError: float
          ) -> List[float]: ...
          @overload
          def approxQuantile(
              self,
              col: Union[List[str], Tuple[str]],
              probabilities: Union[List[float], Tuple[float]],
              relativeError: float
          ) -> List[List[float]]: ...
      

      Additionally, DataFrameStatFunctions.approxQuantile should match whatever we have in DataFrame.
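
      To illustrate both points, here is a minimal sketch that runs without Spark. ToyDataFrame and ToyStatFunctions are hypothetical stand-ins (not the real pyspark classes), and the quantile computation is a naive exact one that ignores relativeError. The overload signatures are copied from the proposal above; the only point is that both classes carry the same pair of overloads, and that the return shape follows the type of col.

```python
from typing import Dict, List, Tuple, Union, overload


class ToyDataFrame:
    """Hypothetical stand-in for DataFrame, backed by a dict of columns."""

    def __init__(self, data: Dict[str, List[float]]) -> None:
        self._data = data

    @overload
    def approxQuantile(
        self,
        col: str,
        probabilities: Union[List[float], Tuple[float]],
        relativeError: float,
    ) -> List[float]: ...
    @overload
    def approxQuantile(
        self,
        col: Union[List[str], Tuple[str]],
        probabilities: Union[List[float], Tuple[float]],
        relativeError: float,
    ) -> List[List[float]]: ...
    def approxQuantile(self, col, probabilities, relativeError):
        # A single column name gives a flat list; a sequence gives one list per column.
        if isinstance(col, str):
            return self._quantiles(col, probabilities)
        return [self._quantiles(c, probabilities) for c in col]

    def _quantiles(self, name: str, probabilities) -> List[float]:
        # Naive exact quantiles for illustration only (relativeError is not used).
        values = sorted(self._data[name])
        last = len(values) - 1
        return [values[min(int(p * len(values)), last)] for p in probabilities]


class ToyStatFunctions:
    """Hypothetical stand-in for DataFrameStatFunctions; same overloads, pure delegation."""

    def __init__(self, df: ToyDataFrame) -> None:
        self.df = df

    @overload
    def approxQuantile(
        self,
        col: str,
        probabilities: Union[List[float], Tuple[float]],
        relativeError: float,
    ) -> List[float]: ...
    @overload
    def approxQuantile(
        self,
        col: Union[List[str], Tuple[str]],
        probabilities: Union[List[float], Tuple[float]],
        relativeError: float,
    ) -> List[List[float]]: ...
    def approxQuantile(self, col, probabilities, relativeError):
        return self.df.approxQuantile(col, probabilities, relativeError)


df = ToyDataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [10.0, 20.0]})
single = df.approxQuantile("a", [0.0, 1.0], 0.0)       # flat list of floats
multi = df.approxQuantile(["a", "b"], [0.5], 0.0)      # list of lists
via_stat = ToyStatFunctions(df).approxQuantile("a", [0.5], 0.0)
```

      With these overloads a type checker should infer List[float] for single and via_stat and List[List[float]] for multi. One side note: Tuple[float] strictly describes a one-element tuple, so Tuple[float, ...] would be the variadic spelling; the sketch keeps the signatures exactly as proposed.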

          People

            Assignee:
            carylee Cary Lee
            Reporter:
            zero323 Maciej Szymkiewicz