  1. Spark
  2. SPARK-36617

Inconsistencies in approxQuantile annotations

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Resolved
    • Affects Version/s: 3.1.0, 3.2.0, 3.3.0
    • Fix Version/s: 3.2.0, 3.1.3
    • Component/s: PySpark, SQL
    • Labels:
      None

      Description

      I've been reviewing a PR in the legacy repo (https://github.com/zero323/pyspark-stubs/pull/552), and it looks like we have two problems with the annotations for approxQuantile.

      First, DataFrame.approxQuantile should have overloaded definitions that match its input arguments: if col is a str, the result should be a list of floats, and if col is a sequence, the result should be a list of lists:

          @overload
          def approxQuantile(
              self,
              col: str,
              probabilities: Union[List[float], Tuple[float]],
              relativeError: float
          ) -> List[float]: ...
          @overload
          def approxQuantile(
              self,
              col: Union[List[str], Tuple[str]],
              probabilities: Union[List[float], Tuple[float]],
              relativeError: float
          ) -> List[List[float]]: ...
      

      Additionally, DataFrameStatFunctions.approxQuantile should match whatever we have in DataFrame.
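
      To make the intended behaviour concrete, here is a minimal, self-contained sketch that does not import PySpark. DataFrameSketch, StatFunctionsSketch and _quantiles are hypothetical stand-ins, not the real classes, and the quantiles are computed by an exact nearest-rank lookup that ignores relativeError rather than by Spark's approximate algorithm. The sketch also writes the tuple types as Tuple[float, ...] and Tuple[str, ...] so that tuples of any length are accepted; that is an assumption of this example, not part of the proposal above.

```python
from typing import Dict, List, Tuple, Union, overload

# Assumption: variable-length tuples, unlike the Tuple[float] in the proposal.
Probabilities = Union[List[float], Tuple[float, ...]]
ColumnNames = Union[List[str], Tuple[str, ...]]


def _quantiles(values: List[float], probabilities: Probabilities) -> List[float]:
    # Exact nearest-rank lookup on sorted data; relativeError is not modelled.
    ordered = sorted(values)
    last = len(ordered) - 1
    return [ordered[min(int(p * len(ordered)), last)] for p in probabilities]


class StatFunctionsSketch:
    """Hypothetical stand-in for DataFrameStatFunctions."""

    def __init__(self, data: Dict[str, List[float]]) -> None:
        self._data = data

    @overload
    def approxQuantile(
        self, col: str, probabilities: Probabilities, relativeError: float
    ) -> List[float]: ...
    @overload
    def approxQuantile(
        self, col: ColumnNames, probabilities: Probabilities, relativeError: float
    ) -> List[List[float]]: ...
    def approxQuantile(self, col, probabilities, relativeError):
        # A single column name yields a flat list; a sequence yields one list per column.
        if isinstance(col, str):
            return _quantiles(self._data[col], probabilities)
        return [_quantiles(self._data[c], probabilities) for c in col]


class DataFrameSketch:
    """Hypothetical stand-in for DataFrame; same overloads, delegating to stat."""

    def __init__(self, data: Dict[str, List[float]]) -> None:
        self.stat = StatFunctionsSketch(data)

    @overload
    def approxQuantile(
        self, col: str, probabilities: Probabilities, relativeError: float
    ) -> List[float]: ...
    @overload
    def approxQuantile(
        self, col: ColumnNames, probabilities: Probabilities, relativeError: float
    ) -> List[List[float]]: ...
    def approxQuantile(self, col, probabilities, relativeError):
        return self.stat.approxQuantile(col, probabilities, relativeError)


df = DataFrameSketch({"a": [1.0, 2.0, 3.0, 4.0, 5.0], "b": [10.0, 20.0, 30.0]})
single = df.approxQuantile("a", [0.5], 0.0)             # List[float]
multi = df.approxQuantile(["a", "b"], [0.0, 1.0], 0.0)  # List[List[float]]
```

      With both classes carrying identical overloads, a type checker should infer List[float] for single and List[List[float]] for multi, whichever entry point is used.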

            People

            • Assignee: Cary Lee (carylee)
            • Reporter: Maciej Szymkiewicz (zero323)