Spark / SPARK-32095

[DataSource V2] Documentation on SupportsReportStatistics Outdated?


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.4.6, 3.0.0
    • Fix Version/s: 3.0.1, 3.1.0
    • Component/s: SQL
    • Labels: None

    Description

      I was wondering whether the documentation for SupportsReportStatistics [1][3], which describes its interaction with the planner and predicate pushdown, is still accurate. It says:

      "Implementations that return more accurate statistics based on pushed operators will not improve query performance until the planner can push operators before getting stats."

      Is this still accurate? Looking through the code, there now seems to be functionality that explicitly expects operators to be pushed down [2]. Is the SupportsReportStatistics documentation referring to something other than [2], or should it be updated?

      [1] https://spark.apache.org/docs/2.4.6/api/java/org/apache/spark/sql/sources/v2/reader/SupportsReportStatistics.html

      [2] https://github.com/apache/spark/blob/d0800fc8e2e71a79bf0f72c3e4bc608ae34053e7/sql/catalyst/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala#L86

      [3] https://spark.apache.org/docs/3.0.0-preview/api/java/org/apache/spark/sql/connector/read/SupportsReportStatistics.html
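The ordering problem that the quoted Javadoc sentence describes can be illustrated with a small toy model. This is a hypothetical sketch, not Spark code: `ToyScanBuilder`, `ToyScan`, `estimateNumRows` and `plannerRowEstimate` are names invented here, loosely mirroring the roles of Spark's `ScanBuilder` and `SupportsReportStatistics.estimateStatistics()`. It assumes a source whose estimate exactly reflects whatever filters were pushed. Even with such a source, if statistics are requested before the filter is pushed, the estimate describes the full table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.OptionalLong;
import java.util.function.LongPredicate;

// Toy model of the "statistics vs. pushdown ordering" question.
// ToyScanBuilder, ToyScan and estimateNumRows are HYPOTHETICAL stand-ins,
// not Spark's org.apache.spark.sql.connector.read interfaces.
public class StatsOrderingSketch {

    /** Hypothetical scan: remembers the filters that were pushed before it was built. */
    static final class ToyScan {
        private final long[] rows;
        private final List<LongPredicate> pushedFilters;

        ToyScan(long[] rows, List<LongPredicate> pushedFilters) {
            this.rows = rows;
            this.pushedFilters = pushedFilters;
        }

        /** Plays the role of estimateStatistics(): counts rows passing every pushed filter. */
        OptionalLong estimateNumRows() {
            long n = Arrays.stream(rows)
                    .filter(v -> pushedFilters.stream().allMatch(p -> p.test(v)))
                    .count();
            return OptionalLong.of(n);
        }
    }

    /** Hypothetical builder: collects pushed filters, then builds a scan. */
    static final class ToyScanBuilder {
        private final long[] rows;
        private final List<LongPredicate> pushed = new ArrayList<>();

        ToyScanBuilder(long[] rows) {
            this.rows = rows;
        }

        void pushFilter(LongPredicate filter) {
            pushed.add(filter);
        }

        ToyScan build() {
            return new ToyScan(rows, new ArrayList<>(pushed));
        }
    }

    /**
     * Returns the row estimate a planner would see. When pushBeforeStats is false,
     * the scan is built and asked for stats before the filter is pushed, so the
     * estimate covers the whole table no matter how accurate the source is.
     */
    public static long plannerRowEstimate(long[] rows, LongPredicate filter, boolean pushBeforeStats) {
        ToyScanBuilder builder = new ToyScanBuilder(rows);
        if (pushBeforeStats) {
            builder.pushFilter(filter);
        }
        return builder.build().estimateNumRows().getAsLong();
    }

    public static void main(String[] args) {
        long[] rows = {1, 5, 10, 15, 20};
        LongPredicate greaterThan10 = v -> v > 10;
        // prints "stats before pushdown: 5"
        System.out.println("stats before pushdown: " + plannerRowEstimate(rows, greaterThan10, false));
        // prints "stats after pushdown:  2"
        System.out.println("stats after pushdown:  " + plannerRowEstimate(rows, greaterThan10, true));
    }
}
```

With the data in `main`, the estimate is 5 rows when statistics are taken before pushdown and 2 rows when the filter is pushed first; that gap is what the Javadoc sentence warns about. Which of the two orderings Spark's planner actually follows for DataSource V2 is the question raised above.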

          People

            Assignee: Micah Kornfield (emkornfield)
            Reporter: Micah Kornfield (emkornfield)
            Votes: 0
            Watchers: 3
