  1. Spark
  2. SPARK-17129 Support statistics collection and cardinality estimation for partitioned tables
  3. SPARK-15616

CatalogRelation should fall back to the HDFS size of the partitions involved in the query if statistics are not available.

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 3.0.0
    • Component/s: SQL
    • Labels:
      None

      Description

      Currently, when only some partitions of a partitioned table take part in a join, we still rely on the table size returned by the Metastore to decide whether the join can be converted to a broadcast join.
      If a Filter can prune some partitions, Hive prunes them before deciding whether to use a broadcast join, based on the HDFS size of the partitions that are actually involved in the query. Spark SQL should do the same, since it can improve join performance for partitioned tables.
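
      The idea can be sketched in plain Python, as an illustration only and not the actual Spark implementation. The names `Partition`, `estimate_size` and `can_broadcast` are hypothetical; the only value taken from Spark is the 10 MB default of `spark.sql.autoBroadcastJoinThreshold`. The sketch sums the sizes of the partitions that survive pruning, falls back to the file-system size when a partition has no statistics, and compares the total against the broadcast threshold.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional

MB = 1024 * 1024

# Default value of spark.sql.autoBroadcastJoinThreshold (10 MB).
BROADCAST_THRESHOLD_BYTES = 10 * MB


@dataclass
class Partition:
    """Hypothetical stand-in for a catalog partition."""
    spec: Dict[str, str]        # partition column -> value
    stats_size: Optional[int]   # size from statistics, None if not collected
    hdfs_size: int              # size of the partition directory on HDFS


def estimate_size(partitions: Iterable[Partition],
                  predicate: Callable[[Dict[str, str]], bool]) -> int:
    """Sum the sizes of the partitions kept by the pruning predicate.

    Uses the statistics size when present, otherwise the HDFS size.
    """
    total = 0
    for p in partitions:
        if not predicate(p.spec):
            continue  # pruned: does not contribute to the estimate
        total += p.stats_size if p.stats_size is not None else p.hdfs_size
    return total


def can_broadcast(size_in_bytes: int,
                  threshold: int = BROADCAST_THRESHOLD_BYTES) -> bool:
    """A relation qualifies for broadcast when its estimate fits the threshold."""
    return 0 <= size_in_bytes <= threshold


# Five daily partitions of 4 MB each, none with collected statistics.
parts = [Partition({"ds": f"2016-05-0{d}"}, None, 4 * MB) for d in range(1, 6)]

whole_table = estimate_size(parts, lambda spec: True)
pruned = estimate_size(parts, lambda spec: spec["ds"] == "2016-05-01")
```

      With these made-up numbers the whole table (20 MB) is over the threshold, while the single partition selected by the filter (4 MB) is under it, so only the pruned estimate would allow a broadcast join.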

              People

              • Assignee:
                wwg28103 Hu Fuwang
                Reporter:
                lianhuiwang Lianhui Wang
              • Votes:
                1
                Watchers:
                10

                Dates

                • Created:
                  Updated:
                  Resolved: