[SPARK-38573] Support Auto Partition Statistics Collection - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.4.0
Fix Version/s: 3.4.0
Component/s: SQL
Labels: None

Description

Currently, SPARK-21127 (https://issues.apache.org/jira/browse/SPARK-21127) supports storing aggregated statistics at the table level for partitioned tables when the config spark.sql.statistics.size.autoUpdate.enabled is set.

Supporting partition-level statistics is useful for identifying outlier (skewed) partitions, and the query optimizer works better with partition-level statistics when partition pruning applies.
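
To illustrate the outlier use case, here is a minimal sketch in plain Python, not Spark code and not part of the linked pull request. It assumes per-partition sizes in bytes are already available, for example from partition-level statistics. The function name `find_skewed_partitions`, the `factor` threshold, the median-based rule, and the sample sizes are all hypothetical choices made for this example.

```python
from statistics import median


def find_skewed_partitions(partition_sizes, factor=5.0):
    """Return partition specs whose size exceeds `factor` times the median size.

    `partition_sizes` maps a partition spec (e.g. "ds=2022-03-16") to its size
    in bytes. The median-times-factor rule is an assumption for illustration.
    """
    if not partition_sizes:
        return []
    threshold = factor * median(partition_sizes.values())
    return sorted(spec for spec, size in partition_sizes.items() if size > threshold)


# Hypothetical per-partition sizes in bytes (made-up values).
sizes = {
    "ds=2022-03-14": 1_000_000,
    "ds=2022-03-15": 1_200_000,
    "ds=2022-03-16": 48_000_000,
    "ds=2022-03-17": 900_000,
}

skewed = find_skewed_partitions(sizes)
print(skewed)
```

With only a table-level aggregate (the sum of all four sizes), the single large partition would not be distinguishable from an evenly distributed table; per-partition figures make that comparison possible.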

Issue Links

is related to

SPARK-21127 Update statistics after data changing commands (Resolved)

SPARK-33825 Is Spark SQL able to auto update partition stats like hive by setting hive.stats.autogather=true (Resolved)

links to

[Github] Pull Request #36067 (kazuyukitanimura)

People

Assignee: Kazuyuki Tanimura

Reporter: Kazuyuki Tanimura

Votes: 0

Watchers: 7

Dates

Created: 16/Mar/22 19:34

Updated: 15/Apr/22 16:18

Resolved: 15/Apr/22 16:18