Spark / SPARK-16026 Cost-based Optimizer Framework / SPARK-25185

CBO rowcount statistics doesn't work for partitioned parquet external table


Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.2.1, 2.3.0
    • Fix Version/s: None
    • Component/s: Spark Core, SQL
    • Labels: None
    • Environment: Tried on Ubuntu, FreeBSD, and Windows, running spark-shell in local mode reading data from the local file system

    Description

      Created dummy partitioned data with a string-typed partition column (col1=a and col1=b).

      Added CSV data, read it through Spark, created a partitioned external table, and ran MSCK REPAIR TABLE to load the partitions. Then ran ANALYZE TABLE on all columns, including the partition column.
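      The setup steps above can be sketched as follows in spark-shell (the storage path and the non-partition column names c1/c2 are illustrative; the table name test_p and partition column e come from the snippet in this report):

      // Sketch of the reproduction setup; requires an active SparkSession with Hive support.
      spark.sql(
        "CREATE EXTERNAL TABLE test_p (c1 INT, c2 STRING) PARTITIONED BY (e STRING) " +
        "STORED AS PARQUET LOCATION 'file:///tmp/test_p'")
      // Register the col-name=value partition directories already on disk
      spark.sql("MSCK REPAIR TABLE test_p")
      // Table-level statistics (row count, size)
      spark.sql("ANALYZE TABLE test_p COMPUTE STATISTICS")
      // Column-level statistics, including the partition column
      spark.sql("ANALYZE TABLE test_p COMPUTE STATISTICS FOR COLUMNS c1, c2, e")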

      // e is the partition column of table test_p
      println(spark.sql("select * from test_p where e='1a'").queryExecution.toStringWithStats)
      val op = spark.sql("select * from test_p where e='1a'").queryExecution.optimizedPlan

      // Spark 2.2 API: LogicalPlan.stats takes the SQLConf (in 2.3 it is op.stats with no argument)
      val stat = op.stats(spark.sessionState.conf)
      print(stat.rowCount)
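      When reproducing, it may be worth confirming that CBO is enabled and that the statistics were actually persisted to the metastore before concluding the row count is lost (a hedged check, assuming the test_p table from above):

      // Row-count estimation via CBO is gated on this flag (default false in Spark 2.x)
      println(spark.conf.get("spark.sql.cbo.enabled"))
      // The "Statistics" row in the extended description shows what ANALYZE persisted
      spark.sql("DESCRIBE EXTENDED test_p").show(100, false)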

       

      Created the same table with Parquet data: the rowCount comes up correctly for the CSV-backed table, but for the Parquet-backed table it shows as None.

      Attachments

        Activity

          People

            Assignee: Unassigned
            Reporter: imamitsehgal (Amit)
            Votes: 5
            Watchers: 9