Details
- Type: Sub-task
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Versions: 2.2.1, 2.3.0
- Fix Version: None
- Component: None
- Environment: Tried on Ubuntu, FreeBSD, and Windows, running spark-shell in local mode reading data from the local file system
Description
Created dummy partitioned data with a partition column of string type (col1=a and col1=b). Added CSV data, read it through Spark, created a partitioned external table, and ran MSCK REPAIR TABLE to load the partitions. Then ran ANALYZE on all columns, including the partition column.
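The setup steps above can be sketched in spark-shell roughly as follows. This is a minimal sketch, not the reporter's exact script: the data path /tmp/test_p, the data column c1, and the partition values 1a/1b are assumptions; the table name test_p and partition column e are taken from the snippet below.

```scala
// Sketch of the repro setup, assuming spark-shell started with Hive support.
import spark.implicits._

// Dummy data written into two string partitions of column e (values assumed)
Seq(("x", "1a"), ("y", "1b")).toDF("c1", "e")
  .write.partitionBy("e").csv("/tmp/test_p")  // path is an assumption

// Partitioned external table over the CSV data
spark.sql("""
  CREATE EXTERNAL TABLE test_p (c1 STRING)
  PARTITIONED BY (e STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  LOCATION '/tmp/test_p'
""")

// Discover the partitions, then collect table and column statistics
spark.sql("MSCK REPAIR TABLE test_p")
spark.sql("ANALYZE TABLE test_p COMPUTE STATISTICS")
spark.sql("ANALYZE TABLE test_p COMPUTE STATISTICS FOR COLUMNS c1, e")
```

The Parquet variant would be the same with `.parquet("/tmp/test_p")` and `STORED AS PARQUET` in place of the CSV writer and row format.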
println(spark.sql("select * from test_p where e='1a'").queryExecution.toStringWithStats)
// e is the partition column
val op = spark.sql("select * from test_p where e='1a'").queryExecution.optimizedPlan
val stat = op.stats(spark.sessionState.conf)
println(stat.rowCount)
Created the same table in Parquet as well: for CSV the rowCount comes up correctly, but for Parquet it shows as None.