Description
This issue is actually caused by PARQUET-173.
The following spark-shell session can be used to reproduce this bug:
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
import sc._
import sqlContext._

case class KeyValue(key: Int, value: String)

parallelize(1 to 1024 * 1024 * 20).
  flatMap(i => Seq.fill(10)(KeyValue(i, i.toString))).
  saveAsParquetFile("large.parquet")

parquetFile("large.parquet").registerTempTable("large")

hadoopConfiguration.set("parquet.task.side.metadata", "false")
sql("SET spark.sql.parquet.filterPushdown=true")
sql("SELECT value FROM large WHERE 1024 < value AND value < 2048").collect()
The log then shows that no row groups were skipped, even though the predicate rules most of them out:
There were no row groups that could be dropped due to filter predicates
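Per the linked PARQUET-173 report, the row groups survive because the statistics filter combines the two sides of an `And` incorrectly: it skips a row group only when *both* conjuncts allow skipping, while a conjunction can already never match when *either* side is impossible for the group's min/max range. A minimal sketch of the intended logic, where `Stats`, `Pred`, and `canDrop` are hypothetical names for illustration, not Parquet's actual API:

```scala
// Hypothetical min/max statistics for one row group's column chunk.
case class Stats(min: Int, max: Int)

// A tiny predicate language over a single Int column.
sealed trait Pred
case class Gt(v: Int) extends Pred            // column > v
case class Lt(v: Int) extends Pred            // column < v
case class And(l: Pred, r: Pred) extends Pred

// True when the predicate cannot match any value in [min, max],
// so the whole row group can be skipped.
def canDrop(p: Pred, s: Stats): Boolean = p match {
  case Gt(v)     => s.max <= v                // no value exceeds v
  case Lt(v)     => s.min >= v                // no value is below v
  // Correct: if either conjunct can never match, the And can never match.
  // The buggy behavior reported in PARQUET-173 effectively required BOTH
  // sides to be droppable, so groups like [0, 100] were kept for
  // "1024 < value AND value < 2048".
  case And(l, r) => canDrop(l, s) || canDrop(r, s)
}

// A row group with values in [0, 100] can never satisfy 1024 < value,
// so the conjunction should let us drop it.
val droppable = canDrop(And(Gt(1024), Lt(2048)), Stats(0, 100))
```

With the `||` combination, the repro above would skip every row group whose key range lies entirely outside (1024, 2048); with the buggy `&&`, none are skipped, matching the log line.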
Issue Links
- is blocked by PARQUET-173: StatisticsFilter doesn't handle And properly (Resolved)
- relates to SPARK-7743: Upgrade Parquet to 1.7 (Resolved)