Description
Build details: Spark build from master branch (Apr-10)
TPC-DS data at 200 GB scale, stored in Parquet format in Hive.
Ran TPC-DS Query 27 via the Spark beeline client with "spark.sql.sources.fileScan=false". The query failed with the following exception:
java.lang.ClassCastException: org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader cannot be cast to org.apache.parquet.hadoop.ParquetRecordReader
    at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetInputFormat.createRecordReader(ParquetRelation.scala:480)
    at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetInputFormat.createRecordReader(ParquetRelation.scala:476)
    at org.apache.spark.rdd.SqlNewHadoopRDD$$anon$1.<init>(SqlNewHadoopRDD.scala:161)
    at org.apache.spark.rdd.SqlNewHadoopRDD.compute(SqlNewHadoopRDD.scala:121)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:69)
    at org.apache.spark.scheduler.Task.run(Task.scala:82)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:231)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
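For readers unfamiliar with this class of failure, the following is a minimal sketch of how such a ClassCastException can arise. It is not the Spark code. It assumes the two readers are sibling types under a common base, and every name in it (RecordReaderBase, RowReader, VectorizedReader, CastDemo) is a hypothetical stand-in. A factory returns the vectorized reader, and the caller unconditionally downcasts to the row-based reader type.

```java
// Hypothetical stand-ins for the reader hierarchy; not the real Spark or Parquet classes.
abstract class RecordReaderBase {}

// Stand-in for the row-based reader that the caller expects.
class RowReader extends RecordReaderBase {}

// Stand-in for the vectorized reader that is actually returned.
class VectorizedReader extends RecordReaderBase {}

public class CastDemo {
    // Returns one of the two sibling reader types depending on a flag.
    static RecordReaderBase createReader(boolean vectorized) {
        return vectorized ? new VectorizedReader() : new RowReader();
    }

    // Attempts the unconditional downcast and reports whether it failed.
    static boolean castFails(RecordReaderBase reader) {
        try {
            RowReader r = (RowReader) reader; // throws if reader is a VectorizedReader
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(castFails(createReader(true)));  // prints true
        System.out.println(castFails(createReader(false))); // prints false
    }
}
```

Whether the actual fix belongs in the caller or in the input format is not established by this report; the sketch only illustrates the mechanism named in the exception message.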
Creating this JIRA as a placeholder to track this issue.