SPARK-1913

Parquet table column pruning error caused by filter pushdown

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1.0
    • Fix Version/s: 1.0.1, 1.1.0
    • Component/s: SQL
    • Labels:
      None
    • Environment:

      Mac OS X 10.9.2

      Description

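      When scanning Parquet tables, attributes that are referenced only in pushed-down predicates are not passed to the `ParquetTableScan` operator, which causes an exception. In the query below, `key` appears only in the `WHERE` clause, so it is missing from the columns requested from the scan. Verified in the sbt hive/console: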

      loadTestTable("src")
      table("src").saveAsParquetFile("src.parquet")
      parquetFile("src.parquet").registerAsTable("src_parquet")
      hql("SELECT value FROM src_parquet WHERE key < 10").collect().foreach(println)
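
The pruning rule this report implies can be sketched in plain Scala. This is a simplified, hypothetical model, not Spark's planner code: `LessThan`, `prunedColumnsBuggy`, and `prunedColumnsFixed` are names invented for illustration. It assumes the scan is handed a list of column names, and that the fix amounts to keeping every column a pushed-down filter refers to in addition to the projected ones.

```scala
// Hypothetical, simplified model of column pruning; not Spark's actual classes.
object PruningSketch {
  // A pushed-down comparison such as `key < 10`.
  final case class LessThan(column: String, bound: Int)

  // Behaviour described in this report: only the projected columns are
  // requested from the scan, so filter-only columns are dropped.
  def prunedColumnsBuggy(projection: Seq[String], pushedDown: Seq[LessThan]): Seq[String] =
    projection

  // Assumed correction: also keep every column referenced by a pushed-down filter.
  def prunedColumnsFixed(projection: Seq[String], pushedDown: Seq[LessThan]): Seq[String] =
    (projection ++ pushedDown.map(_.column)).distinct

  def main(args: Array[String]): Unit = {
    // Mirrors SELECT value FROM src_parquet WHERE key < 10
    val projection = Seq("value")
    val filters = Seq(LessThan("key", 10))
    println(prunedColumnsBuggy(projection, filters)) // `key` is missing
    println(prunedColumnsFixed(projection, filters)) // `key` is retained
  }
}
```

With the corrected rule, the scan would still read `key` so the filter can be evaluated, and a later projection can drop it again before results are returned.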
      

      Exception

      parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file file:/scratch/rxin/spark/src.parquet/part-r-2.parquet
      	at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:177)
      	at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:130)
      	at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:122)
      	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
      	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
      	at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
      	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
      	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
      	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
      	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
      	at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
      	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
      	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
      	at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
      	at scala.collection.AbstractIterator.to(Iterator.scala:1157)
      	at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
      	at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
      	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
      	at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
      	at org.apache.spark.rdd.RDD$$anonfun$15.apply(RDD.scala:717)
      	at org.apache.spark.rdd.RDD$$anonfun$15.apply(RDD.scala:717)
      	at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1080)
      	at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1080)
      	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
      	at org.apache.spark.scheduler.Task.run(Task.scala:51)
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:744)
      Caused by: java.lang.IllegalArgumentException: Column key does not exist.
      	at parquet.filter.ColumnRecordFilter$1.bind(ColumnRecordFilter.java:51)
      	at org.apache.spark.sql.parquet.ComparisonFilter.bind(ParquetFilters.scala:306)
      	at parquet.io.FilteredRecordReader.<init>(FilteredRecordReader.java:46)
      	at parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:74)
      	at parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:110)
      	at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:172)
      	... 28 more
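
The `Caused by` section suggests the failure happens when the filter is bound to the columns the reader was asked to produce. A minimal sketch of that failure mode follows, assuming the bind step looks the column up by name; `ColumnFilter` and its `bind` method are hypothetical stand-ins and do not reproduce the Parquet or Spark implementation.

```scala
import scala.util.Try

// Hypothetical stand-in for a record filter that must be bound to a column.
object BindSketch {
  final case class ColumnFilter(column: String, accepts: Int => Boolean) {
    // Resolve the filter's column against the requested columns and return
    // its position; fail if the column was pruned away.
    def bind(requestedColumns: Seq[String]): Int = {
      val index = requestedColumns.indexOf(column)
      if (index < 0) {
        throw new IllegalArgumentException(s"Column $column does not exist.")
      }
      index
    }
  }

  def main(args: Array[String]): Unit = {
    val filter = ColumnFilter("key", _ < 10)
    // Pruned column list without `key`: binding fails.
    println(Try(filter.bind(Seq("value"))))
    // Column list that retains `key`: binding succeeds.
    println(Try(filter.bind(Seq("value", "key"))))
  }
}
```

In this model the error surfaces only when the reader is initialised on an executor, which is consistent with the exception appearing inside `nextKeyValue` rather than at planning time.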
      

    People

    • Assignee: Cheng Lian
    • Reporter: Chen Chao