Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.2.0, 1.2.1
    • Fix Version/s: 1.5.0
    • Component/s: SQL
    • Labels: None

    Description

      This issue is actually caused by PARQUET-173.

      The following spark-shell session can be used to reproduce this bug:

      import org.apache.spark.sql.SQLContext
      
      val sqlContext = new SQLContext(sc)
      import sc._          // for parallelize and hadoopConfiguration
      import sqlContext._  // for the implicit RDD-to-SchemaRDD conversion, parquetFile, and sql
      
      case class KeyValue(key: Int, value: String)
      
      // Write a Parquet file large enough to span multiple row groups.
      parallelize(1 to 1024 * 1024 * 20).
        flatMap(i => Seq.fill(10)(KeyValue(i, i.toString))).
        saveAsParquetFile("large.parquet")
      
      parquetFile("large.parquet").registerTempTable("large")
      
      // Read Parquet metadata on the driver side so that row groups can be
      // filtered before tasks are launched, and enable filter push-down.
      hadoopConfiguration.set("parquet.task.side.metadata", "false")
      sql("SET spark.sql.parquet.filterPushdown=true")
      
      sql("SELECT value FROM large WHERE 1024 < value AND value < 2048").collect()
      

      The following line in the log shows that row group filtering did not take
      effect, even though the predicate matches only a tiny fraction of the data:

      There were no row groups that could be dropped due to filter predicates
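      
      For reference, row group dropping is performed by parquet-mr, which
      evaluates the pushed-down FilterPredicate against each row group's min/max
      statistics. A minimal sketch of the kind of predicate involved, shown on
      the integer key column for simplicity and assuming the pre-1.8
      parquet.filter2 package names shipped with Spark 1.2:
      
      import parquet.filter2.predicate.FilterApi
      import parquet.filter2.predicate.FilterApi.intColumn
      
      // A range predicate comparable to the WHERE clause above. When a row
      // group's statistics prove that no record can match, parquet-mr is
      // expected to drop the entire row group (the step PARQUET-173 breaks).
      val keyColumn = intColumn("key")
      val predicate = FilterApi.and(
        FilterApi.gt(keyColumn, java.lang.Integer.valueOf(1024)),
        FilterApi.lt(keyColumn, java.lang.Integer.valueOf(2048)))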
      

      People

        Assignee: Thomas Omans (eggsby)
        Reporter: Cheng Lian (liancheng)
