Parquet / PARQUET-1964

Properly handle missing/null filter


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.12.0
    • Component/s: None
    • Labels: None

    Description

      How to reproduce this issue:

      import org.apache.hadoop.conf.Configuration
      import org.apache.hadoop.fs.Path
      import org.apache.parquet.hadoop.ParquetFileReader
      import org.apache.parquet.hadoop.util.HadoopInputFile
      val hadoopInputFile = HadoopInputFile.fromPath(new Path("/path/to/parquet/000.snappy.parquet"), new Configuration())
      val reader = ParquetFileReader.open(hadoopInputFile)
      val recordCount = reader.getFilteredRecordCount
      reader.close()
      

      Output:

      java.lang.NullPointerException was thrown.
      java.lang.NullPointerException
      	at org.apache.parquet.internal.filter2.columnindex.ColumnIndexFilter.calculateRowRanges(ColumnIndexFilter.java:81)
      	at org.apache.parquet.hadoop.ParquetFileReader.getRowRanges(ParquetFileReader.java:961)
      	at org.apache.parquet.hadoop.ParquetFileReader.getFilteredRecordCount(ParquetFileReader.java:766)
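
      On affected versions, one caller-side way to avoid the null dereference is to pass an explicit filter through the read options. A minimal sketch, assuming the parquet-mr HadoopReadOptions builder API (the file path is the placeholder from the reproduction above; per the update below, a NOOP filter can still lead to column index/bloom filter data being loaded):

      import org.apache.hadoop.conf.Configuration
      import org.apache.hadoop.fs.Path
      import org.apache.parquet.HadoopReadOptions
      import org.apache.parquet.filter2.compat.FilterCompat
      import org.apache.parquet.hadoop.ParquetFileReader
      import org.apache.parquet.hadoop.util.HadoopInputFile

      val conf = new Configuration()
      val inputFile = HadoopInputFile.fromPath(new Path("/path/to/parquet/000.snappy.parquet"), conf)
      // An explicit NOOP record filter keeps the filter reference non-null, so
      // getFilteredRecordCount no longer dereferences a null in ColumnIndexFilter.
      val options = HadoopReadOptions.builder(conf).withRecordFilter(FilterCompat.NOOP).build()
      val reader = new ParquetFileReader(inputFile, options)
      val recordCount = reader.getFilteredRecordCount
      reader.close()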
      

      UPDATE: This is not only about the potential NPE when a null filter is set, but also about handling a missing/null filter in a better-performing way. (Currently, a NOOP filter implementation is used by default when no filter is set, which still requires loading the related column index/bloom filter data even though no actual filtering will occur.)
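
      A caller-side sketch of that better-performing behaviour, assuming the caller knows whether a record filter was configured (the countRecords helper and hasFilter flag are hypothetical, not part of the Parquet API):

      import org.apache.parquet.hadoop.ParquetFileReader

      // With no filter configured, getRecordCount only sums the row counts from the
      // block metadata, so no column index or bloom filter data has to be read.
      def countRecords(reader: ParquetFileReader, hasFilter: Boolean): Long =
        if (hasFilter) reader.getFilteredRecordCount else reader.getRecordCount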


          People

            Assignee: Gabor Szadovszky (gszadovszky)
            Reporter: Yuming Wang (yumwang)
