PARQUET-9

InternalParquetRecordReader will not read multiple blocks when filtering

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.6.0
    • Component/s: parquet-mr
    • Labels:
      None

      Description

      The InternalParquetRecordReader keeps a count of the records it has processed and uses that count to decide when it is finished and when to load a new row group of data. But when it wraps a FilteredRecordReader, the count is not updated for records that are filtered out. As a result, when the reader exhausts the block it is reading, the count still indicates that records remain in that block, so it keeps calling read() on the filtered reader and passes null values to the caller instead of loading the next row group.

      The quick fix is to detect null values returned by the record reader and update the count to read the next row group. But the longer-term solution is to correctly account for the filtered records.

      The pull request for the quick fix is #9.
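
      To make the counting problem and the quick fix concrete, here is a minimal sketch in Java. It is a simplified model, not the parquet-mr code: the class SketchRecordReader, its fields, the detectNull flag, and the use of in-memory lists as row groups are all assumptions made for illustration. With detectNull set to false the model stops at the first null, which stands in for the null being handed to the caller.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical, simplified model of a counting reader that wraps a filter.
class SketchRecordReader {
    private final Iterator<List<String>> rowGroups; // each list models one row group
    private final Predicate<String> filter;         // models the record filter
    private final boolean detectNull;               // true = apply the quick fix
    private final long total;                       // unfiltered record count, all row groups
    private long current = 0;                       // records accounted for so far
    private long loadedSoFar = 0;                   // unfiltered records in row groups loaded so far
    private Iterator<String> block = new ArrayList<String>().iterator();

    SketchRecordReader(List<List<String>> groups, Predicate<String> filter, boolean detectNull) {
        this.rowGroups = groups.iterator();
        this.filter = filter;
        this.detectNull = detectNull;
        long sum = 0;
        for (List<String> g : groups) sum += g.size();
        this.total = sum;
    }

    // Models the filtered reader: skips non-matching records and
    // returns null once the current block is exhausted.
    private String filteredRead() {
        while (block.hasNext()) {
            String r = block.next();
            if (filter.test(r)) return r;
        }
        return null;
    }

    // Returns the next matching record, or null when no more are available.
    String next() {
        while (current < total) {
            if (current >= loadedSoFar) {           // count says: load the next row group
                List<String> g = rowGroups.next();
                block = g.iterator();
                loadedSoFar += g.size();
            }
            String value = filteredRead();
            if (value != null) {
                current++;                          // only returned records are counted
                return value;
            }
            if (!detectNull) return null;           // bug: null leaks out, next group never loads
            current = loadedSoFar;                  // quick fix: treat null as end of block
        }
        return null;
    }

    // Drains the reader until it reports null.
    static List<String> readAll(SketchRecordReader reader) {
        List<String> out = new ArrayList<>();
        for (String v = reader.next(); v != null; v = reader.next()) out.add(v);
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> groups = Arrays.asList(
            Arrays.asList("a1", "b1", "a2"), Arrays.asList("b2", "a3"));
        Predicate<String> startsWithA = s -> s.startsWith("a");
        System.out.println(readAll(new SketchRecordReader(groups, startsWithA, true)));  // [a1, a2, a3]
        System.out.println(readAll(new SketchRecordReader(groups, startsWithA, false))); // [a1, a2]
    }
}
```

      In this model the quick fix jumps the count to the end of the loaded block when a null is seen. The longer-term approach described above would instead have the filtered reader report how many records it skipped, so the count stays accurate without relying on null as a signal.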

            People

            • Assignee: Thomas White (tomwhite)
            • Reporter: Ryan Blue (rdblue)