IMPALA / IMPALA-3900

Add per-split runtime filtering to HdfsParquetScanner::ProcessSplit()


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: Impala 2.6.0
    • Fix Version/s: None
    • Component/s: Backend

    Description

      If a partition filter arrives after the footer scan range for a Parquet file has been issued, but before HdfsParquetScanner::ProcessSplit() runs on it, every scan range that would otherwise be issued after reading that footer can be filtered out. Adding a call to ScanRangeIsFilteredOut() at the top of that method would do this.

      Care must be taken to ensure that all scan ranges are marked as done, since they won't be processed by their own scanner instances. This will avoid a recurrence of IMPALA-3804.
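
      The proposal above can be sketched roughly as follows. This is a standalone illustration, not Impala code: FileDesc, ScanNodeSketch, RangesComplete() and the free function ProcessSplit() are hypothetical stand-ins, and ScanRangeIsFilteredOut() is given an assumed signature here. The point it shows is that the early-exit path must account for the footer range and for every range that will now never be issued, so that the scan node can still reach "done".

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical description of one Parquet file (not Impala's real type).
struct FileDesc {
  std::string path;
  int partition_id;
  int num_column_ranges;  // ranges that would be issued after reading the footer
};

// A partition filter returns true if the partition should be kept.
using PartitionFilter = std::function<bool(int)>;

// Hypothetical scan node that tracks how many scan ranges are still outstanding.
class ScanNodeSketch {
 public:
  explicit ScanNodeSketch(int total_ranges) : remaining_ranges_(total_ranges) {}

  // Filters may arrive at any time, including after footer ranges were issued.
  void AddFilter(PartitionFilter keep) { filters_.push_back(std::move(keep)); }

  // Assumed signature: true if any filter that has arrived rejects the file.
  bool ScanRangeIsFilteredOut(const FileDesc& file) const {
    for (const PartitionFilter& keep : filters_) {
      if (!keep(file.partition_id)) return true;
    }
    return false;
  }

  // Marks n ranges as done, whether they were scanned or skipped.
  void RangesComplete(int n) {
    assert(n >= 0 && n <= remaining_ranges_);
    remaining_ranges_ -= n;
  }

  int remaining_ranges() const { return remaining_ranges_; }
  bool Done() const { return remaining_ranges_ == 0; }

 private:
  std::vector<PartitionFilter> filters_;
  int remaining_ranges_;
};

enum class SplitResult { kSkipped, kScanned };

// Sketch of the proposed check at the top of ProcessSplit().
SplitResult ProcessSplit(ScanNodeSketch& node, const FileDesc& file) {
  if (node.ScanRangeIsFilteredOut(file)) {
    // The footer range plus all ranges that will never be issued must be
    // marked as done, because no other scanner instance will process them.
    node.RangesComplete(1 + file.num_column_ranges);
    return SplitResult::kSkipped;
  }
  // Normal path (elided): read the footer, issue and scan the column ranges.
  node.RangesComplete(1 + file.num_column_ranges);
  return SplitResult::kScanned;
}
```

      In this sketch, a node created with the combined range count of two files still reports Done() after one file is skipped and the other is scanned. Dropping the RangesComplete() call from the skip path would leave the counter above zero, which is the kind of accounting mistake the note above warns about.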


            People

              Assignee: Henry Robinson (henryr)
              Reporter: Henry Robinson (henryr)
              Votes: 0
              Watchers: 2
