Spark / SPARK-27698

Add new method for getting pushed down filters in Parquet file reader


Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: SQL
    • Labels: None

    Description

      To return accurate pushed filters in the Parquet file scan (https://github.com/apache/spark/pull/24327#pullrequestreview-234775673), we can process the original data source filters in the following way:
      1. For "And" operators, split the conjunctive predicates and try converting each of them. After that:
      1.1 if partial predicate pushdown is allowed, return the convertible results;
      1.2 otherwise, return the whole predicate if it is convertible, or an empty result if it is not.

      2. Other operators cannot be partially pushed down:
      2.1 if the entire predicate is convertible, return it as-is;
      2.2 otherwise, return an empty result.
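      The selection rules above can be sketched as follows. This is an illustrative Python model, not Spark's actual implementation: `Filter`, `And`, `split_conjuncts`, and `convertible_filters` are hypothetical names, and `can_convert` stands in for the real per-filter conversion attempt.

```python
from dataclasses import dataclass


@dataclass
class Filter:
    """Stand-in for a leaf data source filter (e.g. EqualTo)."""
    name: str


@dataclass
class And:
    """Stand-in for a conjunction of two predicates."""
    left: object
    right: object


def split_conjuncts(pred):
    """Flatten nested And nodes into a list of conjunctive predicates."""
    if isinstance(pred, And):
        return split_conjuncts(pred.left) + split_conjuncts(pred.right)
    return [pred]


def convertible_filters(predicates, can_convert, partial_pushdown_allowed):
    """Return the predicates (or parts of them) that can be pushed down."""
    result = []
    for pred in predicates:
        conjuncts = split_conjuncts(pred)
        convertible = [c for c in conjuncts if can_convert(c)]
        if partial_pushdown_allowed:
            # Rule 1.1: keep whichever conjuncts converted successfully.
            result.extend(convertible)
        elif len(convertible) == len(conjuncts):
            # Rules 1.2 / 2.1: push the whole predicate only when every
            # part of it converts (for a leaf, that is the predicate itself).
            result.append(pred)
        # Rules 1.2 / 2.2: otherwise drop the predicate entirely.
    return result
```

      Note that a non-"And" predicate flattens to a single-element list, so the all-or-nothing branch covers rules 2.1 and 2.2 without special casing.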

      This PR also contains code refactoring. Currently `ParquetFilters.createFilter` accepts the parameter `schema: MessageType` and creates a field mapping for every input filter. We can make the schema a class member and avoid creating the `nameToParquetField` mapping for every input filter.
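      The refactoring idea, sketched in miniature: build the name-to-field mapping once at construction time instead of on every `createFilter` call. This is a hypothetical Python model; the class and method names mirror the description, but the real Scala code and `MessageType` API differ.

```python
class ParquetFilters:
    """Illustrative model: schema is supplied once, at construction."""

    def __init__(self, schema):
        # schema: iterable of (column_name, parquet_type) pairs, standing in
        # for Parquet's MessageType. The mapping is built exactly once here,
        # rather than inside every createFilter call.
        self.name_to_parquet_field = dict(schema)

    def create_filter(self, attribute, op):
        """Convert one source filter using the precomputed mapping."""
        field = self.name_to_parquet_field.get(attribute)
        if field is None:
            # Column absent from the Parquet schema: not convertible.
            return None
        return (op, attribute, field)
```

      With the mapping hoisted into the constructor, converting N filters does one schema traversal instead of N.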


            People

              Assignee: Gengliang Wang
              Reporter: Gengliang Wang
              Votes: 0
              Watchers: 2
