Hive / HIVE-22980

Support custom path filter for ORC tables

Details

    • Type: New Feature
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: ORC
    • Labels: None

    Description

      The customer is looking for an option to specify a custom path filter for ORC tables. Details from the customer's requirement follow.

      Problem statement/approach in the customer's words:

      Currently, the ORC file input format does not honor path filters set in the property "mapreduce.input.pathfilter.class" or "mapred.input.pathfilter.class", so custom filters cannot be used with ORC files.

      The AcidUtils class has a static filter called "hiddenFilters", which ORC uses to filter input paths. If the custom filter classes (set in the property mentioned above) could be passed to AcidUtils, and "hiddenFilters" replaced with a filter that performs an "and" operation over "hiddenFilters" and the custom filters, the filters would work well.
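
      The "and" composition proposed above can be sketched as follows. This is only an illustration under stated assumptions, not the attached patch: SimplePathFilter is a hypothetical stand-in for Hadoop's PathFilter interface (so the sketch compiles without Hadoop on the classpath), paths are plain strings, and HIDDEN_FILTER assumes the usual convention that names starting with "_" or "." are hidden.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for org.apache.hadoop.fs.PathFilter; paths are plain
// strings here so the sketch has no Hadoop dependency.
interface SimplePathFilter {
    boolean accept(String path);
}

final class CombinedPathFilterSketch {
    // Assumed behaviour of a hidden-file filter: reject file names that
    // start with '_' or '.'.
    static final SimplePathFilter HIDDEN_FILTER = path -> {
        String name = path.substring(path.lastIndexOf('/') + 1);
        return !name.startsWith("_") && !name.startsWith(".");
    };

    // Returns a filter that accepts a path only if every given filter accepts it.
    static SimplePathFilter and(List<SimplePathFilter> filters) {
        return path -> {
            for (SimplePathFilter f : filters) {
                if (!f.accept(path)) {
                    return false;
                }
            }
            return true;
        };
    }

    public static void main(String[] args) {
        // Hypothetical custom filter: keep only files ending in ".orc".
        SimplePathFilter custom = path -> path.endsWith(".orc");
        SimplePathFilter combined = and(Arrays.asList(HIDDEN_FILTER, custom));

        System.out.println(combined.accept("/warehouse/t/part-0.orc")); // true
        System.out.println(combined.accept("/warehouse/t/_tmp.orc"));   // false
        System.out.println(combined.accept("/warehouse/t/part-0.txt")); // false
    }
}
```

      Keeping the hidden-file filter as the first operand means a custom filter can only narrow the set of accepted paths and can never re-admit hidden files.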

      In local testing, mapreduce.input.pathfilter.class appears to work for text tables but not for ORC tables.

      Our analysis:

      OrcInputFormat and FileInputFormat are different implementations of the InputFormat interface. The property "mapreduce.input.pathfilter.class" is respected only by FileInputFormat, not by any other implementation of InputFormat. The customer wants the ability to filter out rows based on paths/file names; current ORC features such as Bloom filters and indexes are not sufficient for them to minimize the number of disk read operations.
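
      To illustrate what "respecting" the property involves, here is a minimal sketch of how an input format might resolve a filter class from configuration by reflection. It is an assumption-laden illustration, not Hive or Hadoop code: java.util.Properties stands in for the Hadoop Configuration, and NamePathFilter, OrcSuffixFilter and ConfiguredFilterSketch are hypothetical names introduced only for this example.

```java
import java.util.Properties;

// Hypothetical stand-in for a path filter interface; paths are plain strings.
interface NamePathFilter {
    boolean accept(String path);
}

// Hypothetical custom filter; it needs a no-argument constructor so that it
// can be instantiated by reflection.
class OrcSuffixFilter implements NamePathFilter {
    public OrcSuffixFilter() {}

    @Override
    public boolean accept(String path) {
        return path.endsWith(".orc");
    }
}

final class ConfiguredFilterSketch {
    // Property name taken from the issue description.
    static final String KEY = "mapreduce.input.pathfilter.class";

    // Returns an instance of the configured filter class, or null if the
    // property is not set. An input format that never performs a lookup
    // like this will silently ignore the property.
    static NamePathFilter loadFilter(Properties conf) {
        String className = conf.getProperty(KEY);
        if (className == null) {
            return null;
        }
        try {
            Class<? extends NamePathFilter> cls =
                Class.forName(className).asSubclass(NamePathFilter.class);
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot create filter " + className, e);
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(KEY, OrcSuffixFilter.class.getName());

        NamePathFilter filter = loadFilter(conf);
        System.out.println(filter.accept("/warehouse/t/part-0.orc")); // true
        System.out.println(filter.accept("/warehouse/t/part-0.txt")); // false
    }
}
```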

      Attachments

        1. HIVE-22980.1.patch
          8 kB
          Oleksiy Sayankin
        2. HIVE-22980.2.patch
          8 kB
          Oleksiy Sayankin
        3. HIVE-22980.3.patch
          8 kB
          Oleksiy Sayankin

            People

              Assignee: Oleksiy Sayankin (osayankin)
              Reporter: Oleksiy Sayankin (osayankin)