ORC-743: Conversion of SArg into Filters, to take advantage of LazyIO

Parent: ORC-744 LazyIO of non-filter columns


Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.7.0
    • Component/s: Reader
    • Labels: None

    Description

      ORC-742 introduces lazy evaluation of the non-filter columns in the presence of filters. This issue builds on that work by converting Search Arguments (SArgs) into filters.

      SArg to Filter

      SArg to Filter converts the passed SArg into a filter. This enables automatic compatibility with both Spark and Hive, as they already push Search Arguments down to ORC.

      The SArg is automatically converted into a vector filter, which is applied during the read process.
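
      The conversion described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not ORC's implementation: the `VectorFilter` interface, the `leaf`/`and`/`or` helpers, and the `long[][]` batch layout are all hypothetical stand-ins for the real search argument and row batch types. The idea it shows is that a predicate tree is compiled once into a filter that narrows the set of selected rows in a batch, so that non-filter columns only need to be read for the rows that survive.

```java
import java.util.Arrays;
import java.util.function.LongPredicate;

final class SargFilterSketch {
    /** Hypothetical filter: narrows the row indices still selected in a batch. */
    interface VectorFilter {
        int[] select(long[][] columns, int[] selected);
    }

    /** Leaf predicate on a single column, e.g. "col0 < 10". */
    static VectorFilter leaf(int column, LongPredicate test) {
        return (columns, selected) ->
            Arrays.stream(selected).filter(r -> test.test(columns[column][r])).toArray();
    }

    /** AND: each child only sees the rows that survived the previous children. */
    static VectorFilter and(VectorFilter... children) {
        return (columns, selected) -> {
            int[] current = selected;
            for (VectorFilter child : children) {
                current = child.select(columns, current);
            }
            return current;
        };
    }

    /** OR: union of the rows accepted by any child, returned in row order. */
    static VectorFilter or(VectorFilter... children) {
        return (columns, selected) ->
            Arrays.stream(children)
                .flatMapToInt(c -> Arrays.stream(c.select(columns, selected)))
                .distinct()
                .sorted()
                .toArray();
    }

    public static void main(String[] args) {
        long[][] batch = {
            {5, 12, 7, 30},  // column 0
            {1, 1, 0, 0}     // column 1
        };
        // (col0 < 10 AND col1 = 1) OR col0 > 20, kept in its original, unnormalized shape
        VectorFilter filter = or(
            and(leaf(0, v -> v < 10), leaf(1, v -> v == 1)),
            leaf(0, v -> v > 20));
        int[] allRows = {0, 1, 2, 3};
        System.out.println(Arrays.toString(filter.select(batch, allRows))); // prints [0, 3]
    }
}
```

      In this sketch the AND short-circuits per row by passing a shrinking selection to each child, which is one plausible reason an unnormalized tree can be cheap to evaluate.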

      The search argument builder should allow normalization to be skipped during the build. This has already been proposed as part of HIVE-24458.

      Normalization performs very poorly in the presence of multilevel predicates, as the following benchmark shows.

      Benchmark                  (fSize)  (fType)  (normalize)  Mode  Cnt     Score     Error  Units
      ComplexFilterBench.filter        2   vector         true  avgt   20    74.321 ±   0.156  us/op
      ComplexFilterBench.filter        2   vector        false  avgt   20    78.119 ±   0.351  us/op
      ComplexFilterBench.filter        4   vector         true  avgt   20   267.405 ±   1.202  us/op
      ComplexFilterBench.filter        4   vector        false  avgt   20   136.284 ±   0.637  us/op
      ComplexFilterBench.filter        8   vector         true  avgt   20  9907.765 ±  49.208  us/op
      ComplexFilterBench.filter        8   vector        false  avgt   20   247.714 ±   0.651  us/op

      Explanation:

      • fSize identifies the size of the OR clause that will be normalized.
      • normalize indicates whether the Search Argument was normalized.

      Observations:

      • Normalizing the search argument incurs a significant performance penalty because the operator tree explodes in size.
        • When an AND includes 8 ORs (fSize 8), the unnormalized version takes 247.714 us/op against 9907.765 us/op, about 97.5% less time (roughly 40x faster).
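
      The operator tree explosion can be illustrated by counting clauses. The sketch below assumes that normalization means conversion to conjunctive normal form, and it uses an OR of n two-literal ANDs as the input. That shape is chosen for illustration only and is not necessarily the exact predicate built by ComplexFilterBench; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class CnfBlowupSketch {
    /** CNF clauses of (a1 AND b1) OR (a2 AND b2) OR ... OR (an AND bn). */
    static List<List<String>> normalize(int n) {
        List<List<String>> clauses = new ArrayList<>();
        clauses.add(new ArrayList<>());  // start from a single empty OR-clause
        for (int i = 1; i <= n; i++) {
            List<List<String>> next = new ArrayList<>();
            for (List<String> clause : clauses) {
                // Distributing OR over AND forks every clause once per literal.
                for (String literal : new String[] {"a" + i, "b" + i}) {
                    List<String> extended = new ArrayList<>(clause);
                    extended.add(literal);
                    next.add(extended);
                }
            }
            clauses = next;
        }
        return clauses;
    }

    public static void main(String[] args) {
        for (int n : new int[] {2, 4, 8}) {
            int clauseCount = normalize(n).size();
            // Every normalized clause holds n literals; the original tree has 2 * n.
            System.out.println(n + " ORed ANDs: " + 2 * n + " leaves before, "
                + clauseCount + " clauses and " + clauseCount * n + " leaves after");
        }
    }
}
```

      Under these assumptions the clause count doubles with every additional OR branch (4, 16, and 256 clauses for n = 2, 4, and 8), while the unnormalized tree grows only linearly.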


          People

            planka Pavan Lanka