Jackrabbit Oak / OAK-2568

Ignore redundant IS NOT NULL constraints

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.0.12
    • Component/s: lucene
    • Labels:
      None

      Description

      Queries like the one below can at times take a long time to evaluate with LucenePropertyIndex:

      SELECT * FROM [nt:unstructured] as content WHERE ISDESCENDANTNODE('/content/dam/en/us')
      and(
          content.[tags] = 'Products:A'
          or content.[tags] = 'Products:A/B'
          or content.[tags] = 'Products:A/B'
          or content.[tags] = 'Products:A'
      )
      and(
          content.[tags] = 'DocTypes:A'
          or content.[tags] = 'DocTypes:B'
          or content.[tags] = 'DocTypes:C'
          or content.[tags] = 'ProblemType:A'
      )
      and(
          content.[hasRendition] IS NULL
          or content.[hasRendition] = 'false'
      )
      

      The above SQL query translates to the following plans:

      Plan on 1.0 branch

      [nt:unstructured] as [content] /* lucene:test1(/oak:index/test1) +tags:[* TO *] +(tags:Products:A tags:Products:A/B tags:Products:A/B tags:Products:A) +(tags:DocTypes:A tags:DocTypes:B tags:DocTypes:C tags:ProblemType:A)
        where ((((isdescendantnode([content], [/content/dam/en/us]))
        and ([content].[tags] is not null))
        and ([content].[tags] in(cast('Products:A' as string), cast('Products:A/B' as string), cast('Products:A/B' as string), cast('Products:A' as string))))
        and ([content].[tags] is not null))
        and ([content].[tags] in(cast('DocTypes:A' as string), cast('DocTypes:B' as string), cast('DocTypes:C' as string), cast('ProblemType:A' as string))) */
      

      Note the extra not null property restriction, which translates in Lucene to +tags:[* TO *].

      Plan on trunk

      [nt:unstructured] as [content] /* lucene:test1(/oak:index/test1) +(tags:Products:A tags:Products:A/B) +(tags:DocTypes:A tags:DocTypes:B tags:DocTypes:C tags:ProblemType:A)
        where (isdescendantnode([content], [/content/dam/en/us]))
        and ([content].[tags] in('Products:A', 'Products:A/B'))
        and ([content].[tags] in('DocTypes:A', 'DocTypes:B', 'DocTypes:C', 'ProblemType:A')) */
      

      This plan does not have the extra not null constraint.

      The query was slower on Lucene because the property existence query (i.e. the not null constraint) is currently evaluated as a range query in Lucene, which appears to be somewhat expensive to evaluate.

      As shown above, on trunk the QueryEngine appears to perform this optimization on its own (possibly done with 1610723 as part of OAK-1965). This change is not present in the branch.

      Given that the change in OAK-1965 was quite big, it would be better to perform this optimization in LucenePropertyIndex itself.
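
      The idea can be illustrated with a minimal, self-contained sketch. It does not use Oak's or Lucene's actual classes: Restriction, dropRedundantNotNull and toLuceneLikeString are hypothetical names introduced only for illustration. The sketch assumes that any value restriction on a property (an equality or IN list) already implies that the property exists, so a not null restriction on the same property adds nothing and can be skipped before the Lucene query is built.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class RedundantNotNullFilter {

    /** Hypothetical, simplified stand-in for a property restriction (not Oak's class). */
    public static final class Restriction {
        final String property;
        final List<String> values; // empty list models a bare "is not null" restriction

        public Restriction(String property, List<String> values) {
            this.property = property;
            this.values = values;
        }

        boolean isNotNullOnly() {
            return values.isEmpty();
        }
    }

    /** Drops "is not null" restrictions on properties that also have a value restriction. */
    public static List<Restriction> dropRedundantNotNull(List<Restriction> restrictions) {
        // First pass: collect the properties that are constrained to concrete values.
        Set<String> constrained = new HashSet<>();
        for (Restriction r : restrictions) {
            if (!r.isNotNullOnly()) {
                constrained.add(r.property);
            }
        }
        // Second pass: keep everything except the redundant existence checks.
        List<Restriction> result = new ArrayList<>();
        for (Restriction r : restrictions) {
            if (r.isNotNullOnly() && constrained.contains(r.property)) {
                continue;
            }
            result.add(r);
        }
        return result;
    }

    /** Renders restrictions in a Lucene-like syntax similar to the plan output above. */
    public static String toLuceneLikeString(List<Restriction> restrictions) {
        StringBuilder sb = new StringBuilder();
        for (Restriction r : restrictions) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            if (r.isNotNullOnly()) {
                // Existence check, rendered as an open-ended range.
                sb.append('+').append(r.property).append(":[* TO *]");
            } else {
                sb.append("+(");
                for (int i = 0; i < r.values.size(); i++) {
                    if (i > 0) {
                        sb.append(' ');
                    }
                    sb.append(r.property).append(':').append(r.values.get(i));
                }
                sb.append(')');
            }
        }
        return sb.toString();
    }
}
```

      In this sketch a not null restriction is kept when it is the only restriction on a property, since in that case the existence check still carries information.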

            People

            • Assignee: Chetan Mehrotra (chetanm)
            • Reporter: Chetan Mehrotra (chetanm)