[OAK-2568] Ignore redundant IS NOT NULL constraints - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.0.12
Component/s: lucene
Labels: None

Description

Queries like the one below sometimes take a long time to evaluate with LucenePropertyIndex:

SELECT * FROM [nt:unstructured] as content WHERE ISDESCENDANTNODE('/content/dam/en/us')
and(
    content.[tags] = 'Products:A'
    or content.[tags] = 'Products:A/B'
    or content.[tags] = 'Products:A/B'
    or content.[tags] = 'Products:A'
)
and(
    content.[tags] = 'DocTypes:A'
    or content.[tags] = 'DocTypes:B'
    or content.[tags] = 'DocTypes:C'
    or content.[tags] = 'ProblemType:A'
)
and(
    content.[hasRendition] IS NULL
    or content.[hasRendition] = 'false'
)

The SQL query above translates to the following plans.

Plan on 1.0 branch

[nt:unstructured] as [content] /* lucene:test1(/oak:index/test1) +tags:[* TO *] +(tags:Products:A tags:Products:A/B tags:Products:A/B tags:Products:A) +(tags:DocTypes:A tags:DocTypes:B tags:DocTypes:C tags:ProblemType:A)
  where ((((isdescendantnode([content], [/content/dam/en/us]))
  and ([content].[tags] is not null))
  and ([content].[tags] in(cast('Products:A' as string), cast('Products:A/B' as string), cast('Products:A/B' as string), cast('Products:A' as string))))
  and ([content].[tags] is not null))
  and ([content].[tags] in(cast('DocTypes:A' as string), cast('DocTypes:B' as string), cast('DocTypes:C' as string), cast('ProblemType:A' as string))) */

Note the extra not-null property restriction ([content].[tags] is not null), which translates in Lucene to +tags:[* TO *].

Plan on trunk

[nt:unstructured] as [content] /* lucene:test1(/oak:index/test1) +(tags:Products:A tags:Products:A/B) +(tags:DocTypes:A tags:DocTypes:B tags:DocTypes:C tags:ProblemType:A)
  where (isdescendantnode([content], [/content/dam/en/us]))
  and ([content].[tags] in('Products:A', 'Products:A/B'))
  and ([content].[tags] in('DocTypes:A', 'DocTypes:B', 'DocTypes:C', 'ProblemType:A')) */

This plan does not have the extra not-null constraint.

The query performed slower on Lucene because the property existence query (i.e. the not-null constraint) is currently evaluated as a range query in Lucene, which appears to be somewhat expensive to evaluate.

As shown above, it appears that on trunk the QueryEngine performs this optimization on its own (possibly done with 1610723 as part of OAK-1965). This change is not present in the branch.

Given that the change in OAK-1965 was quite big, it would be better to perform this optimization in LucenePropertyIndex itself.
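
The idea can be sketched roughly as follows. This is only an illustration, not the actual patch: `RedundantNotNullPruner` and its nested `Restriction` class are hypothetical stand-ins and do not correspond to Oak's own filter or index classes. The sketch assumes that an equality or IN restriction on a property already implies that the property exists, so a not-null restriction on the same property adds nothing and can be skipped before the Lucene query is built.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper; not part of Oak. */
final class RedundantNotNullPruner {

    /** Hypothetical, simplified stand-in for a property restriction. */
    static final class Restriction {
        final String property;
        final boolean notNullOnly;  // true for "IS NOT NULL"
        final List<String> values;  // values for "=" / "IN"; empty when notNullOnly

        private Restriction(String property, boolean notNullOnly, List<String> values) {
            this.property = property;
            this.notNullOnly = notNullOnly;
            this.values = values;
        }

        static Restriction notNull(String property) {
            return new Restriction(property, true, new ArrayList<String>());
        }

        static Restriction in(String property, String... values) {
            return new Restriction(property, false, Arrays.asList(values));
        }
    }

    /**
     * Returns the restrictions in their original order, minus every not-null
     * restriction whose property is also constrained by a value restriction.
     */
    static List<Restriction> prune(List<Restriction> restrictions) {
        // First pass: collect the properties that have a value restriction.
        Set<String> constrained = new HashSet<String>();
        for (Restriction r : restrictions) {
            if (!r.notNullOnly) {
                constrained.add(r.property);
            }
        }
        // Second pass: keep everything except the redundant not-null checks.
        List<Restriction> result = new ArrayList<Restriction>();
        for (Restriction r : restrictions) {
            if (r.notNullOnly && constrained.contains(r.property)) {
                continue; // implied by the value restriction on the same property
            }
            result.add(r);
        }
        return result;
    }
}
```

Applied to the 1.0 branch plan above, both `[content].[tags] is not null` restrictions would be dropped and only the two IN restrictions on `tags` would remain, so no `+tags:[* TO *]` clause would be generated. A not-null restriction on a property with no other restriction is kept, since it is the only constraint on that property.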

People

Assignee: Chetan Mehrotra

Reporter: Chetan Mehrotra

Votes: 0

Watchers: 2

Dates

Created: 03/Mar/15 11:14

Updated: 20/Apr/15 07:30

Resolved: 03/Mar/15 12:00