  1. Jackrabbit Oak
  2. OAK-10682

[Indexing job] Improve Mongo regex filter to only use positive conditions (no negations)


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: 1.62.0
    • Component/s: indexing

    Description

      The current implementation filters excluded paths and custom regexes using a condition like:

      { _id: { $nin: [ /^[0-9]{1,3}:\/content\/dam\/.*$/ ] } }

      Mongo cannot evaluate this condition using only the `_modified_id` index. It has to retrieve the full document, because a null value also matches this condition and the index does not contain null values. Therefore, when the index contains excluded paths, the download is much slower because Mongo has to retrieve every single document to evaluate the condition.

      As a workaround, we can transform the regex into one that matches the complement of the original regex, using negative lookahead. This allows the filter to be rewritten using only positive conditions, which Mongo can evaluate using only the index.
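
      The rewrite described above could look roughly like the sketch below. It is an illustration only, not the code from the Oak patch: the helper name `positive_filter` and the sample ids are hypothetical, and Python's `re` module stands in for Mongo's regex engine to check that the lookahead pattern selects the same ids the `$nin` condition would.

```python
import re

# Excluded-path regex from the description (the `\/` escapes are not needed in Python).
EXCLUDED = r"^[0-9]{1,3}:/content/dam/.*$"


def positive_filter(excluded_patterns):
    """Return a Mongo-style filter with one positive $regex instead of $nin.

    Hypothetical helper: each excluded pattern is wrapped in a negative
    lookahead anchored at the start of the _id, so the resulting regex
    matches exactly the ids that match none of the excluded patterns.
    """
    lookaheads = "".join(f"(?!{p})" for p in excluded_patterns)
    return {"_id": {"$regex": f"^{lookaheads}"}}


query = positive_filter([EXCLUDED])
complement = re.compile(query["_id"]["$regex"])

# Made-up ids in the "<depth>:<path>" shape used by the regex above.
ids = [
    "2:/content/dam",
    "3:/content/dam/a.png",
    "3:/content/site/page",
    "4:/content/dam/x/y",
]

# Ids that pass the positive filter, i.e. those outside the excluded paths.
kept = [i for i in ids if complement.search(i)]
print(query)
print(kept)
```

      Because the condition is a plain regex match on `_id`, it never matches a missing or null value, which is the property the description relies on for index-only evaluation.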


            People

              Assignee: Unassigned
              Reporter: Nuno Santos (nuno.santos)
