Lucene - Core / LUCENE-7411

Regex Query with Backreferences

Details

    • Type: New Feature
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: core/search
    • Labels: None
    • Lucene Fields: New

    Description

      Hi there,

      I am currently working on a regex engine that supports backreferences without losing determinism. Internally it uses Memory Occurrence Automata (MOAs), which are more powerful than ordinary DFAs/NFAs. The engine does no backtracking and rejects regexes that cannot be evaluated deterministically as malformed. It has matured considerably over the last few weeks, and I have also implemented a Lucene Query that uses these patterns in the background. My question: is there any interest in merging (or adapting) this work into Lucene core?
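
      For readers unfamiliar with backreferences, here is a minimal sketch of the kind of pattern meant. It deliberately uses the JDK's java.util.regex engine, which handles backreferences by backtracking, and not the MOA engine described above; the class name BackrefDemo is made up for illustration. Whether the MOA engine accepts any of these patterns as deterministic is not shown here.

```java
import java.util.regex.Pattern;

public class BackrefDemo {
    // Whole-input match using the JDK's backtracking engine (NOT the MOA engine).
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // (a|b)\1 : group 1 captures one letter, \1 requires the same letter again.
        String sameLetterTwice = "(a|b)\\1";
        System.out.println(matches(sameLetterTwice, "aa")); // true
        System.out.println(matches(sameLetterTwice, "ab")); // false

        // (.+)\1 : matches "squares" ww, a language no plain DFA/NFA can recognize.
        String square = "(.+)\\1";
        System.out.println(matches(square, "abcabc")); // true
        System.out.println(matches(square, "abcabd")); // false
    }
}
```

      The second pattern shows why ordinary finite automata are not enough: the engine has to remember what the group matched in order to compare it later.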

      EDIT:

      The current state is merely a proof of concept. Performance can probably be improved considerably by adapting concepts from Lucene's RegexpQuery. As Uwe Schindler correctly stated, the Query is currently quite "dumb" in that it doesn't predict which terms to match next.
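
      To illustrate what "predicting which terms to match next" can buy, here is a simplified sketch, not taken from the MOAR code base or from Lucene. It assumes a sorted term dictionary (modelled with a TreeSet) and a pattern with a known literal prefix; TermScanSketch, fullScan and prefixScan are hypothetical names, and java.util.regex stands in for the real matcher.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;
import java.util.regex.Pattern;

public class TermScanSketch {
    // Number of terms tested against the pattern by the most recent scan.
    static int examined;

    // "Dumb" approach: test every term in the dictionary.
    static List<String> fullScan(NavigableSet<String> terms, Pattern p) {
        examined = 0;
        List<String> hits = new ArrayList<>();
        for (String t : terms) {
            examined++;
            if (p.matcher(t).matches()) hits.add(t);
        }
        return hits;
    }

    // Seek to the first term >= prefix and stop once terms leave the prefix range.
    static List<String> prefixScan(NavigableSet<String> terms, String prefix, Pattern p) {
        examined = 0;
        List<String> hits = new ArrayList<>();
        for (String t : terms.tailSet(prefix, true)) {
            if (!t.startsWith(prefix)) break; // sorted order: no later term can match
            examined++;
            if (p.matcher(t).matches()) hits.add(t);
        }
        return hits;
    }

    public static void main(String[] args) {
        NavigableSet<String> terms = new TreeSet<>(
                Arrays.asList("aacc", "abcc", "abcd", "abdd", "bbcc", "zzzz"));
        Pattern p = Pattern.compile("ab(c|d)\\1"); // literal prefix "ab", then a backreference

        System.out.println(fullScan(terms, p) + " after " + examined + " terms");
        System.out.println(prefixScan(terms, "ab", p) + " after " + examined + " terms");
    }
}
```

      Both scans return the same hits, but the prefix-aware one tests only the terms that start with "ab". A real implementation would derive such seek targets from the automaton itself instead of a hand-supplied prefix.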

      https://github.com/s4ke/moar

      Usage example for the Lucene Query:

      https://github.com/s4ke/moar/blob/master/lucene/src/test/java/com/github/s4ke/moar/lucene/query/test/MoarQueryTest.java#L126

      Cheers,

      Martin

          People

            Assignee: Unassigned
            Reporter: Martin Braun (s4ke)
            Votes: 0
            Watchers: 3
