Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-10010

Should we have a NFA Query?

    XMLWordPrintableJSON

Details

    • New Feature
    • Status: Patch Available
    • Major
    • Resolution: Unresolved
    • 9.0
    • None
    • core/search
    • None
    • New

    Description

      Today when a RegexpQuery is created, it will be translated to NFA, determinized to DFA and eventually become an AutomatonQuery, which is very fast. However, not every NFA could be determinized to DFA easily, the example given in LUCENE-9981 showed how easy could a short regexp break the determinize process.

      Maybe, instead of marking those kind of queries as adversarial cases, we could make a new kind of NFA query, which execute directly on NFA and thus no need to worry about determinize process or determinized DFA size. It should be slower, but also makes those adversarial cases doable.

      This article has provided a simple but efficient way of searching over NFA, essentially it is a partial determinize process that only determinize the necessary part of DFA. Maybe we could give it a try?

      Attachments

        Activity

          People

            Unassigned Unassigned
            zhai7631 Patrick Zhai
            Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

              Created:
              Updated:

              Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0h
                0h
                Logged:
                Time Spent - 12h 20m
                12h 20m