Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-ALPHA
    • Component/s: core/search
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      Attached is a patch for an AutomatonQuery/Filter (the name can change if it's not suitable).

      The out-of-the-box contrib RegexQuery is nice, but I have some very large indexes (100M+ unique tokens) where queries are quite slow, taking 2 minutes or so. Additionally, all of the existing RegexQuery implementations in Lucene are really slow when the pattern has no constant prefix. This implementation does not depend on a constant prefix, and it runs the same query in 640 ms.

      Some use cases I envision:
      1. lexicography, etc., on large text corpora
      2. looking for things such as URLs where the prefix is not constant (http:// or ftp://)

      The Filter uses the BRICS package (http://www.brics.dk/automaton/) to convert regular expressions into a DFA. Then, the filter "enumerates" terms in a special way, by using the underlying state machine. Here is my short description from the comments:

      The algorithm here is pretty basic. Enumerate terms but instead of a binary accept/reject do:

      1. Look at the portion that is OK (did not enter a reject state in the DFA)
      2. Generate the next possible String and seek to that.
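
      The two steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the patch: the class and method names are hypothetical, a hand-built toy DFA for the pattern (un|re)do stands in for a BRICS automaton, and a sorted TreeSet stands in for the index's term dictionary. The seek target computed here is the shortest prefix that any later matching term must start at or after, which is a simplification of generating the full next accepted string.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.NavigableSet;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class DfaTermSeekSketch {
    /** Toy DFA (hypothetical helper): a missing transition means the reject state. State 0 is the start. */
    static final class Dfa {
        final List<TreeMap<Character, Integer>> transitions = new ArrayList<>();
        final Set<Integer> accept = new HashSet<>();

        int newState() { transitions.add(new TreeMap<>()); return transitions.size() - 1; }
        void edge(int from, char c, int to) { transitions.get(from).put(c, to); }
    }

    /** Builds a DFA for the pattern (un|re)do, which has no constant prefix. */
    static Dfa exampleDfa() {
        Dfa d = new Dfa();
        int s0 = d.newState(), s1 = d.newState(), s2 = d.newState();
        int s3 = d.newState(), s4 = d.newState(), s5 = d.newState();
        d.edge(s0, 'u', s1); d.edge(s1, 'n', s3);
        d.edge(s0, 'r', s2); d.edge(s2, 'e', s3);
        d.edge(s3, 'd', s4); d.edge(s4, 'o', s5);
        d.accept.add(s5);
        return d;
    }

    /** Enumerates sorted terms, seeking past ranges the DFA can never accept. */
    static List<String> matchingTerms(NavigableSet<String> terms, Dfa dfa) {
        List<String> hits = new ArrayList<>();
        String term = terms.isEmpty() ? null : terms.first();
        while (term != null) {
            // Step 1: find the portion that is OK. path[i] is the state after term[0..i).
            int[] path = new int[term.length() + 1];
            int i = 0;
            while (i < term.length()) {
                Integer next = dfa.transitions.get(path[i]).get(term.charAt(i));
                if (next == null) break; // entered the reject state at position i
                path[i + 1] = next;
                i++;
            }
            if (i == term.length()) {
                // Whole term consumed: accept or not, then just move to the next term.
                if (dfa.accept.contains(path[i])) hits.add(term);
                term = terms.higher(term);
                continue;
            }
            // Step 2: back up to the longest OK prefix that has a transition on a
            // larger character; that prefix plus the character is the seek target.
            String target = null;
            for (int k = i; k >= 0 && target == null; k--) {
                Character c = dfa.transitions.get(path[k]).higherKey(term.charAt(k));
                if (c != null) target = term.substring(0, k) + c;
            }
            // No target means nothing after this term can match, so stop.
            term = (target == null) ? null : terms.ceiling(target);
        }
        return hits;
    }

    public static void main(String[] args) {
        NavigableSet<String> terms = new TreeSet<>(List.of(
            "apple", "read", "redo", "redone", "under", "undo", "undone", "zebra"));
        System.out.println(matchingTerms(terms, exampleDfa())); // prints [redo, undo]
    }
}
```

      In this toy run the enumeration jumps from "apple" straight to the first term at or after "r", and it stops after "undone" without ever looking at "zebra". On a real term dictionary the same idea lets the enumeration skip large ranges of terms even when the pattern has no constant prefix.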

      The Query simply wraps the filter with ConstantScoreQuery.

      I did not include the automaton.jar inside the patch but it can be downloaded from http://www.brics.dk/automaton/ and is BSD-licensed.

      1. automaton.patch
        19 kB
        Robert Muir
      2. automatonMultiQuery.patch
        34 kB
        Robert Muir
      3. automatonmultiqueryfuzzy.patch
        47 kB
        Robert Muir
      4. automatonMultiQuerySmart.patch
        35 kB
        Robert Muir
      5. automatonWithWildCard.patch
        36 kB
        Robert Muir
      6. automatonWithWildCard2.patch
        36 kB
        Robert Muir
      7. BenchWildcard.java
        4 kB
        Robert Muir
      8. LUCENE-1606_nodep.patch
        194 kB
        Robert Muir
      9. LUCENE-1606.patch
        213 kB
        Robert Muir
      10. LUCENE-1606.patch
        213 kB
        Robert Muir
      11. LUCENE-1606.patch
        214 kB
        Robert Muir
      12. LUCENE-1606.patch
        204 kB
        Robert Muir
      13. LUCENE-1606.patch
        211 kB
        Robert Muir
      14. LUCENE-1606.patch
        211 kB
        Robert Muir
      15. LUCENE-1606.patch
        208 kB
        Robert Muir
      16. LUCENE-1606.patch
        198 kB
        Robert Muir
      17. LUCENE-1606.patch
        198 kB
        Robert Muir
      18. LUCENE-1606.patch
        200 kB
        Uwe Schindler
      19. LUCENE-1606.patch
        199 kB
        Uwe Schindler
      20. LUCENE-1606.patch
        198 kB
        Uwe Schindler
      21. LUCENE-1606.patch
        192 kB
        Robert Muir
      22. LUCENE-1606.patch
        58 kB
        Robert Muir
      23. LUCENE-1606.patch
        47 kB
        Robert Muir
      24. LUCENE-1606-flex.patch
        211 kB
        Uwe Schindler
      25. LUCENE-1606-flex.patch
        216 kB
        Robert Muir
      26. LUCENE-1606-flex.patch
        212 kB
        Robert Muir
      27. LUCENE-1606-flex.patch
        212 kB
        Robert Muir
      28. LUCENE-1606-flex.patch
        213 kB
        Robert Muir
      29. LUCENE-1606-flex.patch
        213 kB
        Robert Muir
      30. LUCENE-1606-flex.patch
        230 kB
        Uwe Schindler
      31. LUCENE-1606-flex.patch
        276 kB
        Uwe Schindler
      32. LUCENE-1606-flex.patch
        276 kB
        Uwe Schindler
      33. LUCENE-1606-flex.patch
        234 kB
        Robert Muir
      34. LUCENE-1606-flex.patch
        212 kB
        Robert Muir
      35. LUCENE-1606-flex.patch
        197 kB
        Michael McCandless

          Activity

          Robert Muir created issue -
          Robert Muir made changes -
          Field Original Value New Value
          Attachment automaton.patch [ 12405633 ]
          Robert Muir made changes -
          Attachment automatonWithWildCard.patch [ 12405639 ]
          Robert Muir made changes -
          Attachment automatonWithWildCard2.patch [ 12405641 ]
          Michael McCandless made changes -
          Fix Version/s 2.9 [ 12312682 ]
          Robert Muir made changes -
          Attachment automatonMultiQuery.patch [ 12405828 ]
          Robert Muir made changes -
          Attachment automatonMultiQuerySmart.patch [ 12405860 ]
          Robert Muir made changes -
          Attachment automatonmultiqueryfuzzy.patch [ 12405882 ]
          Robert Muir made changes -
          Attachment LUCENE-1606.patch [ 12406682 ]
          Uwe Schindler made changes -
          Assignee Uwe Schindler [ thetaphi ]
          Uwe Schindler made changes -
          Fix Version/s 3.0 [ 12312889 ]
          Fix Version/s 2.9 [ 12312682 ]
          Robert Muir made changes -
          Assignee Uwe Schindler [ thetaphi ] Robert Muir [ rcmuir ]
          Robert Muir made changes -
          Attachment LUCENE-1606.patch [ 12422004 ]
          Robert Muir made changes -
          Fix Version/s 3.1 [ 12314025 ]
          Fix Version/s 3.0 [ 12312889 ]
          Robert Muir made changes -
          Attachment LUCENE-1606_nodep.patch [ 12425621 ]
          Robert Muir made changes -
          Attachment LUCENE-1606.patch [ 12425652 ]
          Uwe Schindler made changes -
          Attachment LUCENE-1606.patch [ 12425690 ]
          Uwe Schindler made changes -
          Attachment LUCENE-1606.patch [ 12425714 ]
          Uwe Schindler made changes -
          Attachment LUCENE-1606.patch [ 12425718 ]
          Uwe Schindler made changes -
          Component/s contrib/* [ 12312028 ]
          Component/s Search [ 12310235 ]
          Robert Muir made changes -
          Attachment LUCENE-1606.patch [ 12425725 ]
          Robert Muir made changes -
          Attachment LUCENE-1606.patch [ 12425729 ]
          Robert Muir made changes -
          Attachment BenchWildcard.java [ 12425732 ]
          Michael McCandless made changes -
          Attachment LUCENE-1606-flex.patch [ 12425764 ]
          Robert Muir made changes -
          Link This issue blocks LUCENE-2090 [ LUCENE-2090 ]
          Robert Muir made changes -
          Attachment LUCENE-1606-flex.patch [ 12425783 ]
          Robert Muir made changes -
          Attachment LUCENE-1606.patch [ 12425955 ]
          Robert Muir made changes -
          Attachment LUCENE-1606.patch [ 12425955 ]
          Robert Muir made changes -
          Attachment LUCENE-1606.patch [ 12425956 ]
          Robert Muir made changes -
          Attachment LUCENE-1606.patch [ 12425994 ]
          Robert Muir made changes -
          Attachment LUCENE-1606.patch [ 12426026 ]
          Robert Muir made changes -
          Attachment LUCENE-1606.patch [ 12426026 ]
          Robert Muir made changes -
          Attachment LUCENE-1606.patch [ 12426027 ]
          Robert Muir made changes -
          Attachment LUCENE-1606.patch [ 12426707 ]
          Robert Muir made changes -
          Attachment LUCENE-1606.patch [ 12426905 ]
          Robert Muir made changes -
          Attachment LUCENE-1606-flex.patch [ 12426928 ]
          Uwe Schindler made changes -
          Link This issue is related to LUCENE-2110 [ LUCENE-2110 ]
          Uwe Schindler made changes -
          Attachment LUCENE-1606-flex.patch [ 12427050 ]
          Uwe Schindler made changes -
          Attachment LUCENE-1606-flex.patch [ 12427051 ]
          Uwe Schindler made changes -
          Attachment LUCENE-1606-flex.patch [ 12427050 ]
          Uwe Schindler made changes -
          Attachment LUCENE-1606-flex.patch [ 12427059 ]
          Uwe Schindler made changes -
          Attachment LUCENE-1606-flex.patch [ 12427061 ]
          Uwe Schindler made changes -
          Attachment LUCENE-1606-flex.patch [ 12427059 ]
          Uwe Schindler made changes -
          Attachment LUCENE-1606-flex.patch [ 12427061 ]
          Uwe Schindler made changes -
          Attachment LUCENE-1606-flex.patch [ 12427063 ]
          Uwe Schindler made changes -
          Attachment LUCENE-1606-flex.patch [ 12427066 ]
          Uwe Schindler made changes -
          Attachment LUCENE-1606-flex.patch [ 12427068 ]
          Uwe Schindler made changes -
          Attachment LUCENE-1606-flex.patch [ 12427066 ]
          Robert Muir made changes -
          Attachment LUCENE-1606.patch [ 12427087 ]
          Robert Muir made changes -
          Attachment LUCENE-1606.patch [ 12427088 ]
          Robert Muir made changes -
          Link This issue depends on LUCENE-2111 [ LUCENE-2111 ]
          Robert Muir made changes -
          Attachment LUCENE-1606-flex.patch [ 12427114 ]
          Robert Muir made changes -
          Attachment LUCENE-1606-flex.patch [ 12427115 ]
          Robert Muir made changes -
          Attachment LUCENE-1606-flex.patch [ 12427180 ]
          Robert Muir made changes -
          Attachment LUCENE-1606-flex.patch [ 12427210 ]
          Robert Muir made changes -
          Attachment LUCENE-1606-flex.patch [ 12427245 ]
          Uwe Schindler made changes -
          Attachment LUCENE-1606-flex.patch [ 12427323 ]
          Robert Muir made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Fix Version/s Flex Branch [ 12314439 ]
          Fix Version/s 3.1 [ 12314025 ]
          Resolution Fixed [ 1 ]
          Uwe Schindler made changes -
          Fix Version/s 3.1 [ 12314025 ]
          Uwe Schindler made changes -
          Fix Version/s 4.0.0 [ 12314822 ]
          Fix Version/s 3.1 [ 12314025 ]
          Fix Version/s Flex Branch [ 12314439 ]
          Uwe Schindler made changes -
          Fix Version/s 4.0 [ 12314025 ]
          Fix Version/s 3.1 [ 12314822 ]
          Mark Thomas made changes -
          Workflow jira [ 12460955 ] Default workflow, editable Closed status [ 12563939 ]
          Mark Thomas made changes -
          Workflow Default workflow, editable Closed status [ 12563939 ] jira [ 12585428 ]
          Gavin made changes -
          Link This issue blocks LUCENE-2090 [ LUCENE-2090 ]
          Gavin made changes -
          Link This issue is depended upon by LUCENE-2090 [ LUCENE-2090 ]
          Gavin made changes -
          Link This issue depends on LUCENE-2111 [ LUCENE-2111 ]
          Gavin made changes -
          Link This issue depends upon LUCENE-2111 [ LUCENE-2111 ]
          Uwe Schindler made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

            People

            • Assignee:
              Robert Muir
              Reporter:
              Robert Muir
            • Votes:
              0
              Watchers:
              3
