Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.1, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Spinoff/blocker for LUCENE-5879.

      It seems like PrefixQuery should "simply" be an AutomatonQuery rather than specializing its own TermsEnum ... with maybe some performance improvements to ByteRunAutomaton.run to short-circuit once it's in a "sink state", AutomatonTermsEnum could be just as fast as PrefixTermsEnum.

      If we can do this it will make LUCENE-5879 simpler.

      1. LUCENE-6367.patch
        21 kB
        Michael McCandless
      2. LUCENE-6367.patch
        31 kB
        Michael McCandless

        Issue Links

          Activity

          Hide
          mikemccand Michael McCandless added a comment -

          Patch, cutting over PrefixQuery to AutomatonQuery and removing
          PrefixTermsEnum.

          I explored the optimization of having Byte/CharRunAutomaton.run
          optimize (short-circuit) when you're in a sink state but it became
          quite difficult/invasive fixing all callers of .step to handle this.
          With LUCENE-5879 we also need to know the sink state under-the-hood,
          but that's separate from fixing .run to make use of it.

          So I backed out that opto and tried just doing the PrefixQuery cutover
          without optimizing for sink states. I'm running a perf test w/
          luceneutil and it looks like the impact is trivial (well within
          noise). Net/net I think it's fine to "just cutover" without the
          invasive opto?

          I also changed PrefixQuery's semantics to apply to full binary space
          terms, not just UTF-8 space. While this is technically a change in
          behavior, it won't impact users who index only unicode terms. It's
          also necessary for LUCENE-5879, because if prefixing is done only in
          unicode space (like today), then the resulting binary space automaton
          will not have a sink state and auto-prefix can't apply.

          If this part is somehow controversial I can revert and try to do it
          only with LUCENE-5879 instead... if it's OK, I'll add some tests
          showing that PrefixQuery on binary terms works.

          Show
          mikemccand Michael McCandless added a comment - Patch, cutting over PrefixQuery to AutomatonQuery and removing PrefixTermsEnum. I explored the optimization of having Byte/CharRunAutomaton.run optimize (short-circuit) when you're in a sink state but it became quite difficult/invasive fixing all callers of .step to handle this. With LUCENE-5879 we also need to know the sink state under-the-hood, but that's separate from fixing .run to make use of it. So I backed out that opto and tried just doing the PrefixQuery cutover without optimizing for sink states. I'm running a perf test w/ luceneutil and it looks like the impact is trivial (well within noise). Net/net I think it's fine to "just cutover" without the invasive opto? I also changed PrefixQuery's semantics to apply to full binary space terms, not just UTF-8 space. While this is technically a change in behavior, it won't impact users who index only unicode terms. It's also necessary for LUCENE-5879 , because if prefixing is done only in unicode space (like today), then the resulting binary space automaton will not have a sink state and auto-prefix can't apply. If this part is somehow controversial I can revert and try to do it only with LUCENE-5879 instead... if it's OK, I'll add some tests showing that PrefixQuery on binary terms works.
          Hide
          rcmuir Robert Muir added a comment -

          i think the boolean is fine. For the other subclasses, they are taking "patterns" and inspecting characters for that reason, so thats different.

          Show
          rcmuir Robert Muir added a comment - i think the boolean is fine. For the other subclasses, they are taking "patterns" and inspecting characters for that reason, so thats different.
          Hide
          mikemccand Michael McCandless added a comment -

          New patch, adding tests for binary terms to PrefixQuery.

          I also added a new Operations.getSingleton, to work for both unicode
          and binary automata, and fixed CompiledAutomaton to use that when it's
          trying to "simplify" the incoming automaton.

          Show
          mikemccand Michael McCandless added a comment - New patch, adding tests for binary terms to PrefixQuery. I also added a new Operations.getSingleton, to work for both unicode and binary automata, and fixed CompiledAutomaton to use that when it's trying to "simplify" the incoming automaton.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1669522 from Michael McCandless in branch 'dev/trunk'
          [ https://svn.apache.org/r1669522 ]

          LUCENE-6367: PrefixQuery now subclasses AutomatonQuery

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1669522 from Michael McCandless in branch 'dev/trunk' [ https://svn.apache.org/r1669522 ] LUCENE-6367 : PrefixQuery now subclasses AutomatonQuery
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1669526 from Michael McCandless in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1669526 ]

          LUCENE-6367: PrefixQuery now subclasses AutomatonQuery

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1669526 from Michael McCandless in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1669526 ] LUCENE-6367 : PrefixQuery now subclasses AutomatonQuery
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1669577 from Michael McCandless in branch 'dev/trunk'
          [ https://svn.apache.org/r1669577 ]

          LUCENE-6367: correct CHANGES entry: PrefixQuery already operated in binary term space

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1669577 from Michael McCandless in branch 'dev/trunk' [ https://svn.apache.org/r1669577 ] LUCENE-6367 : correct CHANGES entry: PrefixQuery already operated in binary term space
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1669578 from Michael McCandless in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1669578 ]

          LUCENE-6367: correct CHANGES entry: PrefixQuery already operated in binary term space

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1669578 from Michael McCandless in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1669578 ] LUCENE-6367 : correct CHANGES entry: PrefixQuery already operated in binary term space
          Hide
          thelabdude Timothy Potter added a comment -

          Bulk close after 5.1 release

          Show
          thelabdude Timothy Potter added a comment - Bulk close after 5.1 release

            People

            • Assignee:
              mikemccand Michael McCandless
              Reporter:
              mikemccand Michael McCandless
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development