Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: query parsers
    • Labels: None

      Description

      A project I'm working on requires word order searching. The users are accustomed to Sphinx search, and expect a query like [ A << B ] to return only documents that contain the term A before the term B.

      I believe this can currently be done with the surround parser (SOLR-2703), but we lack an operator for it. It would be great to add it, so that word order searches can be combined by users into sophisticated queries.

      Note that this should also support a query like [ A << A ], which would require that the term be in the document twice (the first instance before the second).
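The requested semantics can be illustrated on a plain list of token positions. This is only a minimal sketch of what [ A << B ] and [ A << A ] are meant to match, not how Solr or Sphinx implement it; the function name `ordered_match` is hypothetical.

```python
def ordered_match(tokens, first, second):
    """Return True if `first` occurs at a position strictly before `second`."""
    first_positions = [i for i, t in enumerate(tokens) if t == first]
    second_positions = [i for i, t in enumerate(tokens) if t == second]
    if not first_positions or not second_positions:
        return False
    # Comparing the earliest `first` with the latest `second` is enough:
    # if any ordered pair exists, this pair is ordered too.
    return min(first_positions) < max(second_positions)


doc = "the quick fox saw another fox".split()
print(ordered_match(doc, "quick", "fox"))    # True
print(ordered_match(doc, "fox", "quick"))    # False
print(ordered_match(doc, "fox", "fox"))      # True: two occurrences
print(ordered_match(doc, "quick", "quick"))  # False: only one occurrence
```

Because the comparison is strict, [ A << A ] only matches when the term appears at two distinct positions, which is the behaviour the description asks for.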

      This issue is part of a meta issue, SOLR-3028, that is requesting two other operators in edismax (quorum search and exact match).

          Activity

          Jan Høydahl added a comment -

          Suggested syntax:

          A NEAR/N B      - e.g. foo NEAR/5 bar will find bar within 5 positions from foo, same as "foo bar"~5
          A ONEAR/N B     - ordered near, e.g. foo ONEAR/5 bar will find bar within 5 positions *after* foo
          

          A question is if span queries allow us to do more complex proximity expressions like "a brown fox" NEAR ("blue fence" OR "green gate")
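A rough sketch of how the two suggested operators differ, again on bare token positions. It assumes "within N positions" means the difference between the two positions is at most N; Lucene's own slop counting may differ from this measure, so the numbers are illustrative only. The names `positions` and `near` are hypothetical.

```python
def positions(tokens, term):
    """All positions at which `term` occurs."""
    return [i for i, t in enumerate(tokens) if t == term]


def near(tokens, a, b, n, ordered=False):
    """True if some occurrence of `b` lies within `n` positions of some `a`.

    With ordered=True, `b` must come after `a` (the proposed ONEAR).
    """
    for pa in positions(tokens, a):
        for pb in positions(tokens, b):
            if pa == pb:
                continue  # one token occurrence cannot match itself
            distance = pb - pa
            if ordered and distance < 0:
                continue  # `b` is before `a`, not allowed for ONEAR
            if abs(distance) <= n:
                return True
    return False


doc = "bar is five words before foo".split()
print(near(doc, "foo", "bar", 5))                # True:  foo NEAR/5 bar
print(near(doc, "foo", "bar", 5, ordered=True))  # False: foo ONEAR/5 bar
print(near(doc, "bar", "foo", 5, ordered=True))  # True:  bar ONEAR/5 foo
print(near(doc, "foo", "bar", 4))                # False: foo NEAR/4 bar
```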

          Erik Hatcher added a comment -

          A question is if span queries allow us to do more complex proximity expressions like "a brown fox" NEAR ("blue fence" OR "green gate")

          Very much so. That query would be a SpanNearQuery of a SpanNearQuery ("a brown fox") and a SpanOrQuery of two SpanNearQueries ("blue fence", "green gate").

          The surround query parser can do this (though it does not currently analyze the terms), but with a less friendly syntax.
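The nesting described above can be sketched without Lucene by treating every sub-query as a list of (start, end) spans. This is a toy model under stated assumptions: the helpers `phrase_spans`, `span_or` and `span_near` are hypothetical stand-ins, not the Lucene span classes, and "slop" here simply counts the tokens between two non-overlapping spans.

```python
def phrase_spans(tokens, phrase):
    """Return (start, end) spans, end exclusive, where `phrase` occurs exactly."""
    words = phrase.split()
    k = len(words)
    return [(i, i + k) for i in range(len(tokens) - k + 1)
            if tokens[i:i + k] == words]


def span_or(*span_lists):
    """Union of several span lists, sorted by start position."""
    return sorted(set(s for spans in span_lists for s in spans))


def span_near(left, right, slop, in_order=False):
    """Combine spans that do not overlap and have at most `slop` tokens between them."""
    matches = []
    for ls, le in left:
        for rs, re_ in right:
            if le <= rs:                       # left span comes first
                gap = rs - le
            elif re_ <= ls and not in_order:   # right span comes first
                gap = ls - re_
            else:
                continue
            if gap <= slop:
                matches.append((min(ls, rs), max(le, re_)))
    return matches


tokens = "a brown fox jumped over the green gate".split()
fox = phrase_spans(tokens, "a brown fox")
barrier = span_or(phrase_spans(tokens, "blue fence"),
                  phrase_spans(tokens, "green gate"))
print(span_near(fox, barrier, slop=5))  # [(0, 8)]
print(span_near(fox, barrier, slop=2))  # []
```

The point of the model is that every operand is itself a span list, which is why a near operator can wrap phrases and ORs alike.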

          Jan Høydahl added a comment -

          Cool. Of course a proper parser framework with a grammar would make it easier to implement this correctly (LUCENE-1567), but with the current edismax we could try to support the simple A NEAR/N B as a start?

          Robert Muir added a comment -

          I think traditionally the problem is you have to convert everything to spans to do this.
          (Spans can only wrap spans)

          This isn't a lossless conversion: e.g. for individual terms SpanNear has different
          semantics than SloppyPhraseQuery... but maybe that's OK as a caveat.

          On the other hand for the long term something like LUCENE-2878 would make such a task
          a lot easier.

          Really, it's ridiculous that the Lucene queryparser doesn't have a NEAR operator.

          Jan Høydahl added a comment -

          Really, it's ridiculous that the Lucene queryparser doesn't have a NEAR operator.

          +1

          Mike added a comment -

          Jan, with the ONEAR, will there be a way to indicate that infinite distance between the terms is OK? E.g. I don't care how far they are from each other, so long as they're in this order?

          Jan Høydahl added a comment -

          That's up to us to define. We could allow a special syntax as for range searches for this, e.g. ONEAR/*.

          We'd also need to define what should be the default N, if people write A NEAR B. Perhaps 25? Could be configurable through e.g. &q.near=N.

          Should we perhaps open a LUCENE Jira for the low-level part of this - which "blocks" this issue?
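To make the open syntax questions concrete, here is a small sketch of parsing the operator forms floated in this thread: an explicit N, the ONEAR/* wildcard, and a bare NEAR that falls back to a default. Nothing here is an existing Solr feature; `parse_proximity`, `DEFAULT_NEAR` and the returned dict layout are hypothetical, and 25 is just the value suggested above.

```python
import re

DEFAULT_NEAR = 25  # value floated in the thread, not a decided default

# left operand, operator, optional /N or /*, right operand
_PROXIMITY = re.compile(r"^(\S+)\s+(O?NEAR)(?:/(\d+|\*))?\s+(\S+)$")


def parse_proximity(query, default_n=DEFAULT_NEAR):
    """Parse 'A NEAR/N B' or 'A ONEAR/N B'; return None if it does not match.

    N may be omitted (falls back to `default_n`) or '*' (no distance
    limit, represented here as None).
    """
    m = _PROXIMITY.match(query.strip())
    if m is None:
        return None
    left, op, n, right = m.groups()
    if n is None:
        distance = default_n
    elif n == "*":
        distance = None
    else:
        distance = int(n)
    return {"left": left, "right": right,
            "ordered": op == "ONEAR", "distance": distance}


print(parse_proximity("foo NEAR/5 bar"))
print(parse_proximity("foo ONEAR/* bar"))
print(parse_proximity("foo NEAR bar"))
print(parse_proximity("foo bar"))  # None: no proximity operator
```

Only single-term operands are handled; phrases and nested expressions would need a real grammar, as noted earlier in the thread.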

          Hoss Man added a comment -

          Bulk fixing the version info for 4.0-ALPHA and 4.0; all affected issues have "hoss20120711-bulk-40-change" in a comment.

          Robert Muir added a comment -

          rmuir20120906-bulk-40-change

          Hoss Man added a comment -

          Removing fixVersion=4.0 since there is no evidence that anyone is currently working on this issue. (This can certainly be revisited if volunteers step forward.)


            People

            • Assignee: Unassigned
            • Reporter: Mike
            • Votes: 3
            • Watchers: 5

                Time Tracking

                • Original Estimate: 4h
                • Remaining Estimate: 4h
                • Time Spent: Not Specified
