Lucene - Core / LUCENE-4004

Add syntax for DisjunctionMaxQuery to the Query Parser

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.6
    • Fix Version/s: None
    • Component/s: core/queryparser
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      I've come up with a use for DisjunctionMaxQuery, but not the dismax parser. I note that the toString() method on that item proposes a syntax with vertical bars. Is there any sympathy for a patch that added this to the standard parser, or some other syntax?

      1. LUCENE-4004.patch
        9 kB
        Benson Margulies
      2. LUCENE-4004.patch
        10 kB
        Benson Margulies

        Activity

        Benson Margulies added a comment -

        I note that the xml parser doesn't do DisjunctionMaxQuery. I can't see a reason why this would be controversial, so I'll start with a patch for that.

        Robert Muir added a comment -

        I'm +1 to adding syntax for this to the classic/flexible QPs as well.

        But I like your idea of adding it to the xml one first. Maybe it's an easier iteration to then boil down the best
        syntax for the other ones (which is going to be more difficult just because they are hairier).

        Benson Margulies added a comment -

        Support D-m-q in the XML parser.

        Benson Margulies added a comment -

        Indeed, I don't quite know what sort of syntax would fly for the tieBreaker. I'm off to study the jj file for the classic parser to see if anything analogous presents itself.

        Meanwhile, do you want to do XML in a different JIRA from the hard ones, or should I just stack up the patches here?
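For readers unfamiliar with the tieBreaker mentioned above: DisjunctionMaxQuery scores a document by the best-scoring sub-query, plus the tieBreaker multiplied by the scores of the other matching sub-queries. The following is only a stdlib sketch of that arithmetic, not Lucene code; the class and method names are hypothetical, and it assumes non-negative sub-query scores.

```java
// Sketch of how a disjunction-max score is combined for one document.
// DisMaxScoreSketch and combine() are hypothetical names, not Lucene API.
final class DisMaxScoreSketch {

    // Returns the best sub-query score plus tieBreaker times the sum of the others.
    // Assumes scores are non-negative; an empty array yields 0.
    static float combine(float[] subScores, float tieBreaker) {
        float max = 0f;
        float sum = 0f;
        for (float s : subScores) {
            sum += s;
            max = Math.max(max, s);
        }
        return max + tieBreaker * (sum - max);
    }

    public static void main(String[] args) {
        float[] scores = {2.0f, 1.0f, 0.5f};
        System.out.println(combine(scores, 0.0f)); // only the best sub-query counts
        System.out.println(combine(scores, 0.5f)); // other matches contribute half weight
    }
}
```

A tieBreaker of 0 gives a pure maximum, while a tieBreaker of 1 makes the result equal to the plain sum of the sub-query scores, which is why any parser syntax needs some way to carry this one extra number.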

        Benson Margulies added a comment -

        Add the boost param I forgot the first time.

        Benson Margulies added a comment -

        Using the syntax ( q | q | q ) might be doable in javacc, but I worry that it's undesirable.

        Right now, what's in parens are boolean clauses (with +/-). The insides of a disjunct aren't boolean clauses, they are queries. This could be pretty confusing all around. It would really be better to introduce some syntax that allows for various sorts of grouping, but I don't want to step on the Solr parser's use of {}. Further, DisjunctionMaxQuery is just one thing, and using up | for it seems ill-advised.

        % isn't doing anything for a living, so an option would be

        (%disjunctionmax q q q q q )

        which would serve, and also open a door to supporting other things.

        ?
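To make the prefix-style proposal above concrete, here is a toy recognizer for the hypothetical "(%operator q q q)" form. It is only a sketch: the syntax was never adopted in this thread, the class and method names are invented for illustration, and sub-queries are kept as raw text rather than handed to a real query parser.

```java
import java.util.ArrayList;
import java.util.List;

// Toy recognizer for the hypothetical "(%operator q q q)" form floated in this thread.
// Not part of any Lucene parser; PercentGroupSketch and parse() are made-up names.
final class PercentGroupSketch {
    final String operator;          // e.g. "disjunctionmax"
    final List<String> subQueries;  // raw sub-query text, left unparsed

    PercentGroupSketch(String operator, List<String> subQueries) {
        this.operator = operator;
        this.subQueries = subQueries;
    }

    static PercentGroupSketch parse(String input) {
        String s = input.trim();
        if (!s.startsWith("(%") || !s.endsWith(")")) {
            throw new IllegalArgumentException("expected (%operator q ...): " + input);
        }
        // Strip the leading "(%" and the trailing ")".
        String body = s.substring(2, s.length() - 1).trim();
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0; // nesting depth of parentheses inside the group
        for (char c : body.toCharArray()) {
            if (c == '(') depth++;
            if (c == ')') depth--;
            if (Character.isWhitespace(c) && depth == 0) {
                // Whitespace outside nested parens separates sub-queries.
                if (current.length() > 0) {
                    parts.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) parts.add(current.toString());
        if (depth != 0 || parts.isEmpty()) {
            throw new IllegalArgumentException("malformed group: " + input);
        }
        // The first token names the operator; the rest are sub-queries.
        List<String> queries = new ArrayList<>(parts.subList(1, parts.size()));
        return new PercentGroupSketch(parts.get(0), queries);
    }
}
```

Because the operator name is just the first token, the same shape could carry other query types later, which is the "open a door" point made in the comment.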

        Robert Muir added a comment -

        "Further, DisjunctionMaxQuery is just one thing, and using up | for it seems ill-advised."

        I think we allow this for OR as well anyway, so it would be ambiguous...?

        Benson Margulies added a comment -

        FWIW, that's ||, not |, but I still don't want to burn | on disjunction.

        Benson Margulies added a comment -

        Note that the patch is against trunk. I'm sort of assuming that you all do 'patch the trunk and then backport as appropriate'.

        Robert Muir added a comment -

        Usually yes; however, at this stage we have just released 3.6 (intended to be the last minor release in the 3.x series).

        So currently we have not yet cut a branch_4x for stable 4.0 and are only working on trunk.

        (separately, we also have http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_6/ open for bugfixes, to backport any nasty bugs for 3.6.1, 3.6.2, etc: basically 3.x is intended to be in maintenance mode like 2.9.x was)

        Benson Margulies added a comment -

        At the risk of drifting off topic ...

        From where I sit, 4.0 looks a long way from a release. Lots of things with no javadoc, with the experimental tag, etc. (I've mostly been reading in Solr.) Am I underestimating the speed at which the entire lodge of beavers pulls a major release together?

        Robert Muir added a comment -

        What can I say: documentation is always a weakness.

        From the Solr side, I think most of the documentation tends to be higher level on the wiki, and javadocs
        don't get as much attention since it's a smaller population hacking on the APIs.

        From the Lucene side, things tend to be under-documented or out of date.

        In any case, just throw up patches and let's get 'em in. Nobody gets too excited to work on docs around here, unfortunately, probably because there is no momentum and the existing docs are all out of date and not very good.

        On the other hand, just in the past weekend we've made some serious progress on the Lucene documentation already: nuked the old out-of-date Forrest stuff, fixed a ton of broken links, added a broken-link checker, linked the javadocs of various modules to each other, brought all the existing docs (minus fileformats) up to speed with 4.0, generated HTML versions of the .txt documents with markdown, etc.

        I know there is a lot of underdocumented stuff in Lucene, but currently, from my own perspective, I am working to correct the broken stuff.

        For some reason, it's definitely harder to fix up the old documentation so it makes sense than I would ever have thought. I spent most of the day just bringing http://lucene.apache.org/core/3_6_0/scoring.html up to speed and integrating it into the o.a.l.search package javadocs.

        For the stuff you see that's experimental with no javadoc tag... this is really just as bad of a problem; just open issues if you want to help out. We are pretty overwhelmed with things to fix on the documentation side, so any help would be appreciated.

        Robert Muir added a comment -

        By the way, this patch looks good. Thanks Benson! I plan to commit this in a bit.

        Robert Muir added a comment -

        Committed revision 1328981 for the xml queryparser support. Thanks again.
        We can either keep this issue open for the other QPs, or spin off new issues,
        whichever you prefer.

        Benson Margulies added a comment - edited

        Please see https://issues.apache.org/jira/browse/LUCENE-4012 for an alternative to adding syntax to any of the existing end-user-facing parsers. I think it makes more sense, myself, but if others see value in continuing the line in here I'm game.


          People

          • Assignee:
            Robert Muir
            Reporter:
            Benson Margulies
          • Votes:
            0
            Watchers:
            1
