Lucene - Core
  1. Lucene - Core
  2. LUCENE-4040

Improve QueryParser and supported syntax documentation

    Details

    • Type: Improvement Improvement
    • Status: Open
    • Priority: Minor Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: modules/queryparser
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      In LUCENE-4024 there were some changes to the fuzzy query syntax. Only the Classic QueryParser really documents its syntax, which makes it hard to know whether the changes effected other QPs. Compounding this issue there are many classes which have no javadocs at all and I found myself quite confused when I consolidated all the QPs into their module.

      We should do a concerted effort to improve the documentation so that it is clear what syntax is supported by what QPs and so that at least the user facing classes have javadocs.

      As part of this, I wonder whether we should give the syntax supported by the Classic QueryParser a new name (rather than just Lucene's query syntax) since other QPs can and do support other syntax, and then somehow add some typed control over this, so QPs have to declare programmatically that they support the syntax and so we can verify that by randomly plugging in implementations into tests.

      1. LUCENE-4040.patch
        2 kB
        Mike Sokolov
      2. LUCENE-4040.patch
        2 kB
        Mike Sokolov

        Activity

        Chris Male created issue -
        Hide
        Mike Sokolov added a comment -

        There is a real need for this. The surround parser in particular is very poorly documented. There are some quirks that aren't at all apparent, like: distance has a max of 99, and when distance > 99, you actually get a generic syntax error because only two digits are allowed in the token. Also: the default distance is one, but 1 is actually not allowed as an explicit distance - you can only specify it implicitly. Finally the distance is off-by-one from the slop parameter in the Span queries it ultimately generates. Distance of 0 in Spans == distance of 1 in the surround query syntax. And aside from all of these specific issues, there is just a general lack of any sort of general statement about what the allowable syntax is. The best documentation is the javacc source, of course, but that's a bit unapproachable for a lot of folks that might actually use this, I think.

        Show
        Mike Sokolov added a comment - There is a real need for this. The surround parser in particular is very poorly documented. There are some quirks that aren't at all apparent, like: distance has a max of 99, and when distance > 99, you actually get a generic syntax error because only two digits are allowed in the token. Also: the default distance is one, but 1 is actually not allowed as an explicit distance - you can only specify it implicitly. Finally the distance is off-by-one from the slop parameter in the Span queries it ultimately generates. Distance of 0 in Spans == distance of 1 in the surround query syntax. And aside from all of these specific issues, there is just a general lack of any sort of general statement about what the allowable syntax is. The best documentation is the javacc source, of course, but that's a bit unapproachable for a lot of folks that might actually use this, I think.
        Hide
        Chris Male added a comment -

        You raise an interesting point, some of the error messaging from the QPs is poor. I've been in a situation where users were able to express complex queries themselves but would often be confused by the error messages they received if a query didn't parse. Some of this is related to the parsing frameworks we use, some of it is that we could just do better. I think we'll tackle this in another issue but it's definitely part of the overall goal to give the QPs a big facelift.

        Would you be able to tackle the surround parser documentation? You seem to have experience using it (I haven't) and understand its quirks. Just throw up a patch and we'll iterate.

        Show
        Chris Male added a comment - You raise an interesting point, some of the error messaging from the QPs is poor. I've been in a situation where users were able to express complex queries themselves but would often be confused by the error messages they received if a query didn't parse. Some of this is related to the parsing frameworks we use, some of it is that we could just do better. I think we'll tackle this in another issue but it's definitely part of the overall goal to give the QPs a big facelift. Would you be able to tackle the surround parser documentation? You seem to have experience using it (I haven't) and understand its quirks. Just throw up a patch and we'll iterate.
        Hide
        Mike Sokolov added a comment - - edited

        I attached a small documentation patch for the surround parser. It isn't comprehensive, but gives a brief summary of all the operators as far as I understand them.

        Show
        Mike Sokolov added a comment - - edited I attached a small documentation patch for the surround parser. It isn't comprehensive, but gives a brief summary of all the operators as far as I understand them.
        Mike Sokolov made changes -
        Field Original Value New Value
        Attachment LUCENE-4040.patch [ 12526293 ]
        Hide
        Chris Male added a comment -

        Hey Mike,

        I think your patch is created against an older version of trunk since it contains modules/ which we no longer have. Could you update and create it again?

        Show
        Chris Male added a comment - Hey Mike, I think your patch is created against an older version of trunk since it contains modules/ which we no longer have. Could you update and create it again?
        Hide
        Mike Sokolov added a comment -

        Oops - it's been a while since I updated. Here's a patch against today's trunk.

        Show
        Mike Sokolov added a comment - Oops - it's been a while since I updated. Here's a patch against today's trunk.
        Mike Sokolov made changes -
        Attachment LUCENE-4040.patch [ 12526328 ]
        Hide
        Robert Muir added a comment -

        Thanks Mike, I committed this!

        Show
        Robert Muir added a comment - Thanks Mike, I committed this!

          People

          • Assignee:
            Unassigned
            Reporter:
            Chris Male
          • Votes:
            2 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:

              Development