Hi Mark H.,
Thanks for the response. Some comments inline:
Correct, the "inner phrase" example was a term not a phrase. This is perhaps a better example:
checkBadQuery("\"jo* \"percival smith\" \""); //phrases inside phrases is bad
I think you did not get what I meant. Even in your new example there is no inner phrase. It parses as: a phrase <"jo* ">, followed by a term <percival>, followed by another term <smith>, and an empty phrase <" ">. So, with your change, the JUnit test passes, but for the wrong reason: the exception complains about the empty phrase, not about an inner phrase (I still don't see how you can type an inner phrase with the current syntax). I don't think it's a big deal; I'm just trying to understand and to flag a test that is probably wrong. I hope that is clear; let me know if it is not.
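To make the point concrete, here is a minimal sketch of how a scanner that simply pairs up double quotes would segment that test string. It is a toy illustration, not Lucene's QueryParser; the class name QuotePairingSketch and the PHRASE/TERM labels are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy illustration only: pairs up double quotes, the way I read the current syntax. */
final class QuotePairingSketch {

    /** Returns the segments in order, tagged PHRASE (quoted) or TERM (unquoted word). */
    static List<String> segments(String query) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inPhrase = false;
        for (char c : query.toCharArray()) {
            if (c == '"') {
                flush(out, current, inPhrase);
                inPhrase = !inPhrase; // every quote toggles; there is no nesting
            } else {
                current.append(c);
            }
        }
        if (inPhrase) {
            out.add("UNCLOSED<" + current + ">"); // odd number of quotes
        } else {
            flush(out, current, false);
        }
        return out;
    }

    /** Emits the buffered text as one phrase, or as whitespace-separated terms. */
    private static void flush(List<String> out, StringBuilder current, boolean inPhrase) {
        String text = current.toString();
        current.setLength(0);
        if (inPhrase) {
            out.add("PHRASE<" + text + ">");
        } else if (!text.trim().isEmpty()) {
            for (String word : text.trim().split("\\s+")) {
                out.add("TERM<" + word + ">");
            }
        }
    }

    public static void main(String[] args) {
        // Same string as in the checkBadQuery(...) call above.
        for (String s : segments("\"jo* \"percival smith\" \"")) {
            System.out.println(s);
        }
    }
}
```

Under this quote-pairing assumption the string yields PHRASE<jo* >, TERM<percival>, TERM<smith>, PHRASE< >, so no segment is ever a phrase inside a phrase.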
The Junit is currently the main form of documentation
But it is not ideal, because the source code (the JUnit code) is not included in the binary release. So the ideal place for this documentation would be the javadocs.
- Wildcard/fuzzy/range clauses can be used to define a phrase element (as opposed to simply single terms)
- Brackets are used to group/define the acceptable variations for a given phrase element e.g. "(john OR jonathon) smith"
- "AND" is irrelevant - there is effectively an implied "AND_NEXT_TO" binding all phrase elements
Thanks, now it's clearer to me what is supported and what is not. I have some questions:
I understand this implicit AND_NEXT_TO operator between the queries inside the phrase. However, what happens if the user does not type any explicit boolean operator between two terms inside parentheses: "(query parser) lucene"? Is the operator between 'query' and 'parser' the implicit AND_NEXT_TO or the default boolean operator (usually OR)?
What happens if I type "(query AND parser) lucene"? From my point of view it is: "(query AND parser) AND_NEXT_TO lucene". To me that means: find any document that contains both the term 'query' and the term 'parser' at position x, and the term 'lucene' at position x+1. Is this the expected behaviour?
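To show what I mean by the two readings, here is a small self-contained sketch over a toy positional index. It only illustrates my interpretation of the question; it says nothing about what ComplexPhraseQueryParser actually does, and every name in it (AdjacentPositionSketch, allThenNext, anyThenNext) is hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model: doc.get(i) is the set of terms indexed at position i. */
final class AdjacentPositionSketch {

    /** "(a AND b) next": all terms at position x, and 'next' at position x+1. */
    static boolean allThenNext(List<Set<String>> doc, Set<String> allOf, String next) {
        for (int x = 0; x + 1 < doc.size(); x++) {
            if (doc.get(x).containsAll(allOf) && doc.get(x + 1).contains(next)) {
                return true;
            }
        }
        return false;
    }

    /** "(a OR b) next": at least one term at position x, and 'next' at position x+1. */
    static boolean anyThenNext(List<Set<String>> doc, Set<String> anyOf, String next) {
        for (int x = 0; x + 1 < doc.size(); x++) {
            if (!Collections.disjoint(doc.get(x), anyOf) && doc.get(x + 1).contains(next)) {
                return true;
            }
        }
        return false;
    }

    static Set<String> terms(String... t) {
        return new HashSet<>(Arrays.asList(t));
    }

    public static void main(String[] args) {
        // One term per position: query(0) parser(1) lucene(2)
        List<Set<String>> plain = Arrays.asList(terms("query"), terms("parser"), terms("lucene"));
        // Two terms stacked at position 0 (as a synonym filter might do), lucene at 1
        List<Set<String>> stacked = Arrays.asList(terms("query", "parser"), terms("lucene"));
        Set<String> group = terms("query", "parser");

        System.out.println(allThenNext(plain, group, "lucene"));
        System.out.println(anyThenNext(plain, group, "lucene"));
        System.out.println(allThenNext(stacked, group, "lucene"));
    }
}
```

In this toy model the AND reading can only match when both terms share one position, which is why I am asking whether that is really the intended behaviour.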
1) Keep in core and improve error reporting and documentation
2) Move into "contrib" as experimental
3) Retain in core but simplify it to support only the simplest syntax (as in my Britney~ example)
4) Re-engineer the QueryParser.jj to support a formally defined syntax for acceptable "within phrase" operators e.g. *, ~, ( )
Option 1 is good, but I would prefer 4 too. Documentation and throwing the right exception are necessary. I just don't feel comfortable with the complex phrase query parser relying on the main query parser syntax; any change to the main one could easily break the complex phrase QP. Anyway, 4 may be done in the future.
With the new info from Mark H, how hard would it be to create a new imp for the new parser that did a lot of this, in a more defined way? It seems you basically just want to be able to use multiterm queries and group/or things, right? We could even relax a little if we have to. This hasn't been released, so there is still a lot of wiggle room I think. But there does have to be a resolution with this and the new parser at some point either way.
Yes, I am working on the new query parser code. I recently started to read and understand how the ComplexPhraseQP works, so that I could reproduce its behaviour using the new QP framework. I first tried to look at this QP as a user and could not figure out what exactly I can and cannot do with it. I think we are now hitting a big problem, which is related to documentation. That is why I started raising these questions, because others could have the same issues in the future.
So, yes, I can start coding an equivalent QP using the new QP framework. I'm just questioning and trying to understand everything before I start any coding. I don't want to code anything that will throw ConcurrentModificationExceptions, which is why I'm raising these issues now, before I start moving it to the new QP.
Adriano Crestani Campos