Hi, I'm one of the httpd devs, but I thought I'd throw in this patch for Solr 1.3 (I'll try to make one for trunk later). It handles a number of the issues raised in this report, at least for us.
First, & and | are escaped, and the dismax logic is changed a little so that if the various query-munging methods return a blank string, we fall back to using the configured default query.
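A minimal sketch of those two changes might look like the following. The class and method names are hypothetical, and `defaultQuery` stands in for whatever default the handler is configured with; this is not the patch code itself.

```java
import java.util.regex.Pattern;

// Hypothetical helper illustrating the idea; not the actual patch code.
final class AmpPipeEscaper {
    // Matches a single '&' or '|' character ('|' is literal inside a character class).
    private static final Pattern AMP_OR_PIPE = Pattern.compile("[&|]");

    // Prefixes every '&' and '|' with a backslash so the query parser treats it as a literal.
    static String escapeAmpAndPipe(String q) {
        return AMP_OR_PIPE.matcher(q).replaceAll("\\\\$0");
    }

    // Falls back to the configured default when munging leaves nothing but whitespace.
    static String orDefault(String munged, String defaultQuery) {
        return (munged == null || munged.trim().isEmpty()) ? defaultQuery : munged;
    }
}
```

For example, `cats & dogs` becomes `cats \& dogs`, and a query that is reduced to blank by the other clean-up steps is replaced wholesale by the default.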
Next, consecutive + or - chars are flattened to a single char; this handles cases where a user might accidentally type --foo when they just mean -foo.
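One way to do the flattening is a single regex with a backreference, so that only runs of the *same* operator are collapsed. The class name below is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper; collapses "--" to "-" and "+++" to "+", leaving mixed runs alone.
final class OperatorFlattener {
    // A '+' or '-' followed by one or more copies of that same character.
    private static final Pattern REPEATED_OP = Pattern.compile("([+-])\\1+");

    static String flattenRepeatedOperators(String q) {
        // Replace the whole run with its first (captured) character.
        return REPEATED_OP.matcher(q).replaceAll("$1");
    }
}
```

With this, `--foo` becomes `-foo`, while a mixed run such as `+-foo` is deliberately left for the next step.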
Strings of mixed + and - chars are removed, since we have no way of knowing the user's intent with something like +-foo or -+foo.
Together these two steps handle one of the reported cases where the query starts with multiple + or - operators.
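Removing mixed-sign runs could be sketched as below (hypothetical class name). The pattern only matches a run of `+`/`-` characters that contains an adjacent `+-` or `-+` pair, i.e. both signs, so single-sign operators like `-foo` survive.

```java
import java.util.regex.Pattern;

// Hypothetical helper; drops runs such as "+-" or "-+-" entirely.
final class MixedOperatorStripper {
    // A run of '+'/'-' chars containing at least one "+-" or "-+" pair.
    private static final Pattern MIXED_RUN = Pattern.compile("[+-]*(?:\\+-|-\\+)[+-]*");

    static String stripMixedOperatorRuns(String q) {
        return MIXED_RUN.matcher(q).replaceAll("");
    }
}
```

Run after the flattening step, a query starting with `--+foo` first becomes `-+foo` and then `foo`. Note that, as written, a mixed run between two terms (`a+-b`) is also removed, joining the terms.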
Any remaining + or - chars which trail the last term, or which have whitespace on their right side, are removed. Our users found it puzzling in the extreme that a search on "questions 1 - 10" explicitly excluded results with "10" in them, because "- 10" is treated as -10. So we just remove any + or - operators which aren't right up against the following term.
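A sketch of that rule, assuming "right up against the following term" means the operator is immediately followed by a non-whitespace character (the class name is hypothetical):

```java
import java.util.regex.Pattern;

// Hypothetical helper; removes '+'/'-' operators that are not attached to a following term.
final class DanglingOperatorStripper {
    // One or more '+'/'-' chars followed by whitespace or the end of the query.
    private static final Pattern DANGLING_OP = Pattern.compile("[+-]+(?=\\s|$)");

    static String stripDanglingOperators(String q) {
        // Trim so a removed trailing operator doesn't leave trailing whitespace behind.
        return DANGLING_OP.matcher(q).replaceAll("").trim();
    }
}
```

Here `questions 1 - 10` becomes `questions 1  10` (the extra space is harmless to the parser), while `foo -bar` keeps its exclusion because the `-` touches `bar`.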
Finally, we escape AND, OR, and NOT when they appear outside of quotes, and remove any trailing unmatched quote. This changes the previous behaviour, which removed all quotes if they weren't perfectly balanced; we felt this was more in line with what users expect if they mistype and enter an extra quote char.
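The quote and keyword handling might be sketched as follows. The names are hypothetical, and backslash-prefixing (`\AND`) is an assumption about how the keywords get escaped; the patch's exact escaping isn't shown here. Dropping the unmatched quote first guarantees balanced quotes, so splitting on `"` leaves the unquoted text at even indices.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the escaping style is an assumption, not taken from the patch.
final class QuoteAndKeywordCleaner {
    // A bare AND, OR or NOT bounded by whitespace or the ends of the segment.
    private static final Pattern BOOLEAN_WORD =
            Pattern.compile("(^|\\s)(AND|OR|NOT)(?=\\s|$)");

    // If the quote count is odd, remove only the last quote instead of all of them.
    static String dropUnmatchedTrailingQuote(String q) {
        long quotes = q.chars().filter(c -> c == '"').count();
        if (quotes % 2 == 0) {
            return q;
        }
        int last = q.lastIndexOf('"');
        return q.substring(0, last) + q.substring(last + 1);
    }

    // Escapes the keywords only in segments outside quotes (even indices after splitting).
    static String escapeBooleanWordsOutsideQuotes(String q) {
        String[] parts = q.split("\"", -1); // -1 keeps trailing empty segments
        for (int i = 0; i < parts.length; i += 2) {
            parts[i] = BOOLEAN_WORD.matcher(parts[i]).replaceAll("$1\\\\$2");
        }
        return String.join("\"", parts);
    }

    static String clean(String q) {
        return escapeBooleanWordsOutsideQuotes(dropUnmatchedTrailingQuote(q));
    }
}
```

So `"cats AND dogs" OR birds` becomes `"cats AND dogs" \OR birds`, and `say "hello` becomes `say hello` rather than losing every quote in a longer query.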
So far I haven't been able to generate any Lucene query parser exceptions with this code, but that obviously doesn't mean it's perfect; there may still be some way to slip an invalid Lucene query past it. Still, I'm cautiously optimistic that it covers all or most of the issues raised so far in this thread.