I just spent some time fiddling with streaming expressions for fun, reading Erick Erickson's blog (https://lucidworks.com/2017/12/06/streaming-expressions-in-solrj/) and the example given in the ref guide (https://lucene.apache.org/solr/guide/7_5/streaming-expressions.html#streaming-requests-and-responses). It occurred to me that we are recommending string concatenation into an expression language with the power to harm the server, or other network services visible from the server. I'm filing this Jira as a security issue to avoid creating a public impression of insecurity; feel free to undo that if I have guessed wrong. I haven't developed an exploit example, but it would go something like this:
- Some portion of an expression is built from user-supplied data using the techniques we're recommending in the ref guide.
- A malicious user constructs input data that breaks out of the expression (SOLR-10894 is relevant here), probably somewhere inside a let() expression where one could simply define an additional variable taking the value of a malicious expression...
- An update() expression is executed to add/overwrite data, jdbc() makes a JDBC connection to a database visible to the server, or the malicious expression runs something very expensive for DoS effect.
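To make the shape of the problem concrete, here is a minimal sketch in plain Java of how concatenation lets input escape its quoted parameter. Everything in it is an assumption for illustration: the helper name buildExpression, the collection and field names, and the JDBC URL are all made up, and I have not run the resulting string against a real Solr parser, so treat it as the form an attack might take rather than a working exploit.

```java
final class ExpressionInjectionDemo {

    // Hypothetical helper that builds an expression by string concatenation,
    // in the style of the examples referenced above.
    static String buildExpression(String userTerm) {
        return "let(a=search(collection1, q=\"title:" + userTerm
                + "\", fl=\"id\", sort=\"id asc\"), get(a))";
    }

    public static void main(String[] args) {
        // Benign input stays inside the quoted q parameter.
        String benign = "solr";

        // Hostile input closes the quote and the search(), defines an extra
        // let() variable, then reopens a search() so the template's tail still
        // balances. The jdbc() arguments are invented placeholders.
        String hostile = "x\", fl=\"id\", sort=\"id asc\"), "
                + "b=jdbc(connection=\"jdbc:hsqldb:mem:demo\", "
                + "sql=\"select id from t\", sort=\"id asc\"), "
                + "c=search(collection1, q=\"title:y";

        System.out.println(buildExpression(benign));
        System.out.println(buildExpression(hostile));
    }
}
```

The second printed line contains a b=jdbc(...) variable that the application author never wrote, with parentheses and quotes still balanced.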
Technically this is of course the fault of the end user who allowed unchecked input into programmatic execution, but when I think about how to check the input I realize that the only way to be sure is to construct for myself a notion of exactly how the parser behaves and then determine what needs to be escaped. To do this I need to dig into the expression parser code...
How to escape input is also already unclear, as shown by SOLR-10894.
There's another important wrinkle, relating to parameter substitution, that would easily be missed by someone trying to construct their own escaping/protection system, as discussed here:
I think the solution is that the SolrJ API should be enhanced to provide an escaping utility at a minimum, and possibly a "prepared expression" similar to SQL prepared statements, and that the ref guide should call attention to this issue once these tools are available...
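As a rough illustration of what a "prepared expression" could look like, here is a sketch in plain Java. It is not an existing SolrJ API: the class name PreparedExpressionSketch, the ? placeholder convention, and the allowlist pattern are all hypothetical. Because the parser's real escaping rules are unclear, the sketch does not try to escape anything; it assumes a conservative allowlist and rejects any bound value containing characters outside it.

```java
import java.util.regex.Pattern;

final class PreparedExpressionSketch {

    // Assumed-safe characters only: letters, digits, underscore, space.
    // Quotes, parentheses, commas, '=', '$', '{' and '}' are all rejected.
    private static final Pattern SAFE = Pattern.compile("[A-Za-z0-9_ ]{1,64}");

    private final String template;

    PreparedExpressionSketch(String template) {
        this.template = template;
    }

    // Replaces each '?' in the template with the next value, in order,
    // after checking the value against the allowlist.
    String bind(String... values) {
        StringBuilder out = new StringBuilder();
        int next = 0;
        for (int i = 0; i < template.length(); i++) {
            char c = template.charAt(i);
            if (c != '?') {
                out.append(c);
                continue;
            }
            if (next >= values.length) {
                throw new IllegalArgumentException("too few values");
            }
            String value = values[next++];
            if (value == null || !SAFE.matcher(value).matches()) {
                throw new IllegalArgumentException("rejected value");
            }
            out.append(value);
        }
        if (next != values.length) {
            throw new IllegalArgumentException("too many values");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        PreparedExpressionSketch expr = new PreparedExpressionSketch(
                "search(collection1, q=\"title:?\", fl=\"id\", sort=\"id asc\")");
        System.out.println(expr.bind("solr"));
        try {
            expr.bind("x\"), update(other, search(collection1, q=\"y");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected hostile input");
        }
    }
}
```

The obvious limitations are that a literal ? cannot appear in the template and that the allowlist is far stricter than real queries need. A proper implementation would presumably bind values through the expression parser's own object model rather than through string splicing.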
Additionally, templating features might be a useful addition to help folks manage large expressions and facilitate re-use of patterns... any such templating should be designed with this issue in mind when/if it is added.