David - Thanks for the strong TDD example, seriously.
Ryan - Thanks to you for the quick fix.
I tried out the test patch first and got the failure, then applied Ryan's patch and the test passed. TDD by the book.
I've committed this to trunk, with the change history log of: "Now load URL content stream data (via stream.url) when called for during request handling, rather than loading URL content streams automatically regardless of use."
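For anyone skimming, the pattern behind that change looks roughly like the sketch below. This is not the committed patch and not Solr's actual content stream code; the class name LazyUrlStream and the openCount counter are made up for illustration. The point is only that the URL is remembered at construction time and opened when a component asks for the stream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;

/**
 * Hypothetical illustration only, not Solr's real content stream class.
 * The location is stored up front; nothing is fetched until getStream().
 */
class LazyUrlStream {
    private final String location;
    private int openCount = 0; // illustrative counter: how often the URL was really opened

    LazyUrlStream(String location) {
        this.location = location; // no network or file I/O happens here
    }

    /** Opens the URL on demand. The caller is responsible for closing the stream. */
    InputStream getStream() throws IOException {
        openCount++;
        return URI.create(location).toURL().openStream();
    }

    int getOpenCount() {
        return openCount;
    }
}
```

With this shape, a request handler that never calls getStream() never triggers the fetch, which is the behavior the test patch checks for.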
I think the security aspect of this is a separate issue. What we've done here is load URL content only when a component asks for it. (I double-checked the other content stream types, file etc.; they already late-load, as they should.) So someone could still send that same evil stream.url to /analysis/document.

Let's spin off another issue for something like "Enable fine grained control over allowed content streams", so that one could, say, disable URL content streams but leave local file content streams enabled. I'm not sure that entirely satisfies this issue, though: there are certainly situations where loading content via stream.url is really handy, but you also don't want a loopback (or fan-out) triggered by malicious data to kill a system.

What do others think about how to address this appropriately on the Solr side (even if that means simply making it clearer what stream.url really does underneath)?
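To make the "fine grained control" idea a bit more concrete, here is a rough sketch of what an allow-list check could look like. Everything in it is hypothetical: ContentStreamPolicy, the Source enum, and its values are invented names, not existing Solr classes or settings, and where such a check would hook into request parsing is left open.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical sketch; none of these names exist in Solr. */
class ContentStreamPolicy {
    /** Kinds of content stream a request might ask for (illustrative values). */
    enum Source { URL, FILE }

    private final Set<Source> allowed;

    ContentStreamPolicy(Set<Source> allowed) {
        // EnumSet.copyOf rejects an empty non-EnumSet collection, so handle that case.
        this.allowed = allowed.isEmpty()
                ? EnumSet.noneOf(Source.class)
                : EnumSet.copyOf(allowed);
    }

    boolean isAllowed(Source source) {
        return allowed.contains(source);
    }

    /** Throws if the given kind of stream has been disabled. */
    void check(Source source) {
        if (!isAllowed(source)) {
            throw new SecurityException("Content stream source not allowed: " + source);
        }
    }
}
```

A deployment that wants local files but no remote fetches would construct the policy with EnumSet.of(ContentStreamPolicy.Source.FILE) and call check(...) before opening any stream. This only covers the on/off case; it does nothing about loopback or fan-out when URL streams stay enabled.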