Details
- Improvement
- Status: Closed
- Major
- Resolution: Implemented
Description
Currently Solr binds to all interfaces by default.
The default should be safer, so that e.g. the user is not exposed to the internet until they make an explicit step to do so.
Attachments
- SOLR-13985.patch (12 kB, Jason Gerlowski)
- SOLR-13985.patch (11 kB, Jason Gerlowski)
- SOLR-13985.patch (11 kB, Jason Gerlowski)
- SOLR-13985.patch (2 kB, Robert Muir)
Issue Links
- incorporates
  - SOLR-8382 Allow configuration of the bind IP(host) in solr 5 (Closed)
- is related to
  - SOLR-7976 Jetty http and https connectors use different property names for host/port (Open)
  - SOLR-7977 SOLR_HOST in solr.in.sh doesn't apply to Jetty's host property (Resolved)
  - SOLR-14118 default embedded zookeeper port to localhost (Closed)
- relates to
  - SOLR-13972 Insecure Solr should generate startup warning (Closed)
Activity
I'm +1 to the proposal but I doubt the patch is ready. Solr should maybe log an extra message from bin/solr if the default bind of localhost is used, to alert users who may not be used to it. Maybe bin/solr should have a convenient option to bind to something else. There should be an env variable to set this. And lastly, as you mentioned: docs, particularly the Solr Ref Guide.
I do wonder if there are popular Solr deployment patterns used in which the traffic appears to come from localhost when it's actually not. It would be good to know these; maybe mention in the documentation.
If there is an easy variable to change this, I would really like it to be organized beside SSL and basic auth if possible. Similar stuff applies to docs. To be most effective it needs to entice the user to secure the stuff.
uschindler You almost had a patch ready to replace Jetty's start.jar with a solr.jar which moves all jetty xml configs into our own Java class instead. Is this a good time to pick it up again? I would think that it would give us full control of what to bind to as well. I don't think it is as risky as it sounds. We just do the Jetty init and servlet wirings from code instead of from xml. We already do this for our tests.
EDIT: Ok I just saw SOLR-13984 which is exactly about this
Looking at this patch, it looks like we have the Jetty Host set inconsistently across our http and https jetty.xml files:
solr/server/etc/jetty-http.xml: <Set name="host"><Property name="jetty.host" /></Set>
solr/server/etc/jetty-https.xml: <Set name="host"><Property name="solr.jetty.host" /></Set>
solr/server/etc/jetty-https8.xml: <Set name="host"><Property name="solr.jetty.host" /></Set>
"jetty.host" vs "solr.jetty.host". Assuming that's not done intentionally, we should probably correct that on our way through here.
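One way to spot this kind of mismatch is to list the property names that each connector file wires to the Jetty host. The following is a minimal sketch, assuming the files sit in a directory passed as an argument and follow the <Set name="host"><Property name="..." /></Set> shape quoted above; the function name list_host_properties is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print each distinct property name that a
# jetty-*.xml file in the given directory binds to the connector host.
list_host_properties() {
  dir="$1"
  # Keep only the lines that set the connector "host", then extract the
  # value of the Property name attribute and de-duplicate the result.
  grep -h '<Set name="host">' "$dir"/jetty-*.xml \
    | sed -n 's/.*<Property name="\([^"]*\)".*/\1/p' \
    | sort -u
}
```

If the files agree, the function prints a single name; more than one line of output means the connectors read different properties.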
Should we name the SOLR_JETTY_HOST something else, such as SOLR_BIND_HOST or SOLR_BIND_IP? I like how Elasticsearch accepts special values _en0_, _local_, _site_ and _global_ as an alternative to knowing the IP address up front. You may only know the hostname, but such convenience settings could come later.
In your patch you still have 0.0.0.0 set in one of the solr.in files.
You have duplicated the same paragraphs in securing-solr.adoc and taking-solr-to-production.adoc.
I'm assigning this to myself so I can move this forward a bit. If I'm "stealing" this from you rcmuir, let me know and it's all yours
The latest patch has bin/solr, bin/solr.cmd logic to read a SOLR_JETTY_HOST value if set in solr.in.sh/solr.in.cmd.
It also takes a first pass at docs for this. I've added larger blurbs about this on the "Taking Solr to Production" and "Securing Solr" pages. I added a smaller warning-style note on the "Getting Started with SolrCloud" page that talks about the need to loosen this setting to allow Solr nodes to talk to each other. Presumably there's a lot of other places in the docs that might benefit from a similar note. I'm not sure how much is overdoing it though.
This seems like a change that will impact a lot of deployments so maybe we should target 9.0 for this. You could argue that the security benefits are important enough to trump our breaking-change policy - I don't think I really buy that yet, but I'm open to the argument if someone wants to make it.
I have not tested the Windows changes yet. Hoping to set up a VM to do so soon, but if anyone else has a Windows environment handy, I'd appreciate a double check there.
Anyone have thoughts?
If I'm "stealing" this from you Robert Muir, let me know and it's all yours
Please, steal. You clearly have a better idea of how to get it going!
Hey, review comments before I could post my description of the patch. Thanks for the quick feedback Jan.
you still have 0.0.0.0 set in one of the solr.in files.
Leftover from testing. Fixed
You have duplicated the same paragraphs in securing-solr.adoc and taking-solr-to-production.adoc.
That was intentional, but I'm not happy about it and would love any suggestions you had. The information in those 2-3 paragraphs seemed relevant in both places. Initially I put a link from taking-solr-to-production.adoc to the material in securing-solr.adoc, but it ended up that I was taking a sentence or two to provide a link to a sentence or two. Seemed a little weird, so I just duplicated the paragraphs. I'm happy to go back to linking to it though if you prefer.
Should we name the SOLR_JETTY_HOST something else, such as SOLR_BIND_HOST or SOLR_BIND_IP?
I chose SOLR_JETTY_HOST because it mirrored the values already in our jetty.xml's. But I don't have any particular attachment to the name if there's consensus on one of the others. I'm not familiar with those Elastic settings, but I'll take a look and get back to you.
I'd perhaps prefer a shorter text in taking-solr-to-production with a link to the securing-solr page? Alternatively it should be possible to tag the paragraphs in securing-solr in some way so that you can include them with a reference in taking-solr-to-production. Then there is no duplication of text. You'll find examples of this elsewhere in the guide.
gerlowskija FYI this conversation thread on our dev list recently may be pertinent to the naming and distinction of some of our host/IP settings.
+1 for keeping it simple and well-documented for 9.0. I like the patch here. It is good for this kind of change to be in a major release.
With regard to the documentation, I think it's good to echo it (maybe via link as Jan suggests) in multiple places in the documentation so that it's 100% clear, easily seen, and doesn't present roadblocks to people.
+IF NOT DEFINED SOLR_JETTY_HOST (
+  set "SOLR_OPTS=%SOLR_OPTS% -Dsolr.jetty.host=%SOLR_JETTY_HOST%"
+)
This looks backwards to me; should it be IF DEFINED instead? Sorry, I don't know Windows, but it seems backwards from test -n. I can test the patch on a Windows VM if needed.
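For comparison, the test -n semantics referred to here can be sketched in shell as follows. This only illustrates the intended check and is not the contents of the patch; the function name append_jetty_host_opt is hypothetical, while SOLR_JETTY_HOST, SOLR_OPTS and -Dsolr.jetty.host are the names used in this thread.

```shell
#!/bin/sh
# Hypothetical helper: append the Jetty host system property to SOLR_OPTS
# only when SOLR_JETTY_HOST is set to a non-empty value.
append_jetty_host_opt() {
  if [ -n "${SOLR_JETTY_HOST:-}" ]; then
    SOLR_OPTS="${SOLR_OPTS:-} -Dsolr.jetty.host=$SOLR_JETTY_HOST"
  fi
}
```

The batch counterpart of this check is IF DEFINED. With IF NOT DEFINED the body runs only when the variable is missing, so %SOLR_JETTY_HOST% would expand to an empty value.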
Uploading a patch to address some of the feedback given so far. This mostly fixes some duplication in the docs and some bugs in the Windows batch script.
[re: docs duplication] Alternatively it should be possible to tag the paragraphs in securing-solr in some way so that you can include them with a reference in taking-solr-to-production
This is the path I ended up going down. Worked out pretty nicely.
should it be instead IF DEFINED
Yep, fixed.
On Naming
Jan pointed out that we could come up with a better name than SOLR_JETTY_HOST. David pointed out that we have a smattering of other properties with at least some overlap ("host" in solr.xml, for one), and that we should aim for consistency in how these are set, and better documentation around what each does.
These are both good points, and these are confusing pieces of config that could benefit from straightening out. I straightened out the "jetty.host" vs "solr.jetty.host" http/https inconsistency. But in the interest of not letting the perfect get in the way of the good, I'd rather move the rest of that investigation/untangling to a separate jira. I'll file a follow-up jira to re-examine the names and documentation around these host/port config values, and try to make progress on it over the holidays. I'm not sure I have enough time to fully untangle these additional settings and put together fixes, but I'll try to put together at least a good writeup so someone else can tackle it if I'm not able to.
Moving Forward
I just finished testing this. It looks good on both Windows and Linux. Any last qualms or votes against merging this? It's a pretty high-impact change for users, but it addresses a real security need and the documentation makes it pretty clear how to change the default when necessary. I'll aim to commit after the holidays if there are no additional concerns/feedback. I might send out a dev mail to let others know this is happening as well.
Commit 479e7364696ab496726b595fc1156de3c4b0251a in lucene-solr's branch refs/heads/master from Jason Gerlowski
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=479e736 ]
SOLR-13985: Bind to localhost interface by default
Prior to this commit, Solr's Jetty listened for connections on all
network interfaces. This commit changes it to only listen on localhost,
to prevent incautious administrators from accidentally exposing their
Solr deployment to the world.
Administrators who wish to override this behavior can set the
SOLR_JETTY_HOST property in their Solr include file
(solr.in.sh/solr.in.cmd) to "0.0.0.0" or some other value.
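As an illustration of the override described in the commit message, the relevant line in the include file might look like the sketch below. The value is simply the example given in the message; any address Solr should listen on could be used instead.

```shell
# solr.in.sh (sketch): listen on all interfaces instead of only localhost.
# "0.0.0.0" is the example value from the commit message above.
SOLR_JETTY_HOST="0.0.0.0"
```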
Thanks for the reviews and feedback everyone. Just merged this to master a few minutes ago.
I might have found an issue with this. I'm going to temporarily revert out of an abundance of caution while I investigate. Hopefully I'll re-merge this later today.
Commit a17c48642447ad1c2e6121e82e1e92eeaa5cd82a in lucene-solr's branch refs/heads/master from Jason Gerlowski
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a17c486 ]
Revert "SOLR-13985: Bind to localhost interface by default"
This temporarily reverts commit 479e73 while a potentially related
networking hiccup is investigated.
The issue I found is a real problem. You can see it trivially with this commit in place by running bin/solr start -c && bin/solr create -c foo. Collection creation fails with this error in the logs:
2020-01-07 14:26:14.582 INFO (OverseerStateUpdate-72132527041150976-192.168.1.194:8983_solr-n_0000000000) [ ] o.a.s.c.o.SliceMutator createReplica() { "operation":"ADDREPLICA", ... "base_url":"http://192.168.1.194:8983/solr"}
2020-01-07 14:26:14.790 ERROR (OverseerThreadFactory-9-thread-3-processing-n:192.168.1.194:8983_solr) [ ] o.a.s.c.a.c.OverseerCollectionMessageHandler Error from shard: http://192.168.1.194:8983/solr => org.apache.solr.client.solrj.SolrServerException: Server refused connection at: http://192.168.1.194:8983/solr
    at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:672)
org.apache.solr.client.solrj.SolrServerException: Server refused connection at: http://192.168.1.194:8983/solr
    at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:672) ~[?:?]
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:265) ~[?:?]
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:248) ~[?:?]
    at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1290) ~[?:?]
    at org.apache.solr.handler.component.HttpShardHandlerFactory$1.request(HttpShardHandlerFactory.java:178) ~[?:?]
    at org.apache.solr.handler.component.HttpShardHandler.lambda$submit$0(HttpShardHandler.java:195) ~[?:?]
The issue is pretty clear. Solr only accepts connections on localhost, but puts a public IP address in live_nodes, overseer messages, etc. So when Solr goes to make requests to itself, those requests will fail. This is a pretty big problem and one I should have caught earlier. But no harm no foul hopefully.
As a hack, the problem can be worked around by setting SOLR_HOST="127.0.0.1" in solr.in.sh. Maybe we could auto-set SOLR_HOST to 127.0.0.1 in bin/solr if localhost-only binding is configured. But that seems a bit brittle to me: how would conflicts be handled, etc? I'll do some more testing on this today to try and figure out whether this is a reasonable solution.
Anyone have any thoughts?
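The auto-setting idea floated above could be sketched roughly as follows. This only illustrates the approach and is not what bin/solr does; the function name align_solr_host, the assumed localhost default, and the choice to warn rather than override on conflict are all assumptions.

```shell
#!/bin/sh
# Hypothetical helper: when Jetty is bound to localhost only, make the
# advertised host (SOLR_HOST) match, unless the user already chose one.
align_solr_host() {
  jetty_host="${SOLR_JETTY_HOST:-127.0.0.1}"   # assumed localhost default
  if [ "$jetty_host" = "127.0.0.1" ]; then
    if [ -z "${SOLR_HOST:-}" ]; then
      # Nothing configured: advertise the same address Jetty listens on.
      SOLR_HOST="127.0.0.1"
    elif [ "$SOLR_HOST" != "127.0.0.1" ] && [ "$SOLR_HOST" != "localhost" ]; then
      # Conflict: leave the user's value alone, but say why requests may fail.
      echo "WARNING: SOLR_HOST=$SOLR_HOST but Jetty only listens on $jetty_host" >&2
    fi
  fi
}
```

Warning instead of silently overriding is one possible answer to the conflict question; it keeps an explicit user setting intact while still pointing at the likely cause of refused connections.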
The issue is pretty clear. Solr only accepts connections on localhost, but puts a public IP address in live_nodes, overseer messages, etc.
This "self-advertisement" seems like not a great design, it will make things difficult on users. How/what logic is picking such 192.168.1.194 IP today? And I assume whatever default behavior is magically coming up with "192.168.1.194" breaks currently all the time across different configurations (e.g. reverse proxy in front), and the user must manually override with special "publish" configuration?
It is good that you found the issue here, because we should be encouraging such safe configurations which are probably difficult today (e.g. nodes talking to each other over private network infra and not all exposed directly to the internet).
Ideally this could be removed completely, and instead if nodeB talks to nodeA, nodeA knows how to talk to nodeB by looking at the IP address that nodeB used: "call me back on the phone number I called you on". It would give good default behavior, and you wouldn't need to specify any "publish" stuff unless you were doing something screwed-up (like exposing everything to the internet).
Also I haven't looked at the code, so perhaps it is really doing the right thing, and it's not bad protocol design (like the same issue with the FTP protocol).
If we think code is doing the right thing, perhaps the problem is that embedded zookeeper still binds to every interface by default? You can try the patch on SOLR-14118 which would really force localhost-only traffic.
The biggest question to me is: How does Solr get its own IP address? How does it handle multiple addresses? Does it prefer IPv6 (I hope so)?
Sorry, too much automatism is a no-go. When you register a new node it should ask the user for its IPv4 or IPv6 address that gets published to zookeeper. Only in the localhost-only case it should send its own local address!
My machines e.g. at Hetzner all have multiple IP addresses and in the IPv6 world I use a separate address for every single service. So you can easily test thousands of nodes on a single machine all with different IPv6 addresses.
This "self-advertisement" seems like not a great design
When you register a new node it should ask the user for its IPv4 or IPv6 address
It seems like you guys are both advocating that Solr should almost never guess its own IP? Personally I don't think that Solr attempting to guess its own IP/hostname is all that terrible. It works much of the time, and there are several ways for users to override it if necessary (-Dhost, -host, SOLR_HOST). But that's a very loosely held opinion.
No doubt there are improvements to IP detection and handling that we could make. But unless we do away with the user-override option entirely we'll still have these two related settings (SOLR_JETTY_HOST and SOLR_HOST) that we need to do a better job of lining up by default. That's what forced my revert here: I changed the default solr.jetty.host value and that moved it out of line with SOLR_HOST. As I dug more this morning I became increasingly convinced that bringing the two defaults back in line is the simplest fix. So pending objections that's the path I'm going to start testing.
I buy that node registration/IP-picking improvements are possible/needed/important, they just seem like separate JIRAs to me.
I think the user should always be able to override with two settings. But I think two settings can be REALLY confusing for users and it should only be necessary for advanced cases.
I agree, we should start simple and make other JIRAs for improving the networking so that it works better.
OK, awesome. I've opened a PR for this with the *nix half already in place. That's ready to review if anyone is interested while I figure out the Windows changes.
I put some detail on the PR about the testing I did on it; happy for suggestions there too if there's a scenario anyone thinks of that I missed.
It's worth noting that none of our tests caught this issue because they all start Jetty differently than a real-deal Solr does, so these settings don't come into play in the same way. I'm not sure there's anything practical we can do about this, but I wonder whether this difference between test-land and reality has bitten us before? Just thinking aloud...
Commit a17c48642447ad1c2e6121e82e1e92eeaa5cd82a in lucene-solr's branch refs/heads/gradle-master from Jason Gerlowski
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a17c486 ]
Revert "SOLR-13985: Bind to localhost interface by default"
Pushed Windows changes up to the PR now as well. If anyone has a chance to test it out, I'd appreciate it, especially on Windows: I'm never confident that I didn't accidentally munge the line endings with my git settings somehow.
Commit 5377742a62e58c79055f3a2676b77e1ed1d61823 in lucene-solr's branch refs/heads/master from Jason Gerlowski
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5377742 ]
SOLR-13985: Bind to localhost interface by default (#1154)
Prior to this commit, Solr's Jetty listened for connections on all
network interfaces. This commit changes it to only listen on localhost,
to prevent incautious administrators from accidentally exposing their
Solr deployment to the world.
Administrators who wish to override this behavior can set the
SOLR_JETTY_HOST property in their Solr include file
(solr.in.sh/solr.in.cmd) to "0.0.0.0" or some other value.
A version of this commit was previously reverted due to inconsistency
between SOLR_HOST and SOLR_JETTY_HOST. This commit fixes this issue.
Commit 5377742a62e58c79055f3a2676b77e1ed1d61823 in lucene-solr's branch refs/heads/gradle-master from Jason Gerlowski
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5377742 ]
SOLR-13985: Bind to localhost interface by default (#1154)
Commit 5377742a62e58c79055f3a2676b77e1ed1d61823 in lucene-solr's branch refs/heads/jira/SOLR-13892 from Jason Gerlowski
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5377742 ]
SOLR-13985: Bind to localhost interface by default (#1154)
It seems to me, after the revert a new version has been committed but the Jira is still in REOPENED status. Do we need some other changes?
A second question: Isn't it too strict to bind to localhost? It means that Solr will not work out-of-the-box if I have multiple Solr nodes. Or are the security concerns more important?
This issue should be the one where everyone argues with me: it's gonna be an enormous drain on documentation and the mailing list.
But it prevents idiots from exposing stuff without taking additional steps. So it's not enough to change the defaults here; there needs to be a "path" through the docs whereby users securely enable their Solr instance to talk to the outside world.
Happy to help in any way I can! But we really gotta do this.