Description
Zookeeper 3.4.10 release should be happening fairly soon, and the ZK issue blocking incorporation into Solr (ZOOKEEPER-2383) has a 3.4.10-targetted patch that fixes the test failures problem noted on SOLR-8724.
Attachments
Attachments
- zookeeper-3.4.9-upgrade-tests-fail.patch
- 11 kB
- Shawn Heisey
- zookeeper-3.4.8-upgrade-tests-pass.patch
- 10 kB
- Shawn Heisey
- SOLR-9386-fix-clientPort-parsing.patch
- 1.0 kB
- Steven Rowe
- SOLR-9386.patch
- 9 kB
- Steven Rowe
- SOLR-9386.patch
- 10 kB
- Steven Rowe
- SOLR-9386.patch
- 11 kB
- Steven Rowe
Issue Links
- is duplicated by
-
SOLR-9967 Upgrade the usage of zookeeper in our code
- Closed
- relates to
-
SOLR-8724 Upgrade Zookeeper to 3.4.8
- Resolved
-
ZOOKEEPER-2383 Startup race in ZooKeeperServer
- Closed
Activity
There is a ZK 3.4.9 release discussion thread here: http://markmail.org/message/hwfjfl5n5kitnzxy
Unfortunately it looks like ZOOKEEPER-2383 won't be included in the 3.4.9 release, which is now in the process of being produced.
With an upgrade to 3.4.8, and some tweaking in SolrZKServer to remove code copied from zookeeper and let ZK itself handle the embedded server property parsing, tests pass.
Upgrading to 3.4.9, I find that they have removed the convertLong method from ZKDatabase (without deprecation), which means that the overridden method must be removed from TestConfigSetAPIZkFailure.ForwardingZKDatabase. When I do that, everything compiles, but a couple of tests fail. I have no idea how to fix that problem.
I will add a couple of patches.
Patch upgrading to 3.4.9. I've adjusted everything as best I can, but tests in TestConfigSetsAPIZkFailure are failing with this patch. I do not know what's wrong.
I'm confused, elyograg - did you look at the patch I attached to this issue that made the same changes as you did?
No, I hadn't looked at the patch. I was already well into what I was doing on my own, then I figured I should look for an existing issue, and found this. I uploaded my patches so they wouldn't get lost, then drove home from work, and I've just now sat back down at a computer.
It looks like the linked ZooKeeper issue is targeted for 3.4.10 and missed the 3.4.9 release?
Updated the JIRA title and description to mention 3.4.10 instead of 3.4.9
Is it rude of my to suggest we look at SOLR-8346 and focus efforts on Zookeeper 3.5?
There are a number of DNS issues fixed since 3.4.8 apparently that would be really awesome for our Production Solr Clusters
https://issues.apache.org/jira/browse/ZOOKEEPER-1576 fixed in 3.5.0
https://issues.apache.org/jira/browse/ZOOKEEPER-2171 fixed in 3.5.1 (dupe of https://issues.apache.org/jira/browse/ZOOKEEPER-2367)
Alternatively is there any way to request those bugs be back ported?
We first need the Apache ZooKeeper project to release a stable 3.5 version. They only have alpha releases for 3.5.x, the latest being 3.5.2-alpha.
ZK 3.4.10 was released and includes a fix for ZOOKEEPER-2383: https://zookeeper.apache.org/releases.html#news
Updated patch, upping ZK dep to 3.4.10.
Compilation succeeds, and I looked at 3.4.10's sample zoo.cfg to see if there were changes we should include (there weren't any new ones).
I'm going to run all tests and precommit. Assuming nothing goes wrong, I want to commit this so it gets included in Solr 6.6.
precommit noticed I hadn't refreshed the zk jar's .sha1 file; this patch fixes the problem.
With this version of the patch, precommit and all Solr tests pass.
Committing shortly.
Commit 303c2a083e27cba876b2e7abc05101f241388b18 in lucene-solr's branch refs/heads/branch_6x from steve_rowe
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=303c2a0 ]
SOLR-9386: Upgrade Zookeeper to 3.4.10
Commit 57f17b111842729552f390dc653ffbaad0b4d658 in lucene-solr's branch refs/heads/master from steve_rowe
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=57f17b1 ]
SOLR-9386: Upgrade Zookeeper to 3.4.10
steve_rowe, could we completely remove that parseProperties method from Solr code and just let ZK handle it? I see that there's still some excedption handling code there after your change, but IMHO we should let ZK handle any problems or throw relevant exceptions.
There's another issue where somebody wanted to use a config option in the embedded zookeeper supported by 3.4 but not the the 3.2 version the parseProperties method was copied from ... so it didn't work. I can't seem to locate that issue now. I'm pretty sure that in the patch for that issue, I completely removed the method and didn't have any problems.
Steve Rowe, could we completely remove that parseProperties method from Solr code and just let ZK handle it? I see that there's still some excedption handling code there after your change, but IMHO we should let ZK handle any problems or throw relevant exceptions.
The exception handling is for the case that there is a missing myid file, which I think is the ordinary case for embedded ZK. That's why I left it in.
I'll add some logging in the exception handling code and run a manual test to see if it gets invoked in that case.
I'm currently getting the following error when starting Solr in cloud mode (bin/solr start -c) in master. Could be related to this ticket, but not sure.
2017-05-01 02:36:29.297 ERROR (main) [ ] o.a.s.c.SolrCore null:java.lang.IllegalArgumentException: clientPort is not set
at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parseProperties(QuorumPeerConfig.java:314)
at org.apache.solr.cloud.SolrZkServerProps.parseProperties(SolrZkServer.java:321)
at org.apache.solr.cloud.SolrZkServer.parseConfig(SolrZkServer.java:88)
at org.apache.solr.core.ZkContainer.initZooKeeper(ZkContainer.java:83)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:504)
at org.apache.solr.servlet.SolrDispatchFilter.createCoreContainer(SolrDispatchFilter.java:245)
at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:169)
at org.eclipse.jetty.servlet.FilterHolder.initialize(FilterHolder.java:137)
at org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:873)
at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:349)
at org.eclipse.jetty.webapp.WebAppContext.startWebapp(WebAppContext.java:1404)
at org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1366)
at org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:778)
at org.eclipse.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:262)
at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:520)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.deploy.bindings.StandardStarter.processBinding(StandardStarter.java:41)
at org.eclipse.jetty.deploy.AppLifeCycle.runBindings(AppLifeCycle.java:188)
at org.eclipse.jetty.deploy.DeploymentManager.requestAppGoal(DeploymentManager.java:499)
at org.eclipse.jetty.deploy.DeploymentManager.addApp(DeploymentManager.java:147)
at org.eclipse.jetty.deploy.providers.ScanningAppProvider.fileAdded(ScanningAppProvider.java:180)
at org.eclipse.jetty.deploy.providers.WebAppProvider.fileAdded(WebAppProvider.java:458)
at org.eclipse.jetty.deploy.providers.ScanningAppProvider$1.fileAdded(ScanningAppProvider.java:64)
at org.eclipse.jetty.util.Scanner.reportAddition(Scanner.java:610)
at org.eclipse.jetty.util.Scanner.reportDifferences(Scanner.java:529)
at org.eclipse.jetty.util.Scanner.scan(Scanner.java:392)
at org.eclipse.jetty.util.Scanner.doStart(Scanner.java:313)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.deploy.providers.ScanningAppProvider.doStart(ScanningAppProvider.java:150)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.deploy.DeploymentManager.startAppProvider(DeploymentManager.java:561)
at org.eclipse.jetty.deploy.DeploymentManager.doStart(DeploymentManager.java:236)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:131)
at org.eclipse.jetty.server.Server.start(Server.java:422)
at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:113)
at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:61)
at org.eclipse.jetty.server.Server.doStart(Server.java:389)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68)
at org.eclipse.jetty.xml.XmlConfiguration$1.run(XmlConfiguration.java:1516)
at java.security.AccessController.doPrivileged(Native Method)
at org.eclipse.jetty.xml.XmlConfiguration.main(XmlConfiguration.java:1441)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
:
I'm currently getting the following error when starting Solr in cloud mode (bin/solr start -c) in master. Could be related to this ticket, but not sure.
2017-05-01 02:36:29.297 ERROR (main) [ ] o.a.s.c.SolrCore null:java.lang.IllegalArgumentException: clientPort is not set
Pretty sure this ^ is caused by the ZK upgrade. Reopening to address.
Prior to this issue, Solr did its own parsing of zoo.cfg, but after this issue, Solr delegates to ZK's QuorumPeerConfig.parseProperties() to do so. ZK's method (unlike Solr's previous parsing code) requires that clientPort be specified in zoo.cfg.
Solr's check for missing clientPort configuration (and setting to solrPort + 1000 if absent) is currently performed after parsing zoo.cfg. I have a patch I'm testing that moves supplying a default clientPort to before QuorumPeerConfig.parseProperties().
Patch moving default clientPort specification to before parsing of zoo.cfg.
All tests & precommit pass with SOLR-9386-fix-clientPort-parsing.patch, and bin/solr start -e cloud -noprompt works.
Committing shortly.
Commit c44d0bc89c03de2a3a69a1765d70b8aa0d81b475 in lucene-solr's branch refs/heads/branch_6x from steve_rowe
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=c44d0bc ]
SOLR-9386: Move default clientPort specification to before calling QuorumPeerConfig.parseProperties(), which requires that clientPort be specified.
Commit 8c11f81a9505a0719e971ed6c54c9b6fc10bfa13 in lucene-solr's branch refs/heads/master from steve_rowe
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=8c11f81 ]
SOLR-9386: Move default clientPort specification to before calling QuorumPeerConfig.parseProperties(), which requires that clientPort be specified.
steve_rowe, in newer ZK versions, standalone servers no longer require a myid file. We shouldn't need to handle that any more. I can't locate the ZOOKEEPER issue where that was changed, so I do not know what version made our standalone config work, but I'm pretty sure that 3.4.8 had it.
Steve Rowe, in newer ZK versions, standalone servers no longer require a myid file. We shouldn't need to handle that any more. I can't locate the ZOOKEEPER issue where that was changed, so I do not know what version made our standalone config work, but I'm pretty sure that 3.4.8 had it.
I looked at the QuorumPeerConfig.parseConfiguration() code, and as you say, the existence of a myid file is only checked for when the number of servers is greater than one.
However, there is other code that enables Solr's embedded ZK to participate in a quorum, so I think removing code that allows that would be a mistake. I haven't tested setting up a quorum exclusively with embedded ZK in multiple Solr nodes, but I assume that's possible.
I haven't tested setting up a quorum exclusively with embedded ZK
I am pretty sure that this is possible, by configuring the zoo_data and zoo.cfg just like you would for an external ensemble, including the myid file.
The machinery that we're talking about makes supplying the myid file unnecessary in the embedded ZK quorum case. Perhaps if you think we should no longer enable that, it should be dealt with in another issue? Seems orthogonal to me to upgrading ZK.
Perhaps if you think we should no longer enable that, it should be dealt with in another issue?
Sounds reasonable. Thanks for your effort here.
Patch based on the most recent patch on
SOLR-8724(see comments there for details), with two new changes:ZOOKEEPER-2383) - this will obviously need to be changed to 3.4.9 once it's released.ZOOKEEPER-2141removed a method from ZKDatabase and added a new one, so TestConfigSetAPIZkFailure.ForwardingZKDatabase needed to be modified to stop forwarding the former and start forwarding the latter.With the exception of tests that also fail on unpatched master (I'll put details in another JIRA), all Solr core/solrj/contrib non-nightly tests succeed with this patch.