SOLR-11122 introduced a bug starting with 6.6.1 that means if you create a collection with legacyCloud=true then switch to legacyCloud=false, you get an NPE because coreNodeName is not defined in core.properties.
Since the default for legacyCloud changed from true to false between 6.6.1 and 7.x, this means that any attempt to upgrade Solr with existing collections created with Solr 6.6.1 or 6.6.2 will fail if the default value for legacyCloud is used in both. Collections created with 6.6.0 would work. Collections created in 6.6.1 or 6.6.2 with legacyCloud=false will work.
This is not as egregious for collections created with 7.0, since if the default legacyCloud=false is in effect when the core is created, the properties are persisted with coreNodeName. However, if someone switches legacyCloud to true, then creates a collection, then changes legacyCloud back to false, they'll hit this even in 7.0+.
This happened because a bit of reordering switched the order of the calls below. coreNodeName is added to the descriptor in create/createFromDescriptor(this, cd) via zkController.preRegister, so coresLocator.create(this, cd), which now runs first, persists core.properties without coreNodeName.
coresLocator.create(this, cd);
SolrCore core = createFromDescriptor(cd, true, newCollection);
(NOTE: private calls to create were renamed to createFromDescriptor in
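To make the ordering problem concrete, here is a minimal sketch. None of this is Solr code: the class and method names are hypothetical stand-ins, the descriptor is modeled as a plain java.util.Properties, and "core_node1" is a made-up value. It only shows that persisting before preRegister leaves coreNodeName out of what lands on disk. The second ordering is just one way to get it persisted, not necessarily what the patch will do.

```java
import java.util.Properties;

// Hypothetical stand-ins for illustration only; these are NOT Solr's real classes.
class CoreCreateOrderSketch {

    /** Stand-in for a core descriptor: just a bag of properties. */
    static Properties newDescriptor(String coreName) {
        Properties cd = new Properties();
        cd.setProperty("name", coreName);
        return cd;
    }

    /** Stand-in for what zkController.preRegister does: add coreNodeName to the descriptor. */
    static void preRegister(Properties cd) {
        cd.setProperty("coreNodeName", "core_node1"); // made-up value
    }

    /** Stand-in for coresLocator.create: snapshot the descriptor as core.properties. */
    static Properties persist(Properties cd) {
        Properties onDisk = new Properties();
        onDisk.putAll(cd);
        return onDisk;
    }

    /** The problematic order: persist first, register afterwards. */
    static Properties persistThenRegister(String coreName) {
        Properties cd = newDescriptor(coreName);
        Properties onDisk = persist(cd); // coreNodeName is not set yet
        preRegister(cd);                 // too late for the persisted copy
        return onDisk;
    }

    /** One possible order that persists coreNodeName: register first, then persist. */
    static Properties registerThenPersist(String coreName) {
        Properties cd = newDescriptor(coreName);
        preRegister(cd);
        return persist(cd);
    }

    public static void main(String[] args) {
        // The first persisted copy has no coreNodeName; the second one does.
        System.out.println(persistThenRegister("c1").containsKey("coreNodeName"));
        System.out.println(registerThenPersist("c1").containsKey("coreNodeName"));
    }
}
```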
I've got a fix in the works for creating cores. I'll attach a preliminary patch w/o tests in a bit for discussion, but the question is really what to do about 6.6.1 and 6.6.2, and 7.1 for that matter.
This is compounded by the fact that with the CVE, there's strong incentive to move to 6.6.2. siiiigh.
There are two parts to fixing this completely:
1> create core.properties correctly
2> deal with coreNodeName not being in the core.properties file by going to ZK and getting it (and persisting it). I haven't worked that part out yet, so it's not in the first patch. One note: if this works as I hope, it will update the core.properties files the first time they're opened.
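For discussion, part 2 could look roughly like the sketch below. This is an assumption about the shape of the fix, not the patch: the class name, the method name and the zkLookup function (core name in, coreNodeName out) are all hypothetical, and the real code would have to read cluster state from ZK and rewrite core.properties on disk.

```java
import java.util.Properties;
import java.util.function.Function;

// Hypothetical sketch; not Solr's real API.
class CoreNodeNameBackfillSketch {

    /**
     * Returns coreNodeName for a core. If core.properties (modeled here as a
     * Properties object) lacks it, asks the supplied lookup, a stand-in for
     * "go to ZK and get it", and stores the answer back into the properties
     * so the caller can persist it.
     */
    static String resolveCoreNodeName(Properties coreProps, Function<String, String> zkLookup) {
        String existing = coreProps.getProperty("coreNodeName");
        if (existing != null) {
            return existing; // nothing to backfill
        }
        String coreName = coreProps.getProperty("name");
        String fromZk = zkLookup.apply(coreName);
        if (fromZk == null) {
            throw new IllegalStateException("No coreNodeName found for core " + coreName);
        }
        // Backfill; the caller would then rewrite core.properties on disk.
        coreProps.setProperty("coreNodeName", fromZk);
        return fromZk;
    }
}
```

Doing the backfill at the point where the core is opened is what would give the "updated the first time they're opened" behavior.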
Options that I see, there are really two parts:
part1 create the core.properties correctly
> Release 6.6.3, and/or 7.1.1 with this fix. This still leaves 7.0 a problem.
> Recommend people not install 7x over collections created with 6x until they have a version with fixes (7.1.1? 7.2?). Switching legacyCloud values and creating collections is at your own risk.
> Recommend that people change legacyCloud=true in 7.x until they start working with a fixed version, which one TBD.
part2 deal with coreNodeName not being in the core.properties
> Don't backport, and release with 7.2? Set legacyCloud=true until then.
> Backport to point releases like 7.1.1? 6.6.3?
> and what about 7.0? I don't think many people will be affected by 7.0 since 7.1 came out so soon after. And setting legacyCloud=true will let people get by.
Fixing the two parts is not in question; they both need to be fixed. The real question is whether we need to create a point release that incorporates one or both, or whether it's enough to say "you must set legacyCloud=true prior to Solr version 7.# in order to work with any collections created with Solr versions 6.6.1 through 7.#".
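For anyone who needs the workaround in the meantime, legacyCloud is a cluster property and can be set through the Collections API CLUSTERPROP action. The sketch below only builds the request URL. The helper class is hypothetical, and the base URL http://localhost:8983/solr is an assumption about a default local install. It uses URLEncoder.encode(String, Charset), which needs Java 10 or later.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; it only assembles the URL and does not contact Solr.
class LegacyCloudPropSketch {

    /** Builds a Collections API CLUSTERPROP request for the given property and value. */
    static URI clusterPropUri(String solrBaseUrl, String name, String val) {
        String query = "action=CLUSTERPROP"
                + "&name=" + URLEncoder.encode(name, StandardCharsets.UTF_8)
                + "&val=" + URLEncoder.encode(val, StandardCharsets.UTF_8);
        return URI.create(solrBaseUrl + "/admin/collections?" + query);
    }

    public static void main(String[] args) {
        // Assumed base URL for a local node; adjust for your cluster.
        System.out.println(clusterPropUri("http://localhost:8983/solr", "legacyCloud", "true"));
    }
}
```

Issuing a GET against the resulting URL (with curl or a browser) should set the property cluster-wide.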
Let's hear opinions......