Solr
  1. Solr
  2. SOLR-4725

Should we stop supporting "name" and "dataDir" in the autodiscover mode?

    Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Implemented
    • Affects Version/s: 4.3, 5.0
    • Fix Version/s: 4.3, 5.0
    • Component/s: None
    • Labels:
      None

      Description

      Making this a blocker so we resolve it. Should be quick to code if we have consensus, maybe nothing at all to do here.

      I'm not too happy with the fact that the new core discovery process has two real gotcha's. The individual core.properties file can define 'name' and 'dataDir'. It seems too easy to either use the same name for two different cores or use the same dataDir, just copy the core.properties file around and fail to edit one them. In large installations this could be a bear to track down.

      Straw-man proposal is the we remove support for them both in discovery mode. The name defaults to the directory in which core.properties is found and the data dir is immediately below there.

      Currently, there are checks to fail to load either core if either 'name' or 'dataDir' is defined in more than one core. I think the error reporting is weak, you probably have to look in the log file and there should be a way to get this in the admin UI at least.

      Maybe the right thing to do is just leave it as-is and emphasize that specifying the dataDir and name is expert level and you have to get it right, but I wanted to get wider exposure to the problem before we push 4.3 out.

      1. SOLR-4725.patch
        17 kB
        Erick Erickson

        Issue Links

          Activity

          Hide
          Uwe Schindler added a comment -

          Closed after release.

          Show
          Uwe Schindler added a comment - Closed after release.
          Hide
          Erick Erickson added a comment -

          Resolved by changes to SOLR-4662

          Show
          Erick Erickson added a comment - Resolved by changes to SOLR-4662
          Hide
          Erick Erickson added a comment -

          Currently, the logic (discovery mode) is that if the name property is NOT present, the name defaults to the directory that contains core.properties. If name is present, then it's used.

          None of this JIRA is about whether or not to have a name attribute, it's all about if the name parameter is specified in two different properties files and is the same in both. E.g. your setup is
          solrhome/core1/core.properties contains the attribute name=foo
          and
          solrhome/core2/core.properties contains the attribute name=foo

          What's the correct thing to do?

          I claim that this probably wasn't intended; there's no good way to dis-entangle this knot so we should fail to open either core, forcing the user to straighten this out b/c it certainly (IMO) is an error.

          I think the current behavior with the <core> tags in solr.xml is that last one wins, but which one is last depends on the vagaries of the XML parser, deterministic, but not necessarily easy to figure out.

          Not quite sure what the right thing to do in terms of persisting the name if it's not present originally, I can see the advantage of doing that though. Might even happen automagically, I'd have to look back at the persist code again.

          Nor would I be averse to making the name attribute mandatory, although I haven't thought it through thoroughly. JIRA's welcome.

          Show
          Erick Erickson added a comment - Currently, the logic (discovery mode) is that if the name property is NOT present, the name defaults to the directory that contains core.properties. If name is present, then it's used. None of this JIRA is about whether or not to have a name attribute, it's all about if the name parameter is specified in two different properties files and is the same in both. E.g. your setup is solrhome/core1/core.properties contains the attribute name=foo and solrhome/core2/core.properties contains the attribute name=foo What's the correct thing to do? I claim that this probably wasn't intended; there's no good way to dis-entangle this knot so we should fail to open either core, forcing the user to straighten this out b/c it certainly (IMO) is an error. I think the current behavior with the <core> tags in solr.xml is that last one wins, but which one is last depends on the vagaries of the XML parser, deterministic, but not necessarily easy to figure out. Not quite sure what the right thing to do in terms of persisting the name if it's not present originally, I can see the advantage of doing that though. Might even happen automagically, I'd have to look back at the persist code again. Nor would I be averse to making the name attribute mandatory, although I haven't thought it through thoroughly. JIRA's welcome.
          Hide
          Mark Miller added a comment -

          I think that a message should be logged saying that certain API actions like SWAP and RENAME will not be available

          -1, these should not become unavailable because of something like that.

          cores named on the API call in the 'core' and 'other' parameters.

          That is irrespective of this issue - the name is a required param for core admin calls.

          Show
          Mark Miller added a comment - I think that a message should be logged saying that certain API actions like SWAP and RENAME will not be available -1, these should not become unavailable because of something like that. cores named on the API call in the 'core' and 'other' parameters. That is irrespective of this issue - the name is a required param for core admin calls.
          Hide
          Shawn Heisey added a comment -

          If you have cores that don't have explicit name attributes, I think that a message should be logged saying that certain API actions like SWAP and RENAME will not be available. I don't know if it should be WARN or INFO. It would be OK if the message is only logged when the core is first loaded, not on RELOAD. Similarly, those actions should not be allowed when the name attribute is missing from cores named on the API call in the 'core' and 'other' parameters. If they are attempted an ERROR should be logged and returned in the http response.

          Or we could avoid that whole can of worms by making the name attribute mandatory, pretty much the exact opposite of this issue.

          The dataDir parameter is currently optional. There was a problem with RENAME on the mailing list today where a user had all their cores sharing the same instanceDir, but dataDir was missing on all of them. I will go ahead and file an issue for that. Is sharing an instanceDir possible with automatic core discovery?

          Show
          Shawn Heisey added a comment - If you have cores that don't have explicit name attributes, I think that a message should be logged saying that certain API actions like SWAP and RENAME will not be available. I don't know if it should be WARN or INFO. It would be OK if the message is only logged when the core is first loaded, not on RELOAD. Similarly, those actions should not be allowed when the name attribute is missing from cores named on the API call in the 'core' and 'other' parameters. If they are attempted an ERROR should be logged and returned in the http response. Or we could avoid that whole can of worms by making the name attribute mandatory, pretty much the exact opposite of this issue. The dataDir parameter is currently optional. There was a problem with RENAME on the mailing list today where a user had all their cores sharing the same instanceDir, but dataDir was missing on all of them. I will go ahead and file an issue for that. Is sharing an instanceDir possible with automatic core discovery?
          Hide
          Mark Miller added a comment -

          I focused so much on dataDir, I didn't even really consider name (also hadn't just been in the code as I have been now).

          I agree that support for name should remain in the properties - and that as it does at the moment, if it's absent, we just default it to the dir name. Perhaps we should populate the properties file with it as well, and not count on the dir name over time.

          Show
          Mark Miller added a comment - I focused so much on dataDir, I didn't even really consider name (also hadn't just been in the code as I have been now). I agree that support for name should remain in the properties - and that as it does at the moment, if it's absent, we just default it to the dir name. Perhaps we should populate the properties file with it as well, and not count on the dir name over time.
          Hide
          Shawn Heisey added a comment -

          If you get rid of the name attribute, how do you decide what the name of the core is? Is it the name of the directory that contains core.properties?

          What happens if you issue a SWAP or RENAME action and don't have a name attribute? The logical thing would be to rename the directory, but if you're on Windows, you can't do that as long as any process (not just Solr) has anything open anywhere in the entire directory tree.

          Show
          Shawn Heisey added a comment - If you get rid of the name attribute, how do you decide what the name of the core is? Is it the name of the directory that contains core.properties? What happens if you issue a SWAP or RENAME action and don't have a name attribute? The logical thing would be to rename the directory, but if you're on Windows, you can't do that as long as any process (not just Solr) has anything open anywhere in the entire directory tree.
          Hide
          Mark Miller added a comment -

          1> in coreContainer, do we really want to default persistence to true if it's not present? (about line 460)?

          No, I was more curious than anything and missed dropping the change - I think we do would want to do that, but we don't want to change that behavior for hte old style now. I'll remove that change.

          bq 2

          It's up to you - your argument makes sense to me.

          I'll remove 1 and commit and lets push on from there.

          Show
          Mark Miller added a comment - 1> in coreContainer, do we really want to default persistence to true if it's not present? (about line 460)? No, I was more curious than anything and missed dropping the change - I think we do would want to do that, but we don't want to change that behavior for hte old style now. I'll remove that change. bq 2 It's up to you - your argument makes sense to me. I'll remove 1 and commit and lets push on from there.
          Hide
          Erick Erickson added a comment -

          Well, I'll just close 4725 after this gets checked in, I duplicated what you did. You chose much better names for some of the methods, insureFail was a really stupid name...

          I do have two questions though:
          1> in coreContainer, do we really want to default persistence to true if it's not present? (about line 460)?
          2> ConfigSolr.addCore[475 or so], I was thinking of removing the failure case and just logging a warning for two cores having the same name on the theory that changing this behavior in old-style solr.xml (which this case is) might be too risky at this point. We can just let it die a natural death when we stop supporting the old-style solr.xml. I don't have strong feelings one way or the other though.

          OK, I have to leave in 1/2 hour. I'll check in first thing in the morning, let me know if there's anything I should be doing here or if you've checked it all in.

          Show
          Erick Erickson added a comment - Well, I'll just close 4725 after this gets checked in, I duplicated what you did. You chose much better names for some of the methods, insureFail was a really stupid name... I do have two questions though: 1> in coreContainer, do we really want to default persistence to true if it's not present? (about line 460)? 2> ConfigSolr.addCore [475 or so] , I was thinking of removing the failure case and just logging a warning for two cores having the same name on the theory that changing this behavior in old-style solr.xml (which this case is) might be too risky at this point. We can just let it die a natural death when we stop supporting the old-style solr.xml. I don't have strong feelings one way or the other though. OK, I have to leave in 1/2 hour. I'll check in first thing in the morning, let me know if there's anything I should be doing here or if you've checked it all in.
          Hide
          Erick Erickson added a comment -

          Preliminary patch to coordinate with SOLR-4662 changes Mark is doing

          Show
          Erick Erickson added a comment - Preliminary patch to coordinate with SOLR-4662 changes Mark is doing
          Hide
          Mark Miller added a comment -

          Neither solr.xml nor solrconfig.xml,

          That was solr.xml - got you - didn't realize that dataDir was an explicit option on top of the ability to do <property> - we don't want that. DataDir should only be specified in solrconfig.xml and overridable with sys prop substitution.

          Otherwise, we should let multiple cores point to the same data dir - perhaps someone is doing something all read only and only wants one. That's fine. The lock factory is there for to protect you from shooting yourself in the foot by accident.

          Show
          Mark Miller added a comment - Neither solr.xml nor solrconfig.xml, That was solr.xml - got you - didn't realize that dataDir was an explicit option on top of the ability to do <property> - we don't want that. DataDir should only be specified in solrconfig.xml and overridable with sys prop substitution. Otherwise, we should let multiple cores point to the same data dir - perhaps someone is doing something all read only and only wants one. That's fine. The lock factory is there for to protect you from shooting yourself in the foot by accident.
          Hide
          Erick Erickson added a comment -

          Neither solr.xml nor solrconfig.xml, I'm talking about an individual <core> attribute. In the old-style solr.xml, you could define these per core.

          <core name="myname" dataDir="absolute or relative path here" instanceDir="another path" blah blah />

          So in new-style core.properties file used for discovery it is pretty easy to have more than one core point to the same dataDir or more than one core have the same name just by copy/paste/edit failures. And since they're scattered in a number of different directories rather than concentrated in a single file, harder to find. And we didn't help them find the problem either, this condition was never checked for.

          Note that we've always allowed someone to shoot themselves in the foot this way, now I just think it's easier to do and harder to find in discovery mode. Actually SOLR-4662 refuses to load cores that violate either of these constraints.

          Show
          Erick Erickson added a comment - Neither solr.xml nor solrconfig.xml, I'm talking about an individual <core> attribute. In the old-style solr.xml, you could define these per core. <core name="myname" dataDir="absolute or relative path here" instanceDir="another path" blah blah /> So in new-style core.properties file used for discovery it is pretty easy to have more than one core point to the same dataDir or more than one core have the same name just by copy/paste/edit failures. And since they're scattered in a number of different directories rather than concentrated in a single file, harder to find. And we didn't help them find the problem either, this condition was never checked for. Note that we've always allowed someone to shoot themselves in the foot this way, now I just think it's easier to do and harder to find in discovery mode. Actually SOLR-4662 refuses to load cores that violate either of these constraints.
          Hide
          Mark Miller added a comment - - edited

          Okay, wait - you don't mean dataDir in solrconfig.xml.

          In that case, I'm confused...was their explicit support for dataDir in solr.xml before? I thought it was just basic property support so that if you setup solrconfig.xml to take a prop substitution, you could then set those properties in solr.xml with the generic <property setting. We don't need those anymore I think.

          Anyway, there doesn't need to be anything specific to the dataDir at the solr.xml level - that should be handled in solrconfig.xml.

          Show
          Mark Miller added a comment - - edited Okay, wait - you don't mean dataDir in solrconfig.xml. In that case, I'm confused...was their explicit support for dataDir in solr.xml before? I thought it was just basic property support so that if you setup solrconfig.xml to take a prop substitution, you could then set those properties in solr.xml with the generic <property setting. We don't need those anymore I think. Anyway, there doesn't need to be anything specific to the dataDir at the solr.xml level - that should be handled in solrconfig.xml.
          Hide
          Mark Miller added a comment -

          Currently, there are checks to fail to load either core if either 'name' or 'dataDir' is defined in more than one core.

          Why? You should be able to specify dataDir in more than one core.

          Show
          Mark Miller added a comment - Currently, there are checks to fail to load either core if either 'name' or 'dataDir' is defined in more than one core. Why? You should be able to specify dataDir in more than one core.
          Hide
          Mark Miller added a comment -

          I think dataDir should continue to be supported. There is not much reason to rip it.

          name should be removed.

          Show
          Mark Miller added a comment - I think dataDir should continue to be supported. There is not much reason to rip it. name should be removed.

            People

            • Assignee:
              Erick Erickson
              Reporter:
              Erick Erickson
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development