Solr
  1. Solr
  2. SOLR-4773

New discovery mode needs to ensure that instanceDir is correct

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 4.4, 5.0
    • Fix Version/s: 4.4, 5.0
    • Component/s: Schema and Analysis
    • Labels:
      None

      Description

      Doing a fresh checkout of 4.x (trunk to to I think) and firing up the example fails because we can't find solrconfig. The construction of the instanceDir in SolrCoreDiscoverer constructs a path with an extra solr (e.g. solr/solr/core).

      I'll attach a patch shortly.

      1. SOLR-4773.patch
        3 kB
        Erick Erickson
      2. SOLR-4773.patch
        2 kB
        Erick Erickson

        Activity

        Hide
        Uwe Schindler added a comment -

        Yeah the nightly smoker already failed! I hope this does not affect 4.3?

        Show
        Uwe Schindler added a comment - Yeah the nightly smoker already failed! I hope this does not affect 4.3?
        Hide
        Jack Krupansky added a comment -

        4.3 RC3 example runs fine for me.

        branch_4x ant run-example also works fine (although it has the problem with no longer supporting implicit collection name.)

        But ant package for branch_4x, unzip the release, run the example from the release gives me the reported problem.

        Show
        Jack Krupansky added a comment - 4.3 RC3 example runs fine for me. branch_4x ant run-example also works fine (although it has the problem with no longer supporting implicit collection name.) But ant package for branch_4x, unzip the release, run the example from the release gives me the reported problem.
        Hide
        Erick Erickson added a comment -

        Here's a preliminary patch that I really don't like much, but it illustrates the problem and has a test case. There's got to be a better way of removing the ancestor paths.

        Show
        Erick Erickson added a comment - Here's a preliminary patch that I really don't like much, but it illustrates the problem and has a test case. There's got to be a better way of removing the ancestor paths.
        Hide
        Erick Erickson added a comment -

        Uwe Schindler Don't think this affects 4.3, or if it does only people using the core discovery process. At least I switched to the example solr.xml file that uses core discovery from 4.x to 4.3, compiled and it was OK.

        Jack Krupansky blew up for me when I built 4x.

        Show
        Erick Erickson added a comment - Uwe Schindler Don't think this affects 4.3, or if it does only people using the core discovery process. At least I switched to the example solr.xml file that uses core discovery from 4.x to 4.3, compiled and it was OK. Jack Krupansky blew up for me when I built 4x.
        Hide
        Erick Erickson added a comment -

        I take it back, the new test in the patch does NOT fail without the code changes, probably because it's an absolute rather than relative path... But my plane is descending so I'll be chopped off pretty soon, I'll be looking later.

        Show
        Erick Erickson added a comment - I take it back, the new test in the patch does NOT fail without the code changes, probably because it's an absolute rather than relative path... But my plane is descending so I'll be chopped off pretty soon, I'll be looking later.
        Hide
        Erick Erickson added a comment -

        Different approach, putting up for comment before committing. Are there any good reasons not to use the absolute path for this (legacy) instanceDir stuff? This patch uses absolute path here....

        Show
        Erick Erickson added a comment - Different approach, putting up for comment before committing. Are there any good reasons not to use the absolute path for this (legacy) instanceDir stuff? This patch uses absolute path here....
        Hide
        Erick Erickson added a comment -

        Unless someone thinks this is a bad idea, I'll commit this later this evening (PST).

        Show
        Erick Erickson added a comment - Unless someone thinks this is a bad idea, I'll commit this later this evening (PST).
        Hide
        Jan Høydahl added a comment -

        +1
        Tested on branch_4x

        Show
        Jan Høydahl added a comment - +1 Tested on branch_4x
        Hide
        Commit Tag Bot added a comment -

        [trunk commit] erick
        http://svn.apache.org/viewvc?view=revision&revision=1477621

        SOLR-4773, discovery mode needed a change to instanceDir

        Show
        Commit Tag Bot added a comment - [trunk commit] erick http://svn.apache.org/viewvc?view=revision&revision=1477621 SOLR-4773 , discovery mode needed a change to instanceDir
        Hide
        Commit Tag Bot added a comment -

        [branch_4x commit] erick
        http://svn.apache.org/viewvc?view=revision&revision=1477639

        Fix for SOLR-4773, setting instanceDir in core discovery mode

        Show
        Commit Tag Bot added a comment - [branch_4x commit] erick http://svn.apache.org/viewvc?view=revision&revision=1477639 Fix for SOLR-4773 , setting instanceDir in core discovery mode
        Hide
        Erick Erickson added a comment -

        trunk - r: 1477621
        4.x - r: 1477639

        Show
        Erick Erickson added a comment - trunk - r: 1477621 4.x - r: 1477639
        Hide
        Mark Miller added a comment -

        It doesn't look like you committed the test?

        Show
        Mark Miller added a comment - It doesn't look like you committed the test?
        Hide
        Mark Miller added a comment -

        Reopening to track and make sure we get a test in that can catch this.

        Show
        Mark Miller added a comment - Reopening to track and make sure we get a test in that can catch this.
        Hide
        Erick Erickson added a comment -

        That'd be great. I tried to create a test, but I gave up fighting the test infrastructure. The problem I had trying to create a test is that I couldn't figure out how to create a test that didn't have absolute paths for the dir in question, and I wasn't able to create a failure with abs paths.....

        Looking forward to someone showing me how to make that happen.....

        Show
        Erick Erickson added a comment - That'd be great. I tried to create a test, but I gave up fighting the test infrastructure. The problem I had trying to create a test is that I couldn't figure out how to create a test that didn't have absolute paths for the dir in question, and I wasn't able to create a failure with abs paths..... Looking forward to someone showing me how to make that happen.....
        Hide
        Mark Miller added a comment -

        I'll tackle it (though probably not until the weekend).

        Show
        Mark Miller added a comment - I'll tackle it (though probably not until the weekend).
        Hide
        Andy Fowler added a comment -

        I'm thinking this is the cause of a bug I'm seeing in the 4.3.0 release. To reproduce using the multicore example:

        • echo "<solr></solr>" > multicore/solr.xml to put it into core discovery mode
        • place a core.properties file in core0/ and core1/ directories, just with loadOnStartup and transient properties defined.
        • start example `java -Dsolr.solr.home=multicore -jar start.jar`

        You should receive a "More than one core points to data dir 'multicore/data/'" failure on startup. Setting a relative path in each core.properties file doesn't work — it only works when I provide discrete dataDir for each core.

        Show
        Andy Fowler added a comment - I'm thinking this is the cause of a bug I'm seeing in the 4.3.0 release. To reproduce using the multicore example: echo "<solr></solr>" > multicore/solr.xml to put it into core discovery mode place a core.properties file in core0/ and core1/ directories, just with loadOnStartup and transient properties defined. start example `java -Dsolr.solr.home=multicore -jar start.jar` You should receive a "More than one core points to data dir 'multicore/data/'" failure on startup. Setting a relative path in each core.properties file doesn't work — it only works when I provide discrete dataDir for each core.
        Hide
        Mark Miller added a comment -

        You should receive a "More than one core points to data dir 'multicore/data/'" failure on startup. Setting a relative path in each core.properties file doesn't work — it only works when I provide discrete dataDir for each core.

        I've ripped all that data dir checking out for 4.4.

        Show
        Mark Miller added a comment - You should receive a "More than one core points to data dir 'multicore/data/'" failure on startup. Setting a relative path in each core.properties file doesn't work — it only works when I provide discrete dataDir for each core. I've ripped all that data dir checking out for 4.4.
        Hide
        Andy Fowler added a comment -

        Confirmed by compiling branch_4x that this fixes the bug I noticed in 4.3.0 release. To future travelers, this means that each core.properties file needs a discrete dataDir property in 4.3.0.

        Show
        Andy Fowler added a comment - Confirmed by compiling branch_4x that this fixes the bug I noticed in 4.3.0 release. To future travelers, this means that each core.properties file needs a discrete dataDir property in 4.3.0.
        Hide
        Erick Erickson added a comment -

        bq: I've ripped all that data dir checking out for 4.4.

        Does that include the checking for cores with the same name? Seems like that makes it easier for people to shoot themselves in the foot without giving them any clues what went wrong. And core discovery makes that pretty easy to do, just copy the core.properties file around and forget to change the name parameter.. Or an absolute path to the datadir....

        Show
        Erick Erickson added a comment - bq: I've ripped all that data dir checking out for 4.4. Does that include the checking for cores with the same name? Seems like that makes it easier for people to shoot themselves in the foot without giving them any clues what went wrong. And core discovery makes that pretty easy to do, just copy the core.properties file around and forget to change the name parameter.. Or an absolute path to the datadir....
        Hide
        Mark Miller added a comment -

        Does that include the checking for cores with the same name?

        Yes, all this checking and how it was done just further complicates the code and we want to get away from pre configuration as a way to create collections anyhow.

        We should just keep this simple - a core should fail to be created in the core container if there is an existing core with the same name, that's it.

        I feel all the transient and other recent changes to CoreContainer are really starting to significantly complicate what was already a design that needed some love, so I'm trying to simplify as much as possible so we can more easily refactor down the line.

        Show
        Mark Miller added a comment - Does that include the checking for cores with the same name? Yes, all this checking and how it was done just further complicates the code and we want to get away from pre configuration as a way to create collections anyhow. We should just keep this simple - a core should fail to be created in the core container if there is an existing core with the same name, that's it. I feel all the transient and other recent changes to CoreContainer are really starting to significantly complicate what was already a design that needed some love, so I'm trying to simplify as much as possible so we can more easily refactor down the line.
        Hide
        Erick Erickson added a comment -

        Makes sense, I wasn't altogether happy with the complexification. But we're leaving the user high and dry when tracking down errors.

        Take 4.x and just copy collection1 to collection2 and fire up solr. No warnings in the log. No errors in the log. But you can't get to collection2, you get a 404 error. And any index mods are done in the collection2 directory.

        Admittedly the configuration is foo'd and Solr is doing exactly what the defined behavior is (identically named cores last one wins). But how the hell is someone supposed to track that down? Especially with lots of cores? They don't get a single clue in the place we always say to look, the solr log.

        I see where there are tests for creating a core with the same name as an existing core via the core admin handler, but I don't see at a glance any coverage for this scenario.

        Show
        Erick Erickson added a comment - Makes sense, I wasn't altogether happy with the complexification. But we're leaving the user high and dry when tracking down errors. Take 4.x and just copy collection1 to collection2 and fire up solr. No warnings in the log. No errors in the log. But you can't get to collection2, you get a 404 error. And any index mods are done in the collection2 directory. Admittedly the configuration is foo'd and Solr is doing exactly what the defined behavior is (identically named cores last one wins). But how the hell is someone supposed to track that down? Especially with lots of cores? They don't get a single clue in the place we always say to look, the solr log. I see where there are tests for creating a core with the same name as an existing core via the core admin handler, but I don't see at a glance any coverage for this scenario.
        Hide
        Andy Fowler added a comment -

        Just to throw in my $0.02 as an app developer and solr consumer w/ far less knowledge on the rest of the worlds' use cases: if I accidentally put solr into a state where two cores were sharing a dataDir, I would really want some sort of strong warning, or just an absolute failure.

        I really like the way that cores are moving to being just a simple directory on the FS, rather than a block in a monolithic XML file. But if the cores are moving toward more backing by directory + properties file, it seems like accidentally sharing a dataDir could be a really bad thing.

        Show
        Andy Fowler added a comment - Just to throw in my $0.02 as an app developer and solr consumer w/ far less knowledge on the rest of the worlds' use cases: if I accidentally put solr into a state where two cores were sharing a dataDir, I would really want some sort of strong warning, or just an absolute failure. I really like the way that cores are moving to being just a simple directory on the FS, rather than a block in a monolithic XML file. But if the cores are moving toward more backing by directory + properties file, it seems like accidentally sharing a dataDir could be a really bad thing.
        Hide
        Mark Miller added a comment -

        You should get an error as I said - we just shouldn't be trying to detect it that way. Corecontainer should throw an exception when a core is added with an existing name.

        Show
        Mark Miller added a comment - You should get an error as I said - we just shouldn't be trying to detect it that way. Corecontainer should throw an exception when a core is added with an existing name.
        Hide
        Erick Erickson added a comment -

        New JIRA for same-named cores, see SOLR-4790.

        Show
        Erick Erickson added a comment - New JIRA for same-named cores, see SOLR-4790 .
        Hide
        Shawn Heisey added a comment -

        Andy Fowler - I finally found some time and looked into this more deeply in relation to 4.3.0. The patch for this issue does not fix the problem with relative dataDirs in the released version. The patch as-is won't apply to 4.3.0 because the code in branch_4x was significantly refactored, but I found the right place to apply the change from getPath to getCanonicalPath, and it didn't help. It did fix an exception during startup (solrconfig.xml couldn't be found) when I followed your simple instructions for running the multicore example with discovery, but only core0 started, core1 didn't, because it had the same dataDir as core0.

        Mark Miller - Do you happen to know how we can fix the problem with relative or missing dataDir properties in the 4.3 branch? Would the change be trivial enough to make it to 4.3.1? Discovery mode is essentially broken at the moment in the 4.3.0 release unless you have full absolute paths that are explicitly declared in the properties file. This is not how I want things to work in my own setup.

        Show
        Shawn Heisey added a comment - Andy Fowler - I finally found some time and looked into this more deeply in relation to 4.3.0. The patch for this issue does not fix the problem with relative dataDirs in the released version. The patch as-is won't apply to 4.3.0 because the code in branch_4x was significantly refactored, but I found the right place to apply the change from getPath to getCanonicalPath, and it didn't help. It did fix an exception during startup (solrconfig.xml couldn't be found) when I followed your simple instructions for running the multicore example with discovery, but only core0 started, core1 didn't, because it had the same dataDir as core0. Mark Miller - Do you happen to know how we can fix the problem with relative or missing dataDir properties in the 4.3 branch? Would the change be trivial enough to make it to 4.3.1? Discovery mode is essentially broken at the moment in the 4.3.0 release unless you have full absolute paths that are explicitly declared in the properties file. This is not how I want things to work in my own setup.
        Hide
        Mark Miller added a comment -

        I don't know that it would be that easy - I honestly think it could involve multiple bugs - I fixed many little things in this area in 4x and I think erick has as well. With all the refactoring and all the bugs (there is still a critical outstanding one around persistence if I remember right), as I've said elsewhere, I don't consider the new solr.xml format to be usable for 4.3 and I don't think I can confidently make it solid for 4.3.1 - I've dedicated what time I can to make it solid for 4.4 (and I know there is still work to do and work I've done but not put up yet), which shouldn't be too far out.

        I reccomended keeping this stuff in 5x and changing the tests and example to use it before pushing it out in a release - as I said, it didn't actually bake in 5x by sitting there, essentially hidden from dev use and wide testing for a couple weeks - I pushed back a bit multiple times - because I was worried this would be the result, but at this point, we are where we are, and I can only offer to help make things usable for 4.4 at best.

        Show
        Mark Miller added a comment - I don't know that it would be that easy - I honestly think it could involve multiple bugs - I fixed many little things in this area in 4x and I think erick has as well. With all the refactoring and all the bugs (there is still a critical outstanding one around persistence if I remember right), as I've said elsewhere, I don't consider the new solr.xml format to be usable for 4.3 and I don't think I can confidently make it solid for 4.3.1 - I've dedicated what time I can to make it solid for 4.4 (and I know there is still work to do and work I've done but not put up yet), which shouldn't be too far out. I reccomended keeping this stuff in 5x and changing the tests and example to use it before pushing it out in a release - as I said, it didn't actually bake in 5x by sitting there, essentially hidden from dev use and wide testing for a couple weeks - I pushed back a bit multiple times - because I was worried this would be the result, but at this point, we are where we are, and I can only offer to help make things usable for 4.4 at best.
        Hide
        Shawn Heisey added a comment -

        at this point, we are where we are, and I can only offer to help make things usable for 4.4 at best.

        Thanks, Mark. I thought that might be the case, just wanted to nail it down. Now I know what to tell users!

        Show
        Shawn Heisey added a comment - at this point, we are where we are, and I can only offer to help make things usable for 4.4 at best. Thanks, Mark. I thought that might be the case, just wanted to nail it down. Now I know what to tell users!
        Hide
        Steve Rowe added a comment -

        Bulk move 4.4 issues to 4.5 and 5.0

        Show
        Steve Rowe added a comment - Bulk move 4.4 issues to 4.5 and 5.0
        Hide
        Erick Erickson added a comment -

        This was fixed a long time ago by SOLR-4790, closing it slipped through the cracks somehow.

        Show
        Erick Erickson added a comment - This was fixed a long time ago by SOLR-4790 , closing it slipped through the cracks somehow.

          People

          • Assignee:
            Mark Miller
            Reporter:
            Erick Erickson
          • Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development