  1. Solr
  2. SOLR-6671

Introduce a solr.data.home as root dir for all data

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.10.1
    • Fix Version/s: 7.0
    • Component/s: SolrCloud
    • Labels:
      None

      Description

      Many users prefer to deploy code, config and data on separate disk locations, so the default of placing the indexes under ${solr.solr.home}/${solr.core.name}/data is not always wanted.

      In a multi-core/collection system, there is not much help in the solr.data.dir option, as it would set the dataDir to the same folder for all collections. One workaround, if you don't want to hardcode paths in your solrconfig.xml, is to specify the dataDir property in each solr.properties file.

      A more elegant solution would be to introduce a new Java option, solr.data.home, which would be to data what solr.solr.home is to config. If set, all collections would default their dataDir to ${solr.data.home}/${solr.core.name}/data.
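The proposed default can be sketched in a few lines of Java. This is only an illustration of the rule described above, not the code from any of the attached patches; the class name DataHomeDefault and the method defaultDataDir are hypothetical, and the fallback to solr.solr.home when solr.data.home is unset is an assumption based on the current default.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class DataHomeDefault {

    // Hypothetical helper, not Solr's actual implementation.
    // Builds <root>/<coreName>/data, where <root> is solr.data.home if set,
    // otherwise solr.solr.home (assumed fallback, matching today's default).
    static Path defaultDataDir(String solrHome, String dataHome, String coreName) {
        String root = (dataHome == null || dataHome.isEmpty()) ? solrHome : dataHome;
        return Paths.get(root, coreName, "data");
    }

    public static void main(String[] args) {
        // With solr.data.home set, data lives outside the config tree.
        System.out.println(defaultDataDir("/var/solr/home", "/var/solr/data", "foo"));
        // Without it, data stays under solr.solr.home as before.
        System.out.println(defaultDataDir("/var/solr/home", null, "foo"));
    }
}
```

With one property set at startup, every core gets its own data folder under the same root, so no per-core dataDir setting is needed.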

      1. SOLR-6671.patch
        16 kB
        Jan Høydahl
      2. SOLR-6671.patch
        24 kB
        Jan Høydahl
      3. SOLR-6671.patch
        13 kB
        Jan Høydahl
      4. SOLR-6671.patch
        8 kB
        Jan Høydahl
      5. SOLR-6671.patch
        8 kB
        Jan Høydahl
      6. SOLR-6671.patch
        5 kB
        Jan Høydahl
      7. SOLR-6671.patch
        5 kB
        Jan Høydahl
      8. SOLR-6671.patch
        4 kB
        Jan Høydahl

          Activity

          jira-bot ASF subversion and git services added a comment -

          Commit b5b759ca9099ce6e0a7eb8502576bf3e35f77ab2 in lucene-solr's branch refs/heads/branch_7_0 from Jan Høydahl
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b5b759c ]

          SOLR-6671: Move changes entry to 7.0.0

          (cherry picked from commit aa7394d)

          jira-bot ASF subversion and git services added a comment -

          Commit cc2114e77cb8da5143a6093c71c638fea0917842 in lucene-solr's branch refs/heads/branch_7x from Jan Høydahl
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=cc2114e ]

          SOLR-6671: Move changes entry to 7.0.0

          (cherry picked from commit aa7394d)

          jira-bot ASF subversion and git services added a comment -

          Commit aa7394d27fa575a31c0ddee82ce3bc0bb7205a98 in lucene-solr's branch refs/heads/master from Jan Høydahl
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=aa7394d ]

          SOLR-6671: Move changes entry to 7.0.0

          janhoy Jan Høydahl added a comment -

          I noticed that the CHANGES entry for this issue is located in the 6.5.0 section, while it should be in 7.0.0. I'll commit a fix to master, branch_7x and branch_7_0 (in case of re-spin), ok?

          shalinmangar Shalin Shekhar Mangar added a comment -

          Also see SOLR-11038

          shalinmangar Shalin Shekhar Mangar added a comment -

          I created SOLR-11036 and SOLR-11037

          janhoy Jan Høydahl added a comment -

          Shalin, can you create a new JIRA issue for the metrics and one for getDataHome?

          shalinmangar Shalin Shekhar Mangar added a comment -

          The totalSpace and usableSpace reported by Metrics API are still based on coreRootDirectory (which is used as the instance dir for individual cores). So we should expose the data home's disk metrics as well. We can introduce new metrics such as dataHomeTotalSpace and dataHomeUsableSpace? Also, it'd be great if there was a method one could call to get the data home without needing a CoreDescriptor first.
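As a rough illustration of what such metrics could report, the standalone sketch below reads the total and usable space of the file store that holds a given directory. It does not use the Solr Metrics API; the class DataHomeSpace and the method dataHomeSpace are hypothetical, and only the metric names dataHomeTotalSpace and dataHomeUsableSpace are taken from the suggestion above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

public class DataHomeSpace {

    // Hypothetical helper: reports disk space for the file store backing dataHome.
    // The directory must exist, otherwise Files.getFileStore fails.
    static Map<String, Long> dataHomeSpace(Path dataHome) {
        try {
            FileStore store = Files.getFileStore(dataHome);
            Map<String, Long> metrics = new LinkedHashMap<>();
            metrics.put("dataHomeTotalSpace", store.getTotalSpace());
            metrics.put("dataHomeUsableSpace", store.getUsableSpace());
            return metrics;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Falls back to the current directory when solr.data.home is not set.
        Path dataHome = Paths.get(System.getProperty("solr.data.home", ".")).toAbsolutePath();
        System.out.println(dataHomeSpace(dataHome));
    }
}
```

When data home sits on a different partition than coreRootDirectory, these numbers would differ from the existing totalSpace and usableSpace values, which is the gap described above.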

          jira-bot ASF subversion and git services added a comment -

          Commit 20dcb56da85accabd8e32b41afaca71707797ade in lucene-solr's branch refs/heads/master from Uwe Schindler
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=20dcb56 ]

          SOLR-6671: More generic fix to assert Solr's dataHome

          janhoy Jan Høydahl added a comment -

          +1 That is more generic yes

          thetaphi Uwe Schindler added a comment -

          I'd do it that way:

          private void assertDataHome(String expected, String instanceDir, RAMDirectoryFactory rdf, MockCoreContainer cc, String... properties) throws IOException {
            String dataHome = rdf.getDataHome(new CoreDescriptor("core_name", Paths.get(instanceDir), cc.containerProperties, cc.isZooKeeperAware(), properties));
            assertEquals(Paths.get(expected).toAbsolutePath(), Paths.get(dataHome).toAbsolutePath());
          }
          

          I just tested it and it also passes on Windows! I can commit that if you like!

          thetaphi Uwe Schindler added a comment -

          Hi Jan,

          I think the fixes as committed are fine.

          In addition I changed your regex to allow drive names other than "C:", because on Windows the drive is inherited. If you have your checkout on D: or X:, it would return another path, so the code above won't work!

          IMHO: The correct fix would be to just resolve the path name completely (get absolute name) and then compare the Path objects. String comparisons of folder names are always likely to break. With Path objects everything is right: slashes and drives are there. I did not fix that because I just wanted to correct it and not change semantics of your patch. I added the TODO, to think about how to do it better!
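The difference between comparing strings and comparing Path objects can be shown with a small standalone sketch. The names PathCompare and samePath are made up for this example and are not part of the Solr test code; the extra normalize() call is an assumption added here so that ".." segments do not affect the result.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class PathCompare {

    // Hypothetical helper: compares two path strings as absolute, normalized Path
    // objects, so separators and the inherited drive (on Windows) are handled by
    // the file system provider instead of by string matching.
    static boolean samePath(String expected, String actual) {
        Path e = Paths.get(expected).toAbsolutePath().normalize();
        Path a = Paths.get(actual).toAbsolutePath().normalize();
        return e.equals(a);
    }

    public static void main(String[] args) {
        // Built with the platform separator, still equal to the forward-slash form.
        String platformForm = Paths.get("solrdata", "inst_dir", "data").toString();
        System.out.println(samePath("solrdata/inst_dir/data", platformForm));
        // Different directories are still reported as different.
        System.out.println(samePath("solrdata/inst_dir/data", "solrdata/other/data"));
    }
}
```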

          janhoy Jan Høydahl added a comment -

          Thanks Uwe, I committed the fix quickly without checking precommit.
          Your edits are fine, I was not aware of Constants.WINDOWS. I thought of making the assert more generic and move it to some test Util class, but this was the bare minimum of what was needed. Perhaps it would be more readable to have separate assertDataHome calls for Win and Linux:

          if (Constants.WINDOWS) {
            assertDataHome("C:\\solrdata\\inst_dir\\data", "inst_dir", rdf, cc);
          } else {
            assertDataHome("/solrdata/inst_dir/data", "inst_dir", rdf, cc);
          }
          
          jira-bot ASF subversion and git services added a comment -

          Commit 7b322bd67e5a3a9c7f9ecf165d89da60c3767fbd in lucene-solr's branch refs/heads/master from Uwe Schindler
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7b322bd ]

          SOLR-6671: Fix precommit and use the Lucene-Constant to detect Windows. Also allow other local drives!

          jira-bot ASF subversion and git services added a comment -

          Commit 8000b25cabef69bc31e64dee2c3ef619b77f84f7 in lucene-solr's branch refs/heads/master from Jan Høydahl
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=8000b25 ]

          SOLR-6671: Fix tests on Windows

          janhoy Jan Høydahl added a comment -

          Thanks Hoss. Looking.

          hossman Hoss Man added a comment - - edited

          Jan: AFAICT DirectoryFactoryTest.testGetDataHome has failed on every windows policeman jenkins build since you added this test?

          FAILED:  org.apache.solr.core.DirectoryFactoryTest.testGetDataHome
          
          Error Message:
          expected:<[/tmp/inst1/]data> but was:<[C:\tmp\inst1\]data>
          
          Stack Trace:
          org.junit.ComparisonFailure: expected:<[/tmp/inst1/]data> but was:<[C:\tmp\inst1\]data>
                  at __randomizedtesting.SeedInfo.seed([724D123FB68E10C5:EE2BCEE4EE061786]:0)
                  at org.junit.Assert.assertEquals(Assert.java:125)
                  at org.junit.Assert.assertEquals(Assert.java:147)
                  at org.apache.solr.core.DirectoryFactoryTest.testGetDataHome(DirectoryFactoryTest.java:58)
                  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
                  at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          
          
          jira-bot ASF subversion and git services added a comment -

          Commit 39dfb7808ac11c369985549dff06441f0cf5b93c in lucene-solr's branch refs/heads/master from Jan Høydahl
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=39dfb78 ]

          SOLR-6671: Possible to set solr.data.home property as root dir for all data

          janhoy Jan Høydahl added a comment -

          Ok, I tested on Windows, found a few bugs and fixed those. Verified setting both through -t and SOLR_DATA_HOME on both macOS and Windows 10. Committing.

          janhoy Jan Høydahl added a comment -

          Thanks Mark

          markrmiller@gmail.com Mark Miller added a comment -

          I've only read through the changes, but looks okay to me.

          janhoy Jan Høydahl added a comment -

          Getting ready to commit this, hoping to target 7.0. Would really like to get a few more eyes on the solution.
          PS: It may be easier to review the patch on GitHub: https://github.com/apache/lucene-solr/compare/master...cominvent:solr6671?expand=1

          janhoy Jan Høydahl added a comment - - edited

          New patch, the previous did not include all bin/solr changes.
          Not tackling install_solr_service.sh in this issue. UPDATE: See SOLR-10906
          Applies to current master

          janhoy Jan Høydahl added a comment -

          New patch

          • Patch applies on current master
          • Added var SOLR_DATA_HOME to solr.in.xx and convert to solr.data.home in bin/solr(.cmd) - this is not tested
          • Asciidoc documentation

          Remains:

          • Add support in install_solr_service.sh
          janhoy Jan Høydahl added a comment -

          I see many customer examples where there is a wish for separating data home from config home, so I'd like to push this forward again.

          The default for SOLR_DATA_HOME could still be same as SOLR_HOME, but in the linux installer script, we could default to using /var/solr/data for data and /var/solr/home for home, so solr.in.sh would typically look like:

          SOLR_PID_DIR="/var/solr"
          SOLR_HOME="/var/solr/home"
          SOLR_DATA_HOME="/var/solr/data"
          LOG4J_PROPS="/var/solr/log4j.properties"
          SOLR_LOGS_DIR="/var/solr/logs"
          SOLR_PORT="8983"
          

          and produce this tree:

          /var/solr
          ├── data
          │   ├── bar
          │   │   └── data
          │   │       ├── index
          │   │       └── tlog
          │   └── foo
          │       └── data
          │           ├── index
          │           └── tlog
          ├── home
          │   ├── bar
          │   │   ├── conf
          │   │   │   ├── managed-schema
          │   │   │   └── solrconfig.xml
          │   │   └── core.properties
          │   ├── foo
          │   │   ├── conf
          │   │   │   ├── managed-schema
          │   │   │   └── solrconfig.xml
          │   │   └── core.properties
          │   ├── solr.xml
          │   └── zoo.cfg
          ├── log4j.properties
          └── logs
              └── solr.log.1
          

          The benefit is that it is super easy to move data to a new partition/disk with a single mv command. We have a customer right now who upgraded from 4.x to 6.x using the Linux installer, but still wants to run non-cloud. They need to separate data from config, i.e. they are not happy to have configs in /var/solr/data together with data, since it makes upgrading only the config harder. Today they solve it by hardcoding <dir> in every single solrconfig.xml. In the new install I have used symlinks for each conf folder instead, so they can have a partition where they replace the home/<core>/conf folders from SCM without disturbing data.

          This would also help solve SOLR-10095.

          janhoy Jan Høydahl added a comment - - edited

          New patch

          • Explicit test for verifying last path component of instanceDir is used

          Shawn Heisey: thinking a bit more, I agree it makes sense to use the last component of instanceDir as the folder name inside solr.data.home. Then the directory structure is exactly the same below $SOLR_HOME/ and $SOLR_DATA_HOME/. So a core rename will not cause Solr to look for the index in another location, and a core swap will load the index from the other instanceDir, even if the name is the same after the swap.

          In SolrCloud rename and swap are not supported, so I assume that in most cases instanceDir==core_name so it will be compatible.
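To make the rule concrete, here is a minimal sketch of deriving the data folder from the last path component of instanceDir rather than from the core name. It is not the code from the patch; InstanceDirLeaf and dataDirFor are hypothetical names, and the paths in main are only examples.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class InstanceDirLeaf {

    // Hypothetical helper: <dataHome>/<last component of instanceDir>/<dataDir>.
    // The core name is deliberately not an input, so renaming or swapping cores
    // does not change where the index is looked up.
    static Path dataDirFor(Path dataHome, Path instanceDir, String dataDir) {
        Path leaf = instanceDir.getFileName(); // null only for a root path such as "/"
        if (leaf == null) {
            throw new IllegalArgumentException("instanceDir has no last path component: " + instanceDir);
        }
        return dataHome.resolve(leaf).resolve(dataDir);
    }

    public static void main(String[] args) {
        Path dataHome = Paths.get("/var/solr/data");
        // A core named "s0live" whose instanceDir ends in "s0_1" maps to .../s0_1/data.
        System.out.println(dataDirFor(dataHome, Paths.get("/var/solr/home/s0_1"), "data"));
    }
}
```

As noted in the thread, this puts the burden on the user to pick instanceDir values whose last components do not collide.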

          Appreciate another pair of eyes, and tips for how to add more tests.

          Questions:

          • I am tempted to remove the support for setting solr.data.home as a property to the <directoryFactory> element in solrconfig.xml. It is of limited use to set such a root folder for each and every core, especially when you already can set <dataDir> on a per core level. Any objections?
          • I chose to put the variable Path dataHomePath in DirectoryFactory.java where it is used in getDataHome(). But I initialize the variable in CachingDirectoryFactory#init since I view this class as the parent class of all file-system-based factories. The only exception I know is HdfsDirectoryFactory which also subclasses CachingDirectoryFactory, but it overrides both init and getDataHome. Comments?
          • Is anyone aware of code that calculates/assumes the dataDir of a core without calling DirectoryFactory#getDataHome? We could detect such code by adding solr.data.home to the random test params, but there is no need if we don't fear it to be a problem?

          This patch will also obviously trigger many changes in the documentation, which currently assumes dataDir is always relative to solr.solr.home or set per core by <dataDir>. I'd appreciate help in updating the patch with such documentation changes.

          janhoy Jan Høydahl added a comment -

          New patch

          • Test of the getDataHome() method
          • Uses last path element of instanceDir instead of coreName.

          I'm not 100% sure of the last change, using last path element of instanceDir instead of core name. These will be equal if instanceDir is not overridden, but if changed, the burden will be on the user to choose instanceDir with non-overlapping last path components.

          janhoy Jan Høydahl added a comment -

          New patch that supports absolute dataDir

          janhoy Jan Høydahl added a comment -

          What determines the name of the dataDir? Is it the core name, or is the last path component of the instanceDir?

          Here's the code:

          dataDir = Paths.get(cd.getCoreContainer().getSolrHome()).resolve(dataHomePath).resolve(cd.getName() + File.separator + cd.getDataDir()).toAbsolutePath().toString();
          

          So you will typically see /path/to/solr.data.dir/collname_shard1_replica1/data/, but in your example, it would probably be /path/to/solr.data.dir/s0live/s0_1/. I had to include core name as one level so we don't end up with all "data" folders in the same root dir. Guess I see a bug right there - we need to support absolute path dataDir which would override the solr.data.home.
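The absolute-path case can be handled by Path.resolve itself, which returns its argument unchanged when that argument is absolute. The sketch below illustrates this behaviour only; it is not the follow-up patch, and the names AbsoluteDataDir and resolveDataDir are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class AbsoluteDataDir {

    // Hypothetical helper: a relative dataDir lands under <dataHome>/<coreDirName>/,
    // while an absolute dataDir wins over dataHome, because Path.resolve returns
    // the other path as-is when it is absolute.
    static Path resolveDataDir(Path dataHome, String coreDirName, String dataDir) {
        return dataHome.resolve(coreDirName).resolve(dataDir).toAbsolutePath();
    }

    public static void main(String[] args) {
        Path dataHome = Paths.get("/tmp/mydata");
        // Relative dataDir: placed below the data home.
        System.out.println(resolveDataDir(dataHome, "foo", "data"));
        // Absolute dataDir: the data home is ignored.
        String absolute = Paths.get("/mnt/bigdisk/foo-index").toAbsolutePath().toString();
        System.out.println(resolveDataDir(dataHome, "foo", absolute));
    }
}
```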

          janhoy Jan Høydahl added a comment -

          Patch which applies to master. Using Path instead of File and some other changes. Still no tests, but you can test it like this:

          bin/solr start -a "-Dsolr.data.home=/tmp/mydata"
          bin/solr create -c foo
          ls -l server/solr/foo     # You only find conf and core.properties
          ls -l /tmp/mydata/foo     # You only find data
          
          elyograg Shawn Heisey added a comment -

          Didn't mean to change the issue assignment. Must have been an errant click.

          elyograg Shawn Heisey added a comment -

          I really like this idea. It probably wouldn't work well for me without some tweaking, but for a typical Solr user, it could be a lifesaver.

          I use core.properties files like the following, so the instanceDir and dataDir names do not include the "live" and "build" parts of the core names, which get swapped when I do a full rebuild:

          #Written by CorePropertiesLocator
          #Mon Feb 01 23:57:50 UTC 2016
          name=s0live
          loadonStartup=false
          dataDir=../../data/s0_1
          transient=false
          
          #Written by CorePropertiesLocator
          #Mon Feb 01 23:57:50 UTC 2016
          name=s0build
          loadonStartup=false
          dataDir=../../data/s0_0
          transient=false
          

          What determines the name of the dataDir? Is it the core name, or is it the last path component of the instanceDir? If it's the core name, that won't work for me – I would need it to be the last path component of instanceDir.

          If the feature were set up so that I could have a core.properties file like this that would do the right thing, it would be really nice:

          name=s0live
          dataDir=s0_1
          
          janhoy Jan Høydahl added a comment -

          But that would also change the directory where Solr looks for the core config, if you're using a plain old cores layout.

          My patch is aiming for the possibility to simply add -Dsolr.data.dir=/mnt/bigdisk when starting Solr (eventually bin/solr option) to indicate that all data should live at that location, regardless of where coreRoot or SOLR_HOME is, and regardless of using config sets, cloud or not. I'll whip up a better patch soon.

          romseygeek Alan Woodward added a comment -

          I think this can be achieved now using a combination of configsets (or SolrCloud) and setting coreRootDirectory in solr.xml?

          markrmiller@gmail.com Mark Miller added a comment -

          It would allow a cleaner config that doesn't need "../.." on the dataDir.

          It also means you don't have to mess with the data dir property on every SolrCore you create (try doing that with the collections api). Instead you just set one property and forget about it and do everything normal from then on.

          janhoy Jan Høydahl added a comment -

          Attaching a first rough patch to show my thinking. No unit tests, and not even manually tested to work. I just want feedback on whether this is the right direction.

          janhoy Jan Høydahl added a comment -

          Hmm, perhaps it is better to do this as part of DirectoryFactory#getDataHome and add a setting inside the <directoryFactory> tag in solrconfig (data.home is a name more in line with existing stuff):

          <str name="solr.data.home">${solr.data.home:}</str>
          

          This way, each directory impl can have a say in what a data root means.
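
A minimal sketch of the idea that each directory implementation decides what a data home means. Every class and method name below is hypothetical and chosen only for illustration. This is not Solr's DirectoryFactory, and the path layout is an assumption taken from the issue description.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical stand-in for a directory factory; NOT Solr's DirectoryFactory. */
abstract class SketchDirectoryFactory {
    /** Value of the solr.data.home setting; empty when the option is not set. */
    protected final String dataHome;

    SketchDirectoryFactory(String dataHome) {
        this.dataHome = (dataHome == null) ? "" : dataHome;
    }

    /** Each implementation interprets the configured data home in its own way. */
    abstract String getDataHome(Path instanceDir, String coreName);
}

/** Local filesystem: treats the data home as a directory on disk. */
final class LocalFsSketchFactory extends SketchDirectoryFactory {
    LocalFsSketchFactory(String dataHome) {
        super(dataHome);
    }

    @Override
    String getDataHome(Path instanceDir, String coreName) {
        if (dataHome.isEmpty()) {
            // Option not set: keep the data next to the core's config
            return instanceDir.resolve("data").toString();
        }
        // Assumed layout: <data home>/<core name>/data
        return Paths.get(dataHome, coreName, "data").toString();
    }
}

/** Remote store: treats the data home as a URI prefix rather than a local path. */
final class UriPrefixSketchFactory extends SketchDirectoryFactory {
    UriPrefixSketchFactory(String dataHome) {
        super(dataHome);
    }

    @Override
    String getDataHome(Path instanceDir, String coreName) {
        // Plain string concatenation, so the scheme and host are left untouched
        return dataHome + "/" + coreName + "/data";
    }
}
```

The local variant falls back to the instance directory when the setting is empty, while the URI variant never touches the local filesystem API.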

          elyograg Shawn Heisey added a comment -

          I achieve separation between the instanceDir and dataDir with the following settings in solr.xml. I'm using the old format. These are not on separate filesystems, but they could be. This method allows everything to have relative paths, so the entire structure can be relocated without a lot of config changes.

              <core name="s0live" instanceDir="cores/s0_1/" loadOnStartup="true" dataDir="../../data/s0_1" transient="false"/>
          

          +1 for Jan's idea. It would allow a cleaner config that doesn't need "../.." on the dataDir.

          janhoy Jan Høydahl added a comment -

          Mark Miller, if using solr.hdfs.home, shouldn't data from e.g. BlendedInfixSuggester also be co-located there? But BlendedInfixLookupFactory currently hardcodes FSDirectory. We should probably create another JIRA for that and possibly other hardcodings.

          janhoy Jan Høydahl added a comment -

          Hoss, yes, you can compose your own variables everywhere in general, but this issue proposes to ship Solr with that convenience out of the box. We could then also add an -r <dir> option to bin/solr for specifying where data should live. That way, people who already have tons of collections will be able to upgrade to Solr 5 and start using the option without further editing of XML files.

          markrmiller@gmail.com Mark Miller added a comment -

          This is similar to the solr.hdfs.home that the HdfsDirectoryFactory exposes to root SolrCloud instance dirs in one location. Def makes sense to have the same option for local filesystem given that you really don't want to manage data directories manually when using SolrCloud if you can help it. That was also a driving reason behind solr.hdfs.home.

          janhoy Jan Høydahl added a comment -

          Not sure how to wire it in so that it still works as it does today when the new option is not specified.

          What we have now in solrconfig.xml is:

          <dataDir>${solr.data.dir:}</dataDir>

          One way is to add a new property in solr.xml:

          <dataRootDir>${solr.data.root:}</dataRootDir>

          Then modify the logic in SolrCore, and in the other places that resolve the default data dir, so that an empty dataDir falls back to solr.data.root as well.
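
The fallback order described above can be sketched as follows. This is an illustration under stated assumptions, not the code from any SOLR-6671 patch. DataDirResolver and its parameters are hypothetical names, and the `<data root>/<core name>/data` layout is taken from the issue description.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper; not part of Solr. */
final class DataDirResolver {
    private DataDirResolver() {}

    /**
     * Picks the data directory for one core:
     * 1. an explicit dataDir (for example from solr.data.dir) wins;
     * 2. otherwise a non-empty data root gives dataRoot/coreName/data;
     * 3. otherwise fall back to instanceDir/data, the behaviour without the new option.
     */
    static Path resolve(String dataDir, String dataRoot, Path instanceDir, String coreName) {
        if (dataDir != null && !dataDir.isEmpty()) {
            // Path.resolve keeps an absolute dataDir as-is and resolves a
            // relative one against instanceDir (assumed convention)
            return instanceDir.resolve(dataDir).normalize();
        }
        if (dataRoot != null && !dataRoot.isEmpty()) {
            return Paths.get(dataRoot, coreName, "data").normalize();
        }
        return instanceDir.resolve("data").normalize();
    }
}
```

Because the data root is consulted only when dataDir is empty, existing installations that never set the new option would keep their current layout.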

          hossman Hoss Man added a comment -

          isn't this already trivial for users to do by specifying <dataDir>${solr.data.root}/${solr.core.name}/data</dataDir> in their solrconfig.xml file(s) ?
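
To make the substitution in that suggestion concrete, here is a small stand-alone sketch of `${name}` and `${name:default}` expansion. It is not Solr's own substitution code. PropertySubstitution and expand are hypothetical names, and throwing on a missing property without a default is an assumption of this sketch.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper illustrating ${name} and ${name:default} expansion. */
final class PropertySubstitution {
    // Group 1 is the property name; group 2 is the optional default after ':'
    private static final Pattern REF = Pattern.compile("\\$\\{([^}:]+)(?::([^}]*))?\\}");

    private PropertySubstitution() {}

    static String expand(String template, Map<String, String> props) {
        Matcher m = REF.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = props.get(m.group(1));
            if (value == null) {
                value = m.group(2); // stays null when no default was written
            }
            if (value == null) {
                throw new IllegalArgumentException("No value or default for " + m.group(1));
            }
            // quoteReplacement keeps '$' and '\' in the value literal
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With solr.data.root=/mnt/bigdisk and solr.core.name=s0live, the template from the comment expands to /mnt/bigdisk/s0live/data. The approach works, but the template has to be present in every solrconfig.xml.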


            People

            • Assignee:
              janhoy Jan Høydahl
              Reporter:
              janhoy Jan Høydahl
            • Votes:
              2
              Watchers:
              11