Solr / SOLR-658

Allow Solr to load index from arbitrary directory in dataDir

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4
    • Fix Version/s: 1.4
    • Component/s: None
    • Labels:
      None

      Description

      This is a requirement for Java-based Solr replication.

      Use case for an arbitrary index directory:
      If the slave has a corrupted index and the filesystem does not allow overwriting files that are in use (as on NTFS), replication will fail. The solution is to copy the index from the master to an alternate directory on the slave and load the IndexReader/IndexWriter from this alternate directory.

      1. SOLR-658-reopen-windows-fix.patch
        0.7 kB
        Shalin Shekhar Mangar
      2. SOLR-658.patch
        10 kB
        Shalin Shekhar Mangar
      3. SOLR-658.patch
        10 kB
        Shalin Shekhar Mangar
      4. SOLR-658.patch
        10 kB
        Shalin Shekhar Mangar
      5. SOLR-658.patch
        9 kB
        Akshay K. Ukey
      6. SOLR-658.patch
        9 kB
        Akshay K. Ukey
      7. SOLR-658.patch
        5 kB
        Shalin Shekhar Mangar

          Activity

          Noble Paul created issue -
          Noble Paul made changes -
          Field Original Value New Value
          Link This issue blocks SOLR-561 [ SOLR-561 ]
          Noble Paul added a comment -

          Implementation

          • Keep a file, index.properties, in the data dir.
          • Have an entry index=<new.index> in that file.
          • This file may also keep the version.
          • When a new index searcher/writer is loaded, read this property and try to load the index from that folder.
          • If it is absent, default to the hardcoded value for the index and the latest commit point.
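
          The lookup described in the bullets above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from any attached patch: the class and method names (IndexPropertiesSketch, resolveIndexDir, loadIndexProperties) are hypothetical, and the fallback directory name "index" is assumed to be the hardcoded default.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

/** Hypothetical helper illustrating the index.properties lookup; not the actual Solr code. */
class IndexPropertiesSketch {

    /**
     * Returns the directory named by the "index" property if it exists under dataDir,
     * otherwise the assumed default dataDir/index.
     */
    static File resolveIndexDir(File dataDir, Properties props) {
        File fallback = new File(dataDir, "index");
        String name = props.getProperty("index");
        if (name == null || name.trim().isEmpty()) {
            return fallback; // property absent or blank
        }
        File candidate = new File(dataDir, name.trim());
        return candidate.isDirectory() ? candidate : fallback;
    }

    /** Loads dataDir/index.properties; a missing or unreadable file yields empty properties. */
    static Properties loadIndexProperties(File dataDir) {
        Properties props = new Properties();
        File file = new File(dataDir, "index.properties");
        if (file.isFile()) {
            try (InputStream in = new FileInputStream(file)) {
                props.load(in);
            } catch (IOException e) {
                // Assumption: an unreadable file is treated the same as an absent one.
            }
        }
        return props;
    }

    public static void main(String[] args) {
        File dataDir = new File(args.length > 0 ? args[0] : "data");
        System.out.println(resolveIndexDir(dataDir, loadIndexProperties(dataDir)));
    }
}
```

          Keeping the resolution step separate from file loading makes the fallback rule easy to exercise without touching the disk layout of a real core.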
          Shalin Shekhar Mangar made changes -
          Assignee Shalin Shekhar Mangar [ shalinmangar ]
          Shalin Shekhar Mangar made changes -
          Fix Version/s 1.4 [ 12313351 ]
          Affects Version/s 1.4 [ 12313351 ]
          Affects Version/s 1.3 [ 12312486 ]
          Shalin Shekhar Mangar added a comment -

          This patch, cut out of the SOLR-561 patch, supports loading the index from an arbitrary directory.

          Changes

          1. A new method SolrCore#getNewIndexDir() is introduced, which tries to read the latest indexDir from the index.properties file. If that file is not present, the default value (dataDir + "index/") is used.
          2. SolrIndexSearcher now stores the path (indexDir) on which it is opened and has a getter for it.
          3. When SolrCore#getIndexDir() is called, it returns the current searcher's index directory; failing that, the default value is returned.
          4. SolrIndexSearcher is always created with getNewIndexDir(), and UpdateHandler also uses getNewIndexDir() to open IndexWriter instances.
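
          To illustrate how the two getters in the list above differ, here is a simplified sketch. It is not the patch itself: IndexDirLookupSketch and its nested Searcher are hypothetical stand-ins for SolrCore and SolrIndexSearcher, and the Supplier stands in for reading index.properties.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for SolrCore, showing getIndexDir() versus getNewIndexDir(). */
class IndexDirLookupSketch {

    /** Stand-in for SolrIndexSearcher: remembers the directory it was opened on. */
    static final class Searcher {
        private final String indexDir;
        Searcher(String indexDir) { this.indexDir = indexDir; }
        String getIndexDir() { return indexDir; }
    }

    private final String dataDir;
    // Assumption: returns the directory name from index.properties, or null if none is set.
    private final Supplier<String> configuredIndexName;
    private Searcher current; // null until the first searcher is opened

    IndexDirLookupSketch(String dataDir, Supplier<String> configuredIndexName) {
        this.dataDir = dataDir;
        this.configuredIndexName = configuredIndexName;
    }

    /** The directory the next searcher or writer should be opened on. */
    String getNewIndexDir() {
        String name = configuredIndexName.get();
        return dataDir + (name == null ? "index/" : name);
    }

    /** The directory the current searcher is using, or the default if there is none. */
    String getIndexDir() {
        return current == null ? dataDir + "index/" : current.getIndexDir();
    }

    /** New searchers are always created on getNewIndexDir(). */
    Searcher openSearcher() {
        current = new Searcher(getNewIndexDir());
        return current;
    }
}
```

          In this sketch getIndexDir() keeps reporting the old directory until a new searcher is opened, which is what lets a caller detect that the configured directory has changed.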

          TODO:

          • Add a test
          • Add feature for loading arbitrary commit point.
          Shalin Shekhar Mangar made changes -
          Attachment SOLR-658.patch [ 12388980 ]
          Akshay K. Ukey added a comment -

          Patch in sync with trunk and with a test case (loading arbitrary commit point feature not supported in this patch).

          Akshay K. Ukey made changes -
          Attachment SOLR-658.patch [ 12390823 ]
          Akshay K. Ukey added a comment -

          Patch in sync with the trunk.

          Akshay K. Ukey made changes -
          Attachment SOLR-658.patch [ 12391534 ]
          Shalin Shekhar Mangar added a comment -

          Thanks Akshay.

          Updated patch which calls getNewIndexDir before calling IndexReader#reopen so that if the new index directory is different from the old index directory, we always create a new SolrIndexSearcher with the new index directory.

          I'd like to commit this in the next two or three days if there are no objections.

          Shalin Shekhar Mangar made changes -
          Attachment SOLR-658.patch [ 12391717 ]
          Shalin Shekhar Mangar added a comment -

          Updated with a bug fix:

          if (result != null && result.trim().length() > 0) {
            File tmp = new File(dataDir + s);
            if (tmp.exists() && tmp.isDirectory())
              result = dataDir + s;
          }
          

          should be:

          if (s != null && s.trim().length() > 0) {
            File tmp = new File(dataDir + s);
            if (tmp.exists() && tmp.isDirectory())
              result = dataDir + s;
          }
          

          I'll commit shortly.

          Shalin Shekhar Mangar made changes -
          Attachment SOLR-658.patch [ 12391933 ]
          Shalin Shekhar Mangar added a comment -

          Instead of comparing path strings, we should compare the corresponding File objects to handle relative and absolute paths correctly.

          Patch to cover the above case.

          Shalin Shekhar Mangar made changes -
          Attachment SOLR-658.patch [ 12391990 ]
          Shalin Shekhar Mangar added a comment -

          Removing references to rollbacks and commit points, which are being handled in SOLR-670.

          Shalin Shekhar Mangar made changes -
          Summary
            Original Value: Allow Solr to load index from arbitrary directory in dataDir and Commit point
            New Value: Allow Solr to load index from arbitrary directory in dataDir
          Description
            Original Value:
            This is a requirement for Java-based Solr replication.

            Use case for an arbitrary index directory:
            If the slave has a corrupted index and the filesystem does not allow overwriting files that are in use (as on NTFS), replication will fail. The solution is to copy the index from the master to an alternate directory on the slave and load the IndexReader/IndexWriter from this alternate directory.

            Use case for an arbitrary commit point:
            Replication can also provide a rollback feature. The rollback should be able to specify a commit point/generation so that rolling back is possible.

            New Value:
            This is a requirement for Java-based Solr replication.

            Use case for an arbitrary index directory:
            If the slave has a corrupted index and the filesystem does not allow overwriting files that are in use (as on NTFS), replication will fail. The solution is to copy the index from the master to an alternate directory on the slave and load the IndexReader/IndexWriter from this alternate directory.

          Shalin Shekhar Mangar added a comment -

          Committed revision 703981.

          Thanks Noble and Akshay!

          Shalin Shekhar Mangar made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Yonik Seeley added a comment -

          This causes reopen() to never be used on Windows because the following condition comes up false:

               if(new File(getIndexDir()).equals(new File(newIndexDir)))  {
          
          Yonik Seeley made changes -
          Status Resolved [ 5 ] Reopened [ 4 ]
          Resolution Fixed [ 1 ]
          Shalin Shekhar Mangar added a comment -

          Copying over from the solr-dev thread on failing tests:

          The first problem is that File.equals compares only the path and not the absolute path. A workaround is to compare the absolute paths ourselves. But a bigger problem is with the canonical paths, where long directory names are uppercased and shortened into 8-character names (e.g. "C:\Documents and Settings" becomes "C:\DOCUME~1").

          The test fails because we use java.io.tmpdir, which defaults to the user's home directory (shortened and canonicalized) on Windows, and the comparison on this path fails. What I have not been able to figure out yet is why the slave Jetty, running on this canonical path, returns the full path of the index directory.

          Slave's SolrCore.getIndexDir gives:
          C:\Documents and Settings\shalinsmangar\Local Settings\Temp\org.apache.solr.handler.TestReplicationHandler$SolrInstance-1233681533000master\data\index

          The value written by TestReplicationHandler is:
          C:\DOCUME~1\SHALIN~1\LOCALS~1\Temp\org.apache.solr.handler.TestReplicationHandler$SolrInstance-1233681533000master\data\index

          Shalin Shekhar Mangar added a comment -

          I should read javadocs more. This patch compares the index directories using their canonical paths. This fixes the problem on Windows.
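
          As a rough illustration of the difference (a sketch only, not the contents of the attached patch), the snippet below compares two spellings of the same directory with File.equals and with canonical files. The class name IndexDirCompareSketch and the helper sameDirectory are hypothetical, and returning false when canonicalization fails is an assumption made here for simplicity.

```java
import java.io.File;
import java.io.IOException;

/** Hypothetical helper comparing index directories by canonical path; not the actual patch. */
class IndexDirCompareSketch {

    /** Returns true when both paths resolve to the same canonical file. */
    static boolean sameDirectory(String a, String b) {
        try {
            return new File(a).getCanonicalFile().equals(new File(b).getCanonicalFile());
        } catch (IOException e) {
            // Assumption: if a path cannot be canonicalized, treat the directories as
            // different so that the caller opens a fresh searcher instead of reopening.
            return false;
        }
    }

    public static void main(String[] args) {
        String relative = "data/index";
        String absolute = new File(relative).getAbsolutePath();

        // File.equals compares the abstract pathnames as given, so a relative path
        // and its absolute form are reported as different.
        System.out.println("File.equals:   " + new File(relative).equals(new File(absolute)));

        // Canonical files resolve both spellings to the same location.
        System.out.println("sameDirectory: " + sameDirectory(relative, absolute));
    }
}
```

          A check of this shape would let the reopen path be taken whenever the current and new index directories are really the same location, regardless of how each path was written.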

          Shalin Shekhar Mangar made changes -
          Attachment SOLR-658-reopen-windows-fix.patch [ 12403847 ]
          shalin committed 759641 (1 file)
          Reviews: none

          SOLR-658 -- Compare the index directories using their canonical paths

          Shalin Shekhar Mangar added a comment -

          Committed revision 759641.

          Shalin Shekhar Mangar made changes -
          Status Reopened [ 4 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Grant Ingersoll added a comment -

          Bulk close for Solr 1.4

          Grant Ingersoll made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

            People

            • Assignee:
              Shalin Shekhar Mangar
              Reporter:
              Noble Paul
            • Votes:
              1
              Watchers:
              2