
SOLR-561: Solr replication by Solr (for windows also)

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4
    • Fix Version/s: 1.4
    • Component/s: replication (java)
    • Labels: None
    • Environment: All

      Description

      The current replication strategy in Solr involves shell scripts. The following are the drawbacks of this approach:

      • It does not work on Windows
      • Replication runs as a separate piece, not integrated with Solr
      • Replication cannot be controlled from the Solr admin interface or JMX
      • Each operation requires a manual telnet to the host

      Doing the replication in Java has the following advantages:

      • Platform independence
      • Manual steps can be completely eliminated. Everything can be driven from solrconfig.xml.
        • Adding the URL of the master to the slaves should be enough to enable replication. Other things, like the frequency of
          snapshoot/snappull, can also be configured. All other information can be obtained automatically.
      • Start/stop can be triggered from solr/admin or JMX
      • Status/progress can be monitored while replication is going on, and an ongoing replication can be aborted
      • No need to have a login on the machine
      • From a development perspective, we can unit test it

      This issue tracks the implementation of Solr replication in Java.
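
      For context, a slave needs little more than the master's URL. A minimal slave-side configuration, using the parameter names documented on the SolrReplication wiki page linked in the comments below, might look like this:

        <requestHandler name="/replication" class="solr.ReplicationHandler" >
            <lst name="slave">
                <str name="masterUrl">http://master_host:port/solr/replication</str>
                <str name="pollInterval">00:00:60</str>
            </lst>
        </requestHandler>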

      1. SOLR-561-full.patch
        75 kB
        Shalin Shekhar Mangar
      2. SOLR-561-full.patch
        107 kB
        Akshay K. Ukey
      3. SOLR-561-full.patch
        97 kB
        Akshay K. Ukey
      4. SOLR-561-full.patch
        94 kB
        Akshay K. Ukey
      5. SOLR-561-fixes.patch
        5 kB
        Yonik Seeley
      6. SOLR-561-fixes.patch
        6 kB
        Yonik Seeley
      7. SOLR-561-fixes.patch
        7 kB
        Yonik Seeley
      8. SOLR-561-core.patch
        32 kB
        Noble Paul
      9. SOLR-561.patch
        41 kB
        Noble Paul
      10. SOLR-561.patch
        47 kB
        Noble Paul
      11. SOLR-561.patch
        79 kB
        Noble Paul
      12. SOLR-561.patch
        82 kB
        Noble Paul
      13. SOLR-561.patch
        78 kB
        Noble Paul
      14. SOLR-561.patch
        51 kB
        Shalin Shekhar Mangar
      15. SOLR-561.patch
        171 kB
        Akshay K. Ukey
      16. SOLR-561.patch
        154 kB
        Shalin Shekhar Mangar
      17. SOLR-561.patch
        172 kB
        Akshay K. Ukey
      18. SOLR-561.patch
        170 kB
        Shalin Shekhar Mangar
      19. SOLR-561.patch
        172 kB
        Akshay K. Ukey
      20. SOLR-561.patch
        11 kB
        Noble Paul
      21. SOLR-561.patch
        0.6 kB
        Noble Paul
      22. SOLR-561.patch
        19 kB
        Noble Paul
      23. deletion_policy.patch
        14 kB
        Yonik Seeley

          Activity

          Grant Ingersoll added a comment -

          Bulk close for Solr 1.4

          Koji Sekiguchi added a comment -

          change component from scripts to java

          Noble Paul added a comment -

          The default pollInterval can behave the way you want (so that the fetches are synchronized in time by the clock). Raise a separate issue and we can fix it.

          Bill Bell added a comment -

          I am not a huge fan of pollInterval. It would be great to add an option to fetch the index at an exact time: pollTime="*/15 * * * *" would run every 15 minutes based on the clock, i.e. 1:00pm, 1:15pm, 1:30pm, 1:45pm, etc. All my slaves are synced using NTP, so this would work better. Since each slave starts at a different time, we cannot set pollInterval="00:15:00", since they would get different indexes based on when they started. The other option would be to suspend polling and then start it, which would be very manual, I guess. Setting the pollInterval to 10 seconds would mean getting a new index while the old one is still warming up. Even a 10-second interval would not be good; since we get so many updates, each server would have a different index. With the snap scripts we don't have this issue.

          We get Solr updates frequently, and since they are large we cannot wait to do a commit at the 15-minute mark using cron. Optimize just takes too long.

          On our system we need to limit how often the slaves get the new index. We would like all slaves to get the index at the same time.

          Bill

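          For what it's worth, a clock-aligned schedule of the kind Bill describes can be approximated by computing the delay to the next wall-clock boundary instead of using a fixed delay. A minimal sketch (a hypothetical helper, not part of any patch here):

            import java.util.concurrent.ScheduledExecutorService;
            import java.util.concurrent.TimeUnit;

            public class ClockAlignedPoller {
              // Fires 'task' at each 15-minute wall-clock boundary (1:00, 1:15, 1:30, ...),
              // so NTP-synced slaves all poll at the same moment.
              public static void schedule(ScheduledExecutorService exec, Runnable task) {
                long intervalMs = 15L * 60L * 1000L;
                long delayToBoundary = intervalMs - (System.currentTimeMillis() % intervalMs);
                exec.scheduleAtFixedRate(task, delayToBoundary, intervalMs, TimeUnit.MILLISECONDS);
              }
            }
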
          Otis Gospodnetic added a comment -

          I wonder if it might be useful to add copy throttle support to the replication. See SOLR-849 and the referenced email thread.

          Noble Paul added a comment -

          We need to clean up the SnapShooter. It was given low priority because
          snapshoot is not at all necessary in the new replication
          implementation. It is only useful for periodic backups.


          --Noble Paul

          Patrick Eger added a comment -

          Gotcha, I will focus efforts elsewhere then.

          Yonik Seeley added a comment -

          Am I crazy or are these real problems?

          Right, as Noble & I noted, there are still known problems with SnapShooter. Luckily, it's not necessary in the current replication scheme which no longer relies on snapshots.

          Patrick Eger added a comment -

          Hi, I have a couple of comments about the implementation, specifically SnapShooter.java just pulled from trunk:

          -------------------------------------
          createSnapshot() uses the following pattern, which seems unreliable to me, under the prospect of concurrent snapshot requests:

          lockFile = new File(snapDir, directoryName + ".lock");
          if (lockFile.exists()) {
            return;
          }

          ... <1> ...

          lockFile.createNewFile();

          ... <2> ...

          if (lockFile != null) {
            lockFile.delete();
          }

          AFAIK, java.nio.channels.FileLock should be used for any file-based locking of this sort for cross-VM synchronization. If you are worried about in-VM synchronization, it might be best to just use j.u.c Locks or synchronized{} blocks. This would remove the possibility of junk .lock files if, say, the VM dies during <2>.

          -------------------------------------
          Additionally, these lines seem suspect to me. transferTo() needs to be done in a loop for the full copy to work.

          fis = new FileInputStream(file);
          File destFile = new File(toDir, file.getName());
          fos = new FileOutputStream(destFile);
          fis.getChannel().transferTo(0, fis.available(), fos.getChannel());
          destFile.setLastModified(file.lastModified());

          Am I crazy or are these real problems?

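          For reference, a sketch of the two fixes Patrick is suggesting (illustrative code, not the committed patch): transferTo() looped until the whole file has been copied, and java.nio.channels.FileLock for cross-VM locking.

            import java.io.File;
            import java.io.FileInputStream;
            import java.io.FileOutputStream;
            import java.io.IOException;
            import java.nio.channels.FileChannel;
            import java.nio.channels.FileLock;

            public class CopyUtil {
              // Copies 'file' into 'toDir', looping because transferTo() may move
              // fewer bytes than requested (especially on very large files).
              public static void copy(File file, File toDir) throws IOException {
                File destFile = new File(toDir, file.getName());
                FileInputStream fis = new FileInputStream(file);
                FileOutputStream fos = new FileOutputStream(destFile);
                try {
                  FileChannel in = fis.getChannel();
                  FileChannel out = fos.getChannel();
                  long size = in.size();
                  long pos = 0;
                  while (pos < size) {
                    pos += in.transferTo(pos, size - pos, out);
                  }
                } finally {
                  fis.close();
                  fos.close();
                }
                destFile.setLastModified(file.lastModified());
              }

              // Cross-VM locking via FileLock instead of a bare marker file;
              // the OS releases the lock automatically if the VM dies.
              public static FileLock tryLock(File lockFile) throws IOException {
                FileChannel ch = new FileOutputStream(lockFile).getChannel();
                return ch.tryLock(); // null if another VM already holds the lock
              }
            }
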
          Yonik Seeley added a comment -

          Comments committed. Thanks!

          Noble Paul added a comment -

          comments only

          Noble Paul added a comment -

          Yonik, if you can commit this patch I can give a patch with comments. The code badly needs some comments.

          Noble Paul added a comment - - edited

          The SnapShooter is not written right (thread safety). Soon after you commit the patch, I can give a patch. After I fix it I can update the wiki with proper documentation.

          Snapshoot is not a very important feature in the current scheme of things. It is useful only if somebody wants to do periodic backups.

          Should we try an OS-specific copy?
          Hardlinks can be used on *nix, and Windows can also do hardlinks if fsutil is present. If not, we can do a proper copy.

          Noble Paul added a comment -

          what is ReplicationHandler.getIndexVersion() supposed to return, and why?

          This is the method called by the slaves. They must only see the current "replicatable" index version. For instance, if 'replicateAfter' is set to 'optimize', then the slave should not see an index version that is only a commit.

          The getDetails() (command=details) method gives the actual current index version.

          I think there's an issue with SnapShooter in that it never does any reservations for the commit point it's trying to copy.

          Right, the SnapShooter has to reserve.

          The new setReserveDuration() looks right.

          Yonik Seeley added a comment -

          Here's an update to the "fixes" patch that fixes an issue with setReserveDuration when called with different reserveTimes. Previously, the new value overwrote the old, regardless of its value. The fix is a basic spin loop (see below). Anyone see issues with this approach?

            public void setReserveDuration(Long indexVersion, long reserveTime) {
              long timeToSet = System.currentTimeMillis() + reserveTime;
              for(;;) {
                Long previousTime = reserves.put(indexVersion, timeToSet);
          
                // this is the common success case: the older time didn't exist, or
                // came before the new time.
                if (previousTime == null || previousTime <= timeToSet) break;
          
                // At this point, we overwrote a longer reservation, so we want to restore the older one.
                // the problem is that an even longer reservation may come in concurrently
                // and we don't want to overwrite that one too.  We simply keep retrying in a loop
                // with the maximum time value we have seen.
                timeToSet = previousTime;      
              }
            }
          

          I think this is also a great example of where comments explaining how things work are really needed.

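          For context, setReserveDuration() presumably operates on a concurrent map from index version to expiry time; a sketch consistent with the cleanReserves()/ConcurrentHashMap remarks below (field and method shapes assumed):

            import java.util.Iterator;
            import java.util.Map;
            import java.util.concurrent.ConcurrentHashMap;

            public class CommitReserves {
              // index version -> wall-clock millis until which that commit is reserved
              private final ConcurrentHashMap<Long, Long> reserves =
                  new ConcurrentHashMap<Long, Long>();

              // Expired entries can be removed while iterating; ConcurrentHashMap
              // iterators tolerate concurrent removal, so no separate list of ids
              // to delete is needed.
              void cleanReserves() {
                long now = System.currentTimeMillis();
                for (Iterator<Map.Entry<Long, Long>> it = reserves.entrySet().iterator();
                     it.hasNext();) {
                  if (it.next().getValue() < now) {
                    it.remove();
                  }
                }
              }
            }
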
          Yonik Seeley added a comment -

          I think there's an issue with SnapShooter in that it never does any reservations for the commit point it's trying to copy.

          Yonik Seeley added a comment -

          Updated the fixes patch with more thread safety fixes.

          Q: what is ReplicationHandler.getIndexVersion() supposed to return, and why? It currently returns the version of the visible index (registered). Should it be the most recent version of the index we have? Any reason it isn't using ReplicationHandler.indexCommitPoint?

          Also, I think we should all work at adding more comments to code as it is written. Lack of comments made this patch harder to review.

          Yonik Seeley added a comment -

          Attaching some little thread safety fixes (mostly adding volatile to values modified and read from different threads).

          Noble Paul added a comment -

          Silly me. The packetsWritten variable was not incremented.

          Yonik Seeley added a comment -

          Committed with 2 changes:

          • getSearcher() isn't allowed in inform() so I changed to getNewestSearcher()
          • changed cleanReserves() to not collect ids to delete in a separate list (not needed for ConcurrentHashMap)
          Yonik Seeley added a comment -

          Thanks Noble, reviewing now...

          Noble Paul added a comment - - edited

          Patch contains changes so that the reserve is extended (for 10 secs by default) after every 5 packets (5 MB) are written; see the sketch after this comment.

          The commitReserveDuration is now supposed to be a small value (the default is 10 secs). If the network is particularly slow, the user can tweak it to a bigger number.

          Every command for fetching file content has an extra attribute, indexversion, so that the master now knows which IndexCommit is being downloaded.

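          A sketch of what that periodic extension might look like on the master's send path (all names here are assumed for illustration; this is not the patch's actual code):

            public class ReserveExtendingSender {
              private static final int PACKETS_PER_RESERVE = 5;  // 5 x 1MB packets
              private static final long RESERVE_MS = 10 * 1000L; // 10 secs

              interface PacketWriter { boolean writeNextPacket(); } // true while data remains
              interface DeletionPolicy { void setReserveDuration(Long version, long ms); }

              // Re-reserve the commit point being streamed every few packets so the
              // deletion policy cannot delete its files in the middle of a transfer.
              static void send(PacketWriter w, DeletionPolicy p, Long indexVersion) {
                int packets = 0;
                while (w.writeNextPacket()) {
                  if (++packets % PACKETS_PER_RESERVE == 0) {
                    p.setReserveDuration(indexVersion, RESERVE_MS);
                  }
                }
              }
            }
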
          Akshay K. Ukey added a comment -

          Cool. Hopefully one of the test indexes contains a single file greater than 4G to test that we don't hit any 32-bit overflow in the stack. If not, re-doing your wikipedia test with the compound index format and after an optimize should do the trick.

          Yes, one of the files in the index is of size 6.3G, created on optimize.

          Yonik Seeley added a comment -

          Since files are transferred in one go, the master knows about the access time, but it does not know if the transfer has ended, so the lease may expire in the middle of the transfer, leading to a failure.

          Right, as Noble pointed out, lease extension will need to be done periodically during the download (every N blocks written to the socket).

          We'll need to track the transfers individually as well.

          Each file request can optionally specify the commit point it is copying.

          If the slave dies in between the transfer, we'll need to track that as well and time-out the lease appropriately.

          The lease is just the current reservation mechanism, but called more often and with a very short reservation (on the order of seconds, not minutes I would think), so I don't see a need to time them out.

          We have been testing it with a large index (wikipedia articles, around 7-8GB index on disk)

          Cool. Hopefully one of the test indexes contains a single file greater than 4G to test that we don't hit any 32-bit overflow in the stack. If not, re-doing your wikipedia test with the compound index format and after an optimize should do the trick.

          Yonik Seeley added a comment -

          The SnapPuller calls commit with waitSearcher=true, so the call will wait for the searcher to get registered and warmed.

          A commit could come from somewhere else though, or we could be starting up and no searcher is yet registered. It's always safe (and clearer) to just use the newest reader opened, right?

          Is there a reason that SnapPuller waits for the new searcher to be registered?
          If not, I'll change this... I'm currently creating a patch with some little thread-safety fixes.

          Shalin Shekhar Mangar added a comment -

          Thanks for going through this Yonik.

          Snappuller should use getNewestSearcher() rather than getSearcher() to avoid pulling the same snapshot more than once if warming takes a long time.

          The SnapPuller calls commit with waitSearcher=true, so the call will wait for the searcher to get registered and warmed. The reentrant lock in SnapPuller will be released only after the commit call returns. So it should be OK, right?

          It seems like renewing a lease (a short term reservation) whenever an access is done would solve both of these problems (and is what I initially had in mind). All requests should indicate what commit point is being copied so that the lease can be extended.

          Since files are transferred in one go, the master knows about the access time, but it does not know if the transfer has ended, so the lease may expire in the middle of the transfer, leading to a failure. We'll need to track the transfers individually as well. If the slave dies in between the transfer, we'll need to track that as well and time-out the lease appropriately. If I compare the state of things to the old way of replication, I am not sure this feature is worth the effort. What do you think?

          Has anyone tested this with large files (say 5G or more)...

          We have been testing it with a large index (wikipedia articles, around 7-8GB index on disk) with Tomcat across networks (transfer rate between servers is around 700-800 KB/sec). We haven't seen any problem yet. We'll continue to test this with Tomcat and other containers and report performance numbers and problems, if any.

          Noble Paul added a comment -

          All requests should indicate what commit point is being copied so that the lease can be extended.

          This is a good idea. But when the index is large, it tends to have one very large file and a few other smaller files. It is that very large file that takes a lot of time (in our case a 6GB file across data centers took around 2 hrs). So we may also need to call reserve even while the download is going on.

          Noble Paul added a comment -
          • We did extensive testing with a very large index (around 7GB), with retries also (for failed connections)
          • Tests were conducted on both Jetty and Tomcat
          • Performance of the new replication is roughly equal to rsync-based replication; the replication speed is largely network-IO bound

          the servlet container can handle sending responses of that size

          The servlet container usually has a small chunk size by default (~8KB in Tomcat). It keeps flushing the stream after that size is crossed.

          Yonik Seeley added a comment -

          Files are downloaded in one HTTP request... the response is read and written one chunk at a time. Has anyone tested this with large files (say 5G or more) to ensure that:

          • the response is correctly streamed (not buffered) as it is written to the socket?
          • the servlet container can handle sending responses of that size
          • the servlet container won't time out (test big file over slow connection)
          • the client side (HTTPClient) doesn't buffer the response, can handle the big size, and won't time out.

          The first 3 go through servlet container code and thus should probably be tested with Tomcat, Jetty, and Resin.

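          For illustration, the chunk-at-a-time client-side pattern under discussion looks roughly like this (a sketch, not the actual SnapPuller code; the buffer size is assumed):

            import java.io.IOException;
            import java.io.InputStream;
            import java.io.OutputStream;

            public class StreamCopy {
              // Reads the HTTP response one chunk at a time and writes it to disk,
              // so neither side ever buffers the whole (possibly >4GB) file in memory.
              static long copy(InputStream response, OutputStream file) throws IOException {
                byte[] buf = new byte[1024 * 1024]; // 1MB chunks (assumed size)
                long total = 0;                     // a long counter avoids 32-bit
                int n;                              // overflow on >2GB files
                while ((n = response.read(buf)) != -1) {
                  file.write(buf, 0, n);
                  total += n;
                }
                return total;
              }
            }
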
          Yonik Seeley added a comment -

          I didn't catch earlier how reservations were done: currently, the commit point is reserved for a certain time when the file list is initially fetched. This requires that the user estimate how long a snap pull will last, and if they get it wrong things will fail. On the other side, setting the time high requires more free disk space.

          It seems like renewing a lease (a short term reservation) whenever an access is done would solve both of these problems (and is what I initially had in mind). All requests should indicate what commit point is being copied so that the lease can be extended.

          Yonik Seeley added a comment -

          Snappuller should use getNewestSearcher() rather than getSearcher() to avoid pulling the same snapshot more than once if warming takes a long time.

          Shalin Shekhar Mangar added a comment -

          Committed revision 706565.

          Thanks Noble, Yonik and Akshay!

          Akshay K. Ukey added a comment -

          Again, a minor fix in the replication admin page.

          Shalin Shekhar Mangar added a comment -

          Another iteration over Akshay's patch.

          1. Made the collections used for keeping statistics synchronized to avoid concurrent modification exceptions (see the sketch after this comment).
          2. Removed @author tags and put @version and @since 1.4 tags
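
          That change presumably amounts to wrapping the stats collections in synchronized views; a minimal sketch (the field name is assumed):

            import java.util.Collections;
            import java.util.LinkedList;
            import java.util.List;

            public class ReplicationStats {
              // The synchronized wrapper makes individual list operations thread-safe,
              // so the admin page can read stats while the replication thread appends
              // to them without a ConcurrentModificationException.
              private final List<String> replicationFailures =
                  Collections.synchronizedList(new LinkedList<String>());

              public void recordFailure(String msg) {
                replicationFailures.add(msg);
              }
            }

          (Iterating over such a wrapped list still requires synchronizing on it externally.)
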
          Akshay K. Ukey added a comment -

          Patch with minor fixes related to the admin page.

          Shalin Shekhar Mangar added a comment -

          Updated patch with a couple of bug fixes related to closing connections and refcounted index searcher. Other cosmetic changes include code formatting and javadocs.

          Noble has put up a wiki page at http://wiki.apache.org/solr/SolrReplication detailing the features and configuration.

          Shalin Shekhar Mangar added a comment -

          Thanks Akshay.

          At first glance, this is looking really good. I am planning to commit this in a few days. We can take up the enhancements or bug fixes through new issues.

          Akshay K. Ukey added a comment -

          Patch with following changes:

          1. segments_ file moved at the end.
          2. Some minor changes in replication admin page
          3. Test case for index and config files replication.
          4. Some minor bug fixes.
          Noble Paul added a comment -

          If the files are not part of any IndexCommit (this is true if the segments_n file didn't get downloaded), will it still clean them up? And when Solr restarts, ReplicationHandler will have difficulty cleaning up those files if replication kicks off before Lucene cleans them up (if it actually does that).

          Yonik Seeley added a comment -

          If Solr crashes while downloading that will leave unnecessary/incomplete files in the index directory.

          If we don't want to try and pick up from where we left off, it seems like Lucene's deletion policy can clean up old index files that are unreferenced.

          Noble Paul added a comment -

          Why are files downloaded to a temp directory first?

          If Solr crashes while downloading, that will leave unnecessary/incomplete files in the index directory. We did not want the index directory to be polluted. The files are 'moved' to the index directory after they are downloaded.

          The segments_n file is copied at the end, from the temp directory to the index directory.
          (OK, that patch is coming.)

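          A sketch of that download-then-move sequence (helper names assumed; not the actual SnapPuller code):

            import java.io.File;
            import java.io.IOException;

            public class IndexInstaller {
              // Download into a temp dir first so a crash never leaves partial files
              // in the live index dir; move segments_N last so Lucene only ever sees
              // either the old complete commit or the new complete commit.
              static void install(File tmpDir, File indexDir) throws IOException {
                File[] files = tmpDir.listFiles();
                if (files == null) throw new IOException("missing temp dir: " + tmpDir);
                File segments = null;
                for (int i = 0; i < files.length; i++) {
                  if (files[i].getName().startsWith("segments_")) {
                    segments = files[i];        // defer the commit file
                  } else {
                    move(files[i], indexDir);
                  }
                }
                if (segments != null) {
                  move(segments, indexDir);     // publishing step: new commit visible
                }
              }

              private static void move(File f, File dir) throws IOException {
                File dest = new File(dir, f.getName());
                if (!f.renameTo(dest)) {
                  throw new IOException("could not move " + f + " to " + dest);
                }
              }
            }
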
          Yonik Seeley added a comment -

          Why are files downloaded to a temp directory first? Since all index files are versioned, would it make sense to copy directly into the index dir (provided you copy segments_n last)?

          Akshay K. Ukey added a comment -

          Patch includes changes for SOLR-658 and SOLR-561.
          Changes:
          Abort command implementation in ReplicationHandler to abort ongoing replication.
          More information displayed in the replication admin page, such as file being copied, time elapsed, time remaining, size of file downloaded and so on.

          Akshay K. Ukey added a comment -

          Full patch with:
          1. Support for reserving commit point. Configurable with commitReserveDuration configuration in ReplicationHandler section.

          <requestHandler name="/replication" class="solr.ReplicationHandler" >
              <lst name="master">
                  <str name="replicateAfter">commit</str>
                  <str name="confFiles">schema.xml,stopwords.txt,elevate.xml</str>
                  <str name="commitReserveDuration">01:00:00</str>
              </lst>
          </requestHandler>
          

          2. Admin page for displaying replication details.
          3. Combines changes for SOLR-617 and SOLR-658.

          Akshay K. Ukey added a comment -

          Patch in sync with the trunk, includes changes for SOLR-617 and SOLR-658.

          Shalin Shekhar Mangar added a comment -

          This patch contains all the replication related changes as well as changes in Solr itself. It contains the changes in SOLR-617 and SOLR-658.

          Un-tested version. I just compiled all the various patches and brought them up to trunk.

          Shalin Shekhar Mangar added a comment - - edited

          This patch contains only replication related changes. It depends on SOLR-658 and SOLR-617 and must be applied after the patches in those issues.

          Noble Paul added a comment -

          Reading the above makes one think that it is the master that does the actual replication. In fact, the master only creates a snapshot of the index

          The new replication does not create snapshots for replication. The replication is done from/to a live index. Hence the change in name.

          Otis Gospodnetic added a comment -

          Comment about the solrconfig entry for replication on the master:

          
          <requestHandler name="/replication" class="solr.ReplicationHandler" >
              <lst name="master">
              <!-- Replicate on 'optimize'; it can also be 'commit' -->
              <str name="replicateAfter">commit</str>
              <str name="confFiles">schema.xml,stopwords.txt,elevate.xml</str>
              </lst>
          </requestHandler>
          
          

          Reading the above makes one think that it is the master that does the actual replication. In fact, the master only creates a snapshot of the index and other files after either commit or optimize. It is the slaves that copy the snapshots. So while we refer to the whole process as replication, I think the configuration elements' names should reflect the actual actions to ease understanding and avoid confusion.

          Concretely, I think "replicateAfter" should be called "snapshootAfter" or some such.

          +1 for Hoss' suggestion to decouple scheduling from the handler that can replicate/copy on-demand
          +1 for Shalin's suggestion to expose an HTTP interface to enable/disable snapshooting on masters and copying/replication on slaves.

          Noble Paul added a comment -

          The core reload functionality has to close the old core.

          Noble Paul added a comment -

          Yes, we may need another issue to track it. Directly calling SolrCore.close() can cause exceptions on in-flight requests.

          Yonik Seeley added a comment -

          I haven't had a chance to check out the latest patch, but it sounds like "SolrCore.close() is done in a refcounted way" is a generic multi-core change that is potentially sticky enough that it deserves its own JIRA issue.

          Noble Paul added a comment -

          This patch:

          • syncs with the trunk (SOLR-638 changes)
          • uses java.util.concurrent's ScheduledExecutorService instead of a Timer
          • makes SolrCore.close() refcounted (see the sketch after this comment)
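
          The refcounted-close idea, sketched with assumed names (not the actual SolrCore code):

            import java.util.concurrent.atomic.AtomicInteger;

            public class RefCountedCore {
              // The core only really closes when the last holder releases it, so an
              // in-flight request can never have the core closed out from under it.
              private final AtomicInteger refCount = new AtomicInteger(1);

              public void open() {
                refCount.incrementAndGet();
              }

              public void close() {
                if (refCount.decrementAndGet() == 0) {
                  doRealClose();
                }
              }

              private void doRealClose() {
                // release searchers, the update handler, etc.
              }
            }
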
          Ryan McKinley added a comment -

          I just committed SOLR-638 – I think this patch depends on that

          Noble Paul added a comment -

          New patch that takes care of the refcount. This is a complete patch.

          Noble Paul added a comment -

          Just the changes required to the core

          Noble Paul added a comment -

          Actually there are a handful of other commands which can be invoked over HTTP. These can be used to control the feature from the admin interface:

          • Abort copying a snap from master to slave: http://slave_host:port/solr/replication?command=abortsnappull
          • Force a snapshot on the master: http://master_host:port/solr/replication?command=snapshoot
          • Force a snap pull on the slave from the master: http://slave_host:port/solr/replication?command=snappull
          • Disable polling for snapshots on the slave: http://slave_host:port/solr/replication?command=disablepoll
          • Enable polling for snapshots on the slave: http://slave_host:port/solr/replication?command=enablepoll

          Shalin Shekhar Mangar added a comment -

          Sure Yonik, we shall separate the changes in core classes into separate issues.

          Yonik Seeley added a comment -

          Could you guys pull out all the changes to MultiCore, CoreDescriptor, SolrCore, etc (everything not related to replication) into a separate patch. I think that will help things get committed. Ryan also has a need to get the MultiCore and I think perhaps a getMultiCore() should just be added to the CoreDescriptor.

          Shalin Shekhar Mangar added a comment -

          Replication can be disabled by not registering the handler in solrconfig.xml, and an HTTP API call should be added to disable replication on the master/slave.

          Guillaume Smet added a comment -

          The next patch will take care of it

          Nice.

          Noble Paul@06/Jun/08 09:59 AM> It's easy to implement with a wildcard. But very few files need to be replicated; isn't it better to explicitly mention the names so that no file accidentally gets replicated?

          I'm thinking of a use case: if you have a lot of synonym/stopwords dictionaries for different languages and field types, it might be a bit awkward to specify each file. A synonyms_*.txt, stopwords_*.txt would be welcome.

          Furthermore, I wonder if we shouldn't explicitly disable the replication of solrconfig.xml. Any opinion?

          Noble Paul added a comment -

          I usually replicate on optimize only but I wonder if people use the current ability to replicate on commit and on optimize. It doesn't seem to be possible with your current patch.

          Technically it is possible: just add two entries for replicateAfter. The code is not handling it because NamedList did not have a getAll() method at that time. The next patch will take care of it.

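          With that fix, the master section would presumably accept two entries, e.g.:

            <lst name="master">
                <str name="replicateAfter">commit</str>
                <str name="replicateAfter">optimize</str>
                <str name="confFiles">schema.xml,stopwords.txt,elevate.xml</str>
            </lst>
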
          Guillaume Smet added a comment -

          I read the patch quickly. I noticed a small typo in SnapPuller.DFAULT_CHUNK_SIZE (should be DEFAULT).

          I like the idea of configuration files replication (yeah, no more scp schema.xml everywhere).

          I usually replicate on optimize only but I wonder if people use the current ability to replicate on commit and on optimize. It doesn't seem to be possible with your current patch.

          Anyway, really nice work.

          Noble Paul added a comment -

          If someone is replicating an older IndexCommit, then we want to extend the lease on it. In order to do that, we need to know what IndexCommit the client is replicating. The file name is not enough as a single file is normally part of more than one IndexCommit.

          Got the point. I assumed that SOLR-617 should have enough features to let people configure that.

          It is hard to drive it from the replication handler. The lease can be extended only when we get the onInit()/onCommit() callbacks on SolrIndexDeletionPolicy. We can't reliably expect those to happen while a download is in progress.

          Yonik Seeley added a comment -

          Because the file names are unique it did not matter if I used the index version (or does it).

          Yes, file names are unique since Lucene doesn't change existing files once they are written. But, if I completely delete an index and start again, the same file name would be reused with different contents (and a different timestamp).

          But that's not the point I was trying to make...
          If someone is replicating an older IndexCommit, then we want to extend the lease on it. In order to do that, we need to know what IndexCommit the client is replicating. The file name is not enough as a single file is normally part of more than one IndexCommit.

          Noble Paul added a comment -

          Good catch, but it is not obvious that the refCount was incremented. Should we not have a method to return the searcher without
          incrementing the refcount? Something like SolrCore#getSearcherNoIncRef().
          Anyone who is not using the IndexSearcher for searching would need that.

          Yonik Seeley added a comment - - edited

bq: it doesn't seem like old files are being removed on the slave for me... actually I think this is related to the fact that I don't see old searchers being cleaned up... my slave currently has 4 open - one for each index version.

OK, I found the bug that caused this one...
SnapPuller.java, line 172:
core.getSearcher().get()....
That pattern is almost always a bug... getSearcher() returns a RefCounted object that performs the reference counting on the SolrIndexSearcher. It must be decremented (normally via a finally block).
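
A minimal sketch of the correct pattern (RefCounted, SolrIndexSearcher and decref() are the real Solr classes/methods; the body of the try block is illustrative):

    RefCounted<SolrIndexSearcher> refCounted = core.getSearcher();
    try {
      SolrIndexSearcher searcher = refCounted.get();
      int maxDoc = searcher.maxDoc(); // any read-only use of the searcher
    } finally {
      refCounted.decref(); // always release the reference, even on exceptions
    }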

          Oh... and more internal code comments would be welcome (I don't know if it's practical to add them after the fact... I find myself adding them for my own notes/thoughts as I develop).

          Noble Paul added a comment -

Thanks.

bq: it doesn't seem like old files are being removed on the slave for me...

I guess Lucene must be cleaning them up, because that is what the deletion policy says.

bq: segments_* should be copied last...

Good point. Will incorporate that.

bq: commands from the slave to the master to replicate this file should reference the index version being replicated.

Because the file names are unique it did not matter if I used the index version (or does it?). Please clarify.

bq: What happens when the slave is replicating an index, and some of the files become missing on the master?

Yeah, it does that. If all the files are not copied completely, it aborts.

bq: What happens if replication takes a really long time?

You are right. When the replication process starts, a lock is acquired. The lock is released only after the process completes.

--Noble Paul

          Yonik Seeley added a comment -

          I love how easy this is to set up!

          A couple of issues I noticed while testing:

• it doesn't seem like old files are being removed on the slave for me... actually I think this is related to the fact that I don't see old searchers being cleaned up... my slave currently has 4 open - one for each index version.
• segments_* should be copied last... then if we crash in the middle, everything will work fine... Lucene will open the previous index version automatically.
• since a single index file is likely to be part of multiple indices, commands from the slave to the master to replicate this file should reference the index version being replicated. This allows time-based reservation of a specific index commit point.

          What happens when the slave is replicating an index, and some of the files become missing on the master? Seems like the slave should simply abandon the current replication effort. Next time the master is polled, the new index version will be discovered and the process can start again as normal.

          What happens if replication takes a really long time? I assume that no new replications will be kicked off until the current one has finished?

Noble Paul added a comment (edited) -

This patch relies on the IndexDeletionPolicy to identify the files to be replicated. It also supports replication of conf files. No need to register any listeners/QueryResponseWriters.

          The configuration is as follows
          on master

          solrconfig.xml
<requestHandler name="/replication" class="solr.ReplicationHandler" >
    <lst name="master">
        <!-- Replicate after 'commit'; can also be 'optimize' -->
        <str name="replicateAfter">commit</str>
        <!-- Config files to be replicated -->
        <str name="confFiles">schema.xml,stopwords.txt,elevate.xml</str>
    </lst>
</requestHandler>
          

          on slave

          solrconfig.xml
          <requestHandler name="/replication" class="solr.ReplicationHandler" >
              <lst name="slave">
                  <str name="masterUrl">http://localhost:port/solr/corename/replication</str>  
                  <str name="pollInterval">00:00:20</str>  
               </lst>
          </requestHandler>
          

The replication strategy is changed as follows:

• CMD_INDEX_VERSION: (command=indexversion) gets the version of the current IndexCommit to be replicated from the master. If the version is the same, there is no need to replicate; if it is different, the slave proceeds to fetch the file list (see the sketch after this list).
• CMD_FILE_LIST: (command=filelist) gets the list of file names for the current IndexCommit. The slave checks against its local index and identifies modified files by comparing names and sizes. The response also includes the details of the conf files.
• CMD_FILE_CONTENT: (command=filecontent) for each file to be downloaded, the slave issues this command and downloads the content to a temp folder. After successful completion, the files are copied to the index folder and a commit is issued.
• If the current index is stale, or cannot be synchronized, all the files are copied. An index.properties file is written, which has the location of the new index directory.
• CoreDescriptor has a new method to reload a core.
• If conf files are modified, they are copied to the conf folder after taking a backup of the old ones. Then the core is reloaded.
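
A minimal sketch of that slave-side sequence over plain HTTP GET (the command names and parameters are from the patch description above; the response handling is deliberately simplified, since the real handler returns structured namedlist responses, and the helper names are illustrative):

    import java.io.ByteArrayOutputStream;
    import java.io.InputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    // One poll cycle of a slave against the master's /replication handler.
    public class SnapPollSketch {
      private static final String MASTER =
          "http://localhost:8983/solr/core1/replication";

      // Issue one replication command over HTTP GET and return the raw bytes.
      static byte[] get(String params) throws Exception {
        URL url = new URL(MASTER + "?" + params);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try {
          InputStream in = conn.getInputStream();
          try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            for (int n; (n = in.read(buf)) != -1; ) out.write(buf, 0, n);
            return out.toByteArray();
          } finally {
            in.close();
          }
        } finally {
          conn.disconnect();
        }
      }

      // Compare versions, fetch the file list, download changed files.
      static void pollOnce(long localVersion) throws Exception {
        long masterVersion =
            Long.parseLong(new String(get("command=indexversion")).trim());
        if (masterVersion == localVersion) return; // already in sync

        // Hypothetical one-name-per-line format; the real response is a
        // namedlist that also carries file sizes for comparison.
        for (String name : new String(get("command=filelist")).split("\n")) {
          byte[] content = get("command=filecontent&file=" + name.trim());
          // ...write to a temp dir, verify, move into the index dir, commit...
        }
      }
    }
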
          Noble Paul added a comment -

bq: First we have an active master, some standby masters and search slaves

This looks like a good approach. In the current design I must allow users to specify multiple 'masterUrl' values. That should take care of one or more standby masters: the slave can automatically fall back to another master if one fails.

bq: On active master, there is an index snapshot manager. ... From time to time, I still got index corruption when committing an update. When that happens, the snapshot manager allows us to roll back to the previous good snapshot.

How can I know if the index got corrupted? If I can know that, the best way to implement it would be to add a command to ReplicationHandler to roll back to the last good snapshot.

bq: On active master, there is a replication server component which listens at a specific port

Plain socket communication is more work than relying on the simple HTTP protocol. The little extra efficiency you may achieve may not justify it (HTTP is not too slow either). With HTTP, the servlet container provides you with sockets, threads, etc. Take a look at the patch to see how efficiently it is done.

bq: client creates a tmp directory and hard link everything from its local index directory ... in the case of compound file is used for index, the file update will update only diff blocks.

The current implementation is more or less like what you have done. For a compound file I am not sure a diff-based sync can be more efficient, because it is hard to find the identical blocks in the file; I rely on checksums of the whole file. If there is an efficient mechanism to obtain identical blocks, share the code and I can incorporate it.
The hardlink approach may not be necessary now, as I made SolrCore not hardcode the index folder.

          Let me out added a comment -

I'm using Solr to build a search service for my company. From an operations and performance point of view, we need to use Java to replicate the index.

At a very high level, my design is similar to what Noble mentioned here. It works like this:

1) First we have an active master, some standby masters and search slaves. The active master handles crawling data and updating the index; standby masters are redundant to the active master. If the active master goes away, one of the standbys becomes active. Standby masters replicate the index from the active master to act as backups; search slaves only replicate the index from the active master.

2) On the active master, there is an index snapshot manager. Whenever there's an update, it takes a snapshot. On Windows it uses copy (I should try fsutil) and on Linux it uses hard links. The snapshot manager also cleans up old snapshots. From time to time I still get index corruption when committing an update. When that happens, the snapshot manager allows us to roll back to the previous good snapshot.

3) On the active master, there is a replication server component which listens on a specific port (the reason I did not use the HTTP port is that I do not use Solr as-is; I embed Solr in our application server, so going through HTTP would not be very efficient for us). Each standby and slave has a replication client component. The following is the protocol between the replication client and server:
a) the client pings a directory server for the location of the active master
b) it connects to the active master on the specific port
c) handshake: right now this just checks version and authentication. In the future it will negotiate security, compression, etc.
d) the client sends a SNAPSHOT_OPEN command followed by the index name. The master can manage multiple indexes. The server sends index_not_found if the index does not exist, or ok followed by the name of the latest snapshot.
e) if the index is found, the client compares the timestamp with that of its local snapshot. The timestamp of a snapshot is derived from the snapshot name, because part of the snapshot name is an encoded timestamp. If the local copy is newer, it tells the server to close the snapshot; otherwise, it asks the server for a list of files in the snapshot. If ok, the server sends an ok op, followed by a file list including filename, timestamp, etc.
f) the client creates a tmp directory and hard links everything from its local index directory; then, for each file in the file list, if it does not exist locally it gets the new file from the server; if the server's copy is newer than the local one, it asks the server for an rsync-like update; if local files do not exist in the file list, it deletes them. In the case where the compound file format is used for the index, the file update transfers only the differing blocks.
g) if everything goes well, it tells the server to close the snapshot, renames the tmp directory to the proper place, creates a solr-core using this new index, warms up any caches if necessary, routes new requests to this solr-core, closes the old solr-core, and removes the old index directory.

Right now a client replicates the index from the active master every 3 minutes for a slowly changing datasource. It works fine because creating the new solr-core and warming up the caches take less than 3 minutes. We plan to use it for a fast-changing datasource, so creating a new solr-core and dumping all the caches is not feasible. Any suggestions?

          Noble Paul added a comment -

This can be used for a very optimized index copy. I shall incorporate this in the next patch. A few points stand out:

• Is it relevant to use this in a postOptimize? I guess not.
• Taking snapshots can serve as a backup. If we adopt only this strategy, users will lose that feature of the existing mechanism.
• Let us make it configurable so the user can choose which strategy he prefers, say <bool name="snapshoot">true</bool>.

          Yonik Seeley added a comment -

          Attaching deletion_policy.patch
          This exports a SolrDeletionPolicy via UpdateHandler.getDeletionPolicy()

          It can be used to get the latest SolrIndexCommit, which lists the files that are part of the commit, and can be used to reserve/lease the commit point for a certain amount of time. This could be used to enable replication directly out of the index directory and avoid copying on systems like Windows.

          Each SolrIndexCommit has an id, which can be used by a client as a correlation id. Since a single file can be part of multiple commit points, a replication client should specify what commit point it is copying. The server can then look up that commit point and extend the lease.
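
A rough sketch of what a lease-aware policy can look like, assuming the Lucene 2.9/3.x IndexDeletionPolicy interface (the reserve bookkeeping and class name are illustrative, not the patch's actual API):

    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import org.apache.lucene.index.IndexCommit;
    import org.apache.lucene.index.IndexDeletionPolicy;

    // Keeps only the newest commit, plus any commit a replication client
    // has reserved (leased) until its expiry time.
    public class LeasedDeletionPolicy implements IndexDeletionPolicy {
      // commit generation -> lease expiry (epoch millis)
      private final Map<Long, Long> reserves = new ConcurrentHashMap<Long, Long>();

      // Called by the replication handler to extend the lease on a commit.
      public void reserve(long generation, long durationMs) {
        reserves.put(generation, System.currentTimeMillis() + durationMs);
      }

      public void onInit(List<? extends IndexCommit> commits) {
        onCommit(commits);
      }

      public void onCommit(List<? extends IndexCommit> commits) {
        long now = System.currentTimeMillis();
        // Delete everything except the newest commit and unexpired leases.
        for (int i = 0; i < commits.size() - 1; i++) {
          IndexCommit commit = commits.get(i);
          Long expiry = reserves.get(commit.getGeneration());
          if (expiry == null || expiry < now) {
            commit.delete();
          }
        }
      }
    }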

          Noble Paul added a comment -

bq: Does it make sense to support regular expressions in that confFiles file list?

It's easy to implement with a wildcard. But very few files need to be replicated; isn't it better to explicitly mention the names so that no file accidentally gets replicated?

bq: I like the backup idea. As a matter of fact, I'd make backups with timestamps

Yes, timestamps, in the same format used by the snapshots.

bq: Backups > N days old can be periodically deleted

This is a good idea. It must be a feature of replication. Old conf files as well as indexes should be purged periodically.

          Otis Gospodnetic added a comment -

I think so. You already started doing that with your comment from 04/Jun.
Two quick thoughts:

• Does it make sense to support regular expressions in that confFiles file list? Something a la FileFilter - http://java.sun.com/javase/6/docs/api/java/io/FileFilter.html
• I like the backup idea. As a matter of fact, I'd make backups with timestamps, so we don't have just one backup but a bit of history. Backups > N days old can be periodically deleted, either as part of this java-based replication mechanism or via a cron job (see the sketch below).
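
A minimal sketch of that age-based pruning (the flat directory layout under a single backup root is an assumption; only the "delete backups older than N days" idea comes from the comment above):

    import java.io.File;

    // Delete timestamped backup directories older than maxAgeDays.
    public class BackupPruner {
      public static void prune(File backupRoot, int maxAgeDays) {
        long cutoff = System.currentTimeMillis() - maxAgeDays * 24L * 60 * 60 * 1000;
        File[] backups = backupRoot.listFiles();
        if (backups == null) return; // not a directory, or an I/O error
        for (File backup : backups) {
          if (backup.isDirectory() && backup.lastModified() < cutoff) {
            deleteRecursively(backup);
          }
        }
      }

      private static void deleteRecursively(File f) {
        File[] children = f.listFiles();
        if (children != null) {
          for (File child : children) deleteRecursively(child);
        }
        f.delete();
      }
    }
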
          Noble Paul added a comment -

          can we make SOLR-551 a subproject of this?

Noble Paul added a comment (edited) -

The next step is to replicate the files in the conf folder.
The strategy is as follows:

• Mention the files to be replicated from the master in the ReplicationHandler
    <requestHandler name="/replication" class="solr.ReplicationHandler" > 
        <str name="confFiles">schema.xml,stopwords.txt,elevate.xml</str>  
      </requestHandler>  
    
• Include these files in the CMD_FILE_LIST command response as well
• The slave can compare the files with its local copy and download them if they have been modified
• A backup of the current files is taken and the new files are placed into the conf folder (see the sketch after this list)
• If a conf file has changed, the SolrCore must be reloaded
• There must be separate strategies for reloading the core in single-core and multicore setups
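
A rough sketch of the backup-then-swap step from the list above (the file layout and timestamp suffix are assumptions; the backup-before-replace behaviour is what the strategy describes):

    import java.io.File;

    // Install a freshly downloaded conf file, keeping a timestamped
    // backup of the old copy. After all files are installed, the core
    // must be reloaded for the changes to take effect.
    public class ConfSwapper {
      public static void install(File downloaded, File confDir) {
        File target = new File(confDir, downloaded.getName());
        if (target.exists()) {
          File backup = new File(confDir,
              target.getName() + "." + System.currentTimeMillis());
          target.renameTo(backup); // back up the current file first
        }
        downloaded.renameTo(target);
      }
    }
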
          Andrew Savory added a comment -

I'd certainly like to see this in 1.3, it would make my life easier!
I'm trying out the code now and hope to give in-depth feedback soon.
Meanwhile, some initial comments: there's inconsistency between 4-space and 2-space indentation in the code, and a few System.out.println calls that you probably want to remove or replace with proper logging.

          Noble Paul added a comment -

Should we plan this feature for the Solr 1.3 release? If yes, what items are pending completion?

          Noble Paul added a comment -

bq: A generic Scheduling system could be hooked into SolrCore that can hit arbitrary RequestHandlers according to whatever configuration.

hoss: currently the timer task itself is part of SnapPuller.java. I endorse your idea of having a scheduling feature built into SolrCore if it is useful to more than one component. As you mentioned, every operation is triggered by the ReplicationHandler's REST API, so if another service can give a callback at the right time, that is the best solution.

bq: the challenge here is (efficient) pure java equivalents of snapshooter/snappuller/snapinstaller

Yes, this indeed is the challenge. I would like people to look into the implementation and comment on how these operations can be made more efficient. I am already thinking of caching the file checksums, because more than one slave will request the same ones.

The other important item that needs review is the changes made to SolrCore.getNewIndexDir()

          Hoss Man added a comment -

bq: If scheduling is indeed important we will take it up. Meanwhile we need to ensure that the solution is usable and bug free.

Agreed... the challenge here is (efficient) pure-java equivalents of snapshooter/snappuller/snapinstaller. The scheduling mechanism is largely orthogonal, particularly since Paul is using a "ReplicationHandler" as the main API; it could easily be dealt with later (or in parallel if anyone wants to take on the task).

I don't think the ReplicationHandler should know anything about scheduling or recurrence. A generic scheduling system could be hooked into SolrCore that can hit arbitrary RequestHandlers according to whatever configuration it has (similar to the QuerySenderListener), which would handle this case as well as other interesting use cases (ie: rebuild a spelling dictionary using an external datasource every hour, even if the index hasn't changed).

Noble Paul added a comment (edited) -

This feature is far from complete. Enhanced admin features are probably the next priority.
If scheduling is indeed important we will take it up.
Meanwhile we need to ensure that the solution is usable and bug free.

          Andrew Savory added a comment -

Please don't ignore Thomas's repeated suggestion to use Quartz!

Having replication built in but then having to use an external cron job to trigger the operations seems suboptimal to me. Being able to configure everything related to replication within the Solr deployment seems far more elegant.

          Noble Paul added a comment -

This patch includes:

• a new method in ReplicationHandler, filechecksum, which can return the checksums of a given list of files in a snapshot (see the sketch after this list)
• the snappuller will request the checksums of files whose name and size are the same (compared to the current index)
• only if the checksums are different is the file downloaded
• other files are copied from the current index
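
Computing such a whole-file checksum is straightforward with the JDK (a sketch; Adler32 is assumed here because it is the cheap JDK checksum also used for the transfer packets elsewhere in this issue):

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.util.zip.Adler32;
    import java.util.zip.CheckedInputStream;

    // Compute an Adler32 checksum over an entire file.
    public class FileChecksum {
      public static long checksum(String path) throws IOException {
        CheckedInputStream in =
            new CheckedInputStream(new FileInputStream(path), new Adler32());
        try {
          byte[] buf = new byte[8192];
          while (in.read(buf) != -1) { /* reading drives the checksum */ }
          return in.getChecksum().getValue();
        } finally {
          in.close();
        }
      }
    }
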
          Thomas Peuss added a comment -

bq: Thomas: This is something that can be considered. But I am still not very convinced that people use it that way. It would introduce a dependency on a new library. Let us see if there is enough demand from users.

The basic functionality is much more important of course (and much harder to do).

bq: note: All the operations can be triggered using HTTP GET, so a wget from cron can do the trick in the current form.

OK. Then ignore my comment.

          Noble Paul added a comment -

Thomas: this is something that can be considered. But I am still not very convinced that people use it that way. It would introduce a dependency on a new library. Let us see if there is enough demand from users.

Note: all the operations can be triggered using HTTP GET, so a wget from cron can do the trick in the current form.

          Thomas Peuss added a comment -

A library like Quartz (http://www.opensymphony.com/quartz/) would give you the possibility to provide both types of schedules (HH:MM:SS and cron-like). Quartz uses the Apache 2.0 license, so there are at least no licensing issues.

I have not looked at the code, but is it possible to do snapshots/snappulls at a certain time (e.g. every day at 1am)? Quartz would give you that possibility as well. Quartz even supports scenarios like "every 1st Monday of the month".

          Noble Paul added a comment -

The first cut: very crude, but it works. No OS-specific commands.
The design for snapshoot and snappull is the same as described in the design overview.
snapinstall is done the following way:

• write out a file index.properties in the data directory
• call the commit command
• SolrCore is modified: a new method getNewIndexDir() is added. It loads the properties file (if it exists, else falls back to the old behavior), reads the property 'index' and returns that directory
• any new SolrIndexWriter/SolrIndexSearcher is loaded from the new index dir
• old ones continue to use the old index dir; getIndexDir() returns the index dir used by the current SolrIndexReader/SolrIndexWriter

Use the following configuration.

In the master:
          1. register snapshooter
              <listener event="postCommit" class="solr.SnapShooter">    
                      <bool name="wait">true</bool>
                </listener>
            
          2. register replication Handler
             <requestHandler name="/replication" class="solr.ReplicationHandler" />
            
          3. register a new ResponseWriter
            <queryResponseWriter name="filestream" class="org.apache.solr.request.CustomBinaryResponseWriter"/>
            

          In the Slave

          1. register the replication handler
              <requestHandler name="/replication" class="solr.ReplicationHandler" > 
                  <str name="masterUrl">http://localhost:8080/solr/replication</str>
                <str name="pollInterval">00:00:30</str>
                </requestHandler>  
            
          Noble Paul added a comment -

Yonik: this would be very useful in optimizing the file transfers. We must incorporate it if possible.

BTW, what do you recommend for index deletion on Windows? Is the solution proposed by me fine?

          Yonik Seeley added a comment -

          Just checked: Lucene's IndexCommit.getFileNames() returns all the files associated with a particular commit point.

          Yonik Seeley added a comment -

bq: Is there a reason why IndexDeletionPolicy is not being used?

          Right, that's what I suggested in the initial email thread.
          With a little more smarts, it seems like the new files could be replicated directly from the master index directory, directly to the slave index directory. One wouldn't want to copy all the new files though... only those files that are part of the latest index... (hmmm, does Lucene have a way of getting that info?)

          Noble Paul added a comment -

The strategy of keeping the index directory name hardcoded is a bit tricky: we would need to do a lot of file-system-specific jugglery. The best strategy would be:

• keep a file index.properties in the data dir
• have an entry currentindex=<new.index> in that file
• this file may keep other extra information if we need it
• when a new IndexSearcher/writer is loaded, read this property and try to load the index from that folder
• if it is absent, default to the hardcoded value

This way we never need to make hardlinks etc. (see the sketch below)
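
A small sketch of that lookup ('index.properties' and the 'currentindex' key follow the list above; everything else is illustrative):

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.util.Properties;

    // Resolve the index directory: prefer the one named in
    // dataDir/index.properties, else fall back to the hardcoded "index".
    public class IndexDirResolver {
      public static File resolve(File dataDir) {
        File propsFile = new File(dataDir, "index.properties");
        if (propsFile.exists()) {
          try {
            Properties p = new Properties();
            FileInputStream in = new FileInputStream(propsFile);
            try {
              p.load(in);
            } finally {
              in.close();
            }
            String dir = p.getProperty("currentindex");
            if (dir != null) return new File(dataDir, dir);
          } catch (IOException e) {
            // unreadable properties file: fall through to the default
          }
        }
        return new File(dataDir, "index"); // hardcoded default
      }
    }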

          Jason Rutherglen added a comment -

          Is there a reason why IndexDeletionPolicy is not being used? It allows keeping snapshot files available for replication without creating a specific snapshot directory. This would be cleaner than creating an external process.

Noble Paul added a comment (edited) -

Windows has hardlinks: "fsutil hardlink create" is the command. It works well as long as your Windows version is newer than Win2K.

          Shalin Shekhar Mangar added a comment -

bq: Re poll interval: I think the HH:MM:ss is enough. Does that allow polling, say, every 72 hours? Just use 72:00:00, right?

Correct, 72:00:00 will work.
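
Parsing that interval format is simple arithmetic (a sketch; the patch's actual parser is not shown in this thread). PollInterval.toMillis("72:00:00") yields 259,200,000 ms, i.e. 72 hours:

    // Convert an "HH:MM:SS" poll interval to milliseconds.
    // Hours may exceed 24, as in the 72:00:00 example above.
    public class PollInterval {
      public static long toMillis(String interval) {
        String[] parts = interval.split(":");
        long hours = Long.parseLong(parts[0]);
        long minutes = Long.parseLong(parts[1]);
        long seconds = Long.parseLong(parts[2]);
        return ((hours * 60 + minutes) * 60 + seconds) * 1000;
      }
    }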

bq: Re Winblows problem: I'd like the switch to the current/latest snapshot, but this prevents us from always knowing the location of the active directory. We'd have to rely on sorting the dir with snapshot names and assuming the currently active index is the one with the most recent snapshot, no? Symlinks would be great here, but again, Winblows doesn't have them (and I think using shortcuts for this wouldn't work).

As Noble suggested, once the new searcher is in use and the older one is closed, hopefully Windows will kindly grant us permission to delete the files in the index directory. We can then create links in the index directory to the files of the snapshot being used. The latest snapshot directory will be the active one, but we'll know which index is in use through the links in the index folder.

          Otis Gospodnetic added a comment -

          Re poll interval: I think the HH:MM:ss is enough. Does that allow polling, say, every 72 hours? Just use 72:00:00, right?

          Re Winblows problem: I'd like the switch to the current/latest snapshot, but this prevents us from always knowing the location of the active directory. We'd have to rely on sorting the dir with snapshot names and assuming the currently active index is the one with the most recent snapshot, no? Symlinks would be great here, but again, Winblows doesn't have them (and I think using shortcuts for this wouldn't work).

          Noble Paul added a comment -

A possible solution to the Windows replication problem would be:

• make changes to SolrCore to load the index from a given directory instead of hardcoding the directory name; in our case we can give it the new snapshot directory
• after the new IndexSearcher/writer is loaded, close the original index searcher; then it is OK to delete the old index
• delete the old contents and copy hardlinks into the index directory, so that if you restart Solr it will get the index from the right place (see the sketch after this list)
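
For what it's worth, on Java 7+ the hardlinking step no longer needs an OS-specific command at all (a portable sketch; java.nio.file.Files.createLink is the real JDK API, the rest is illustrative):

    import java.io.IOException;
    import java.nio.file.DirectoryStream;
    import java.nio.file.Files;
    import java.nio.file.Path;

    // Hard-link every file from the snapshot directory into the index dir.
    public class HardLinker {
      public static void linkAll(Path snapshotDir, Path indexDir) throws IOException {
        DirectoryStream<Path> files = Files.newDirectoryStream(snapshotDir);
        try {
          for (Path file : files) {
            // createLink(newLink, existing): both must be on the same volume
            Files.createLink(indexDir.resolve(file.getFileName()), file);
          }
        } finally {
          files.close();
        }
      }
    }
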
          Noble Paul added a comment -

For polling, a simple interval syntax may be enough. Polling is not a very expensive operation: it just sends a request and gets the latest snapshot name, so we can schedule it to run even every minute.

If there is a need for such complex scheduling, we can consider that syntax.

          Shalin Shekhar Mangar added a comment -

Does anybody really do things like "first Tuesday of each month" for polling the Solr master? The slave's poll is usually set to run every few minutes. At least that's how we use it in our production environments. Quartz is nice, but the thing is that we don't need all those features; a timer task is good enough for our needs.

Yes, hh:MM:ss represents a time, but it isn't difficult to view it as a countdown timer. It's definitely easier to understand than specifying the number of seconds/minutes as an integer for a poll interval. What do you think?

          Andrew Savory added a comment -

          Shalin,

I'm assuming that pollInterVal is intended to specify the frequency of replication. hh:MM:ss is certainly universally recognizable for specifying a single interval. But how do you represent "every hour", "four times a day", or "the first Tuesday of each month" with that notation?

Certainly Windows doesn't have cron, but we're talking about a pure-java implementation, so that's not a problem. Quartz might be a perfect solution for scheduling: http://www.opensymphony.com/quartz/

          Shalin Shekhar Mangar added a comment -

I think hh:MM:ss is universally recognizable and very intuitive. We should also keep in mind that this solution will be used on multiple platforms, and an OS like Windows does not have cron, so its administrators may not be familiar with the cron format.

          Andrew Savory added a comment -

          This looks like an extremely useful addition. More comments when the patch is available, but an initial observation:
          <str name="pollInterVal">HH:MM:SS</str>
          For consistency, could this be specified cron-style instead? e.g.
          <str name="pollInterVal">*/30 * * * *</str>

          Noble Paul added a comment -

There are problems with index replacement on Windows.
Windows does not allow us to delete the index folder because it is in use.
How do we solve this?

          Noble Paul added a comment -

Otis: all the points you have enumerated are valid. We actually think they should be in the final solution.

• It should have a way to prevent infinite loops.
• snap=<snapshotname> is the correct command.

All the admin-related changes are planned exactly as you have asked, but we can leave the hooks open and push through with the basic stuff first. The design documentation just tries to cover everything the scripts currently cover.
If everything else is fine, we shall post a rough patch for your review in another 2-3 days.

          Otis Gospodnetic added a comment -

I think the above sounds more or less right (read it quickly).

• Should there exist a mechanism for preventing infinite loops (try to get a file, fail for some reason, try again, over and over until some disk gets filled overnight, for example)?
• I see &snapshhot=<snapshotname> as well as &snap=<snapshotname>. This may be a typo in the JIRA comment only, I don't know.

Thinking about the Admin display of replication information:

• Is there anything that keeps track of overall data transfer progress?
  • the name of the snapshot being replicated currently
  • the name of the file being replicated currently
  • the total number of bytes transferred vs. the size of the snapshot
  • any failures (number of failures + info)
  • ...

I imagine those wanting Enterprise Solr will desire this type of stuff, so even if we don't have any of this in the UI at this point, it might be good to keep it in mind and provide the necessary hooks, callbacks, etc.

          Noble Paul added a comment - edited

          We shall post a patch in the next few days.

          The design is as follows:

          • SnapShooter.java: registered as a listener on postCommit/postOptimize. It makes a copy of the latest index into a new snapshot folder (same as it is today). Only on the master. It can optionally take a 'snapDir' configuration if the snapshot is to be created in a folder other than the data directory.
          • ReplicationHandler: A requesthandler. This is registered on master & slave. It takes the following config on the slave; the master node just needs an empty requesthandler registration (see the example after this snippet).
            solrconfig.xml

            <requestHandler name="replication" class="solr.ReplicationHandler">
                <str name="masterUrl">http://<host>:<port>/solr/<corename>/replication</str>
                <str name="pollInterval">HH:MM:SS</str>
            </requestHandler>
            
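          For contrast, a sketch of the bare master-side registration mentioned above (the handler name is assumed to match the path in the slave's masterUrl):

            <!-- master solrconfig.xml: register the handler with no configuration -->
            <requestHandler name="replication" class="solr.ReplicationHandler" />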

          ReplicationHandler implements the following methods. Every method is invoked over HTTP GET. These methods are usually triggered from the slave (over HTTP) or by a timer (for snappull). The admin UI can provide means to invoke some methods such as snappull and snapshoot.

          • CMD_GET_FILE: (command=filecontent&snapshhot=<snapshotname>&file=<filename>&offset=<fileOffset>&len=<length-of-chunk>&checksum=<true|false>). This is invoked by a slave only, to fetch a file or a part of it. This uses a custom format (described later).
          • CMD_LATEST_SNAP: (command=latestsnap). Returns the name of the latest snapshot (a namedlist response).
          • CMD_GET_SNAPSHOTS: (command=snaplist). Returns a list of all snapshot names (a namedlist response).
          • CMD_GET_FILE_LIST: (command=filelist&snap=<snapshotname>). A list of all the files in the snapshot; contains name, lastmodified, size (a namedlist response).
          • CMD_SNAP_SHOOT: (command=snapshoot). Forces a snapshoot.
          • CMD_DISABLE_SNAPPOLL: (command=disablesnappoll). Stops the timer task.
          • CMD_SNAP_PULL: (command=snappull). Performs the following operations (on the slave). It is mostly triggered from a timer task based on the pollInterval value.
            • calls CMD_LATEST_SNAP on the master and gets the latest snapshot name
            • checks if it already has the same (or if a snappull is going on)
            • if it is to be pulled, calls CMD_GET_FILE_LIST on the master
            • for each file in the list, makes a CMD_GET_FILE call to the master. This command works in the following way (see the packet sketch after this list)
              • the server reads the file stream
              • it uses a CustomStreamResponseWriter (wt=filestream) to write the content. It has a packetSize (say 1 MB)
              • it writes an int for length and a long for the Adler32 checksum (if checksum=true). The packets are written one after another till EOF or an exception.
              • SnapPuller.java on the client reads the packet length and checksum and tries to read the packet. If it is unable to read the given packet, or the checksum does not match, or there is an exception, it closes the connection and makes a new CMD_GET_FILE command with offset = totalBytesReceived. If everything is fine, packets are read till bytesDownloaded == fileSize
              • this continues till all the files are downloaded
            • creates a folder index.tmp
            • for each file in the copied snapshot, tries to create a hardlink in the index.tmp folder (runs an OS-specific command); see the hardlink sketch after this list
            • if hardlink creation fails, uses a copy
            • renames index.tmp to index
            • calls a commit on the updatehandler

          Note: the download tries to use the same stream to download the complete file.
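          To make the packet format concrete, here is a minimal sketch of the writer and reader described above, assuming checksum=true and the [int length][long Adler32][bytes] layout (class and method names are illustrative only):

            import java.io.*;
            import java.util.zip.Adler32;

            // Sketch of the wt=filestream packet format described above:
            // [int length][long Adler32 checksum][length bytes], repeated until EOF.
            public class PacketIO {
                static final int PACKET_SIZE = 1024 * 1024; // 1 MB, as in the design

                /** Master side: stream a file as checksummed packets. */
                static void writePackets(InputStream file, DataOutputStream out)
                        throws IOException {
                    byte[] buf = new byte[PACKET_SIZE];
                    int n;
                    while ((n = file.read(buf)) != -1) {
                        Adler32 checksum = new Adler32();
                        checksum.update(buf, 0, n);
                        out.writeInt(n);                    // packet length
                        out.writeLong(checksum.getValue()); // packet checksum
                        out.write(buf, 0, n);               // packet payload
                    }
                    out.flush();
                }

                /** Slave side: read and verify packets; returns bytes received.
                 *  On failure the caller closes the connection and re-issues
                 *  CMD_GET_FILE with offset = bytes received so far, ideally with
                 *  a bounded retry count (the infinite-loop concern raised above). */
                static long readPackets(DataInputStream in, OutputStream dest)
                        throws IOException {
                    byte[] buf = new byte[PACKET_SIZE];
                    long total = 0;
                    while (true) {
                        int len;
                        try {
                            len = in.readInt();
                        } catch (EOFException eof) {
                            return total; // clean end of stream
                        }
                        long expected = in.readLong();
                        in.readFully(buf, 0, len);
                        Adler32 checksum = new Adler32();
                        checksum.update(buf, 0, len);
                        if (checksum.getValue() != expected) {
                            throw new IOException("checksum mismatch at offset " + total);
                        }
                        dest.write(buf, 0, len);
                        total += len;
                    }
                }
            }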

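          And a sketch of the hardlink-or-copy install step; the design runs an OS-specific command, but on Java 7+ java.nio.file can do the same portably (again, names are illustrative):

            import java.io.IOException;
            import java.nio.file.*;

            // Build index.tmp from the pulled snapshot via hardlinks, falling
            // back to plain copies, then swap it into place as the live index.
            public class IndexInstaller {
                static void install(Path snapshotDir, Path dataDir) throws IOException {
                    Path indexTmp = dataDir.resolve("index.tmp");
                    Files.createDirectories(indexTmp);
                    try (DirectoryStream<Path> files = Files.newDirectoryStream(snapshotDir)) {
                        for (Path src : files) {
                            Path dst = indexTmp.resolve(src.getFileName());
                            try {
                                Files.createLink(dst, src);   // hardlink when the FS supports it
                            } catch (IOException | UnsupportedOperationException e) {
                                Files.copy(src, dst);         // fall back to a copy
                            }
                        }
                    }
                    // rename index.tmp -> index; assumes the old index directory has
                    // already been removed or renamed (the Windows issue noted above)
                    Files.move(indexTmp, dataDir.resolve("index"),
                               StandardCopyOption.ATOMIC_MOVE);
                }
            }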
          Please comment on the design.

          Yonik Seeley added a comment -

          How about posting a snapshot of what you have, with a few paragraphs explaining how things work, etc. Early feedback is better, and it allows more people to add their expertise. I'm sure many are interested in the ease-of-use gains this patch can bring.


            People

            • Assignee:
              Shalin Shekhar Mangar
            • Reporter:
              Noble Paul
            • Votes:
              5
            • Watchers:
              12