SOLR-7374

Backup/Restore should provide a param for specifying the directory implementation it should use

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.2
    • Component/s: None
    • Labels: None

      Description

      Currently, when we create a backup, we use SimpleFSDirectory to write the backup indexes. Similarly, during a restore we open the index using FSDirectory.open.

      We should provide a param, called directoryImpl or type, that specifies the Directory implementation to use when backing up the index. Likewise, during a restore one would need to specify the directory implementation that was used during backup so that the index can be opened correctly.

      This param addresses the problem that a user running Solr on HDFS currently has no way to use the backup/restore functionality, since the directory implementation is hardcoded.

      With this, one could run Solr on a local FS but back up the index to HDFS, and so on.
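
      As the discussion below settles on, the implementation ended up being selected through a repository request parameter (backed by pluggable repository configurations) rather than a raw Directory class name. For illustration, a backup request with such a parameter might look like the following; the host, core name, and location are hypothetical:

          http://localhost:8983/solr/core1/replication?command=backup&name=nightly&repository=hdfs&location=hdfs://namenode:8020/backups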

    Attachments

      1. SOLR-7374.patch
        100 kB
        Varun Thacker
      2. SOLR-7374.patch
        90 kB
        Hrishikesh Gadre
      3. SOLR-7374.patch
        84 kB
        Hrishikesh Gadre
      4. SOLR-7374.patch
        80 kB
        Hrishikesh Gadre
      5. SOLR-7374.patch
        81 kB
        Mark Miller
      6. SOLR-7374.patch
        74 kB
        Hrishikesh Gadre
      7. SOLR-7374.patch
        74 kB
        Hrishikesh Gadre
      8. SOLR-7374.patch
        68 kB
        Hrishikesh Gadre
      9. SOLR-7374.patch
        68 kB
        Hrishikesh Gadre

        Issue Links

          Activity

          Jan Høydahl added a comment -

          Looking forward to a collection backup command, we can imagine a cluster-wide set of Directory configurations, such as an S3Directory configured with API keys, etc. So we should let each core/shard backup/restore command write/read the backup directly to/from such cluster-wide locations. One way could be to support protocol and config in the location attribute, e.g. s3:/backups/collection1/shard1. That would make it super simple for the Overseer to kick off a bunch of backup jobs across a cluster and let each shard write directly to the correct target instead of to intermediate local storage. No idea how to configure cluster-wide Directory configs, though.

          Hrishikesh Gadre added a comment -

          Varun Thacker are you still working on this? If not, I would like to take a crack at it...

          Varun Thacker added a comment -

          Hi Hrishikesh,

          Please feel free to work on it. I haven't had time to even look back at this issue, unfortunately.

          I'll be glad to review if you work on a patch, though.

          Hrishikesh Gadre added a comment -

          Varun Thacker Sure. At a high level, I think we can expose an API to configure Directory configurations for storing snapshots (e.g. local filesystem, HDFS, S3, etc.). As part of the implementation, we can store this config in ZK. Once this mechanism is in place, we can pass the appropriate directory configuration during backup command execution. Does this make sense?

          CC David Smiley (who is also interested in SOLR-5750)

          David Smiley added a comment -

          +1 to specifying the protocol/impl as a prefix. Presumably there would then need to be a way to register them and look them up.

          Hrishikesh Gadre added a comment -

          Please find the patch attached. It includes the following:

          • A backup repository interface and concrete implementations for the local file-system and HDFS
          • The ability to configure repositories via solr.xml (a sketch follows below)
          • A refactoring of the "core"-level backup/restore to use this repository interface
          • A unit test for HDFS integration
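
          A minimal sketch of such a solr.xml section, assuming the <repository> element and the repository class names mentioned later in this thread; the enclosing element name and exact attributes here are illustrative:

              <backup>
                <repository name="hdfs" class="org.apache.solr.core.backup.repository.HdfsBackupRepository" default="true">
                  <str name="location">/solr/backups</str>
                </repository>
                <repository name="local" class="org.apache.solr.core.backup.repository.LocalFileSystemRepository" default="false"/>
              </backup>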
          Hrishikesh Gadre added a comment -

          Updated patch (includes a couple of bug fixes).

          Tomás Fernández Löbbe added a comment -

          I came to this issue following the discussion in SOLR-9055. I really like this patch. It should also allow future improvements (like making the backups incremental), and since it's really low level, most of the logic in the SnapShooter can be shared no matter what storage system is used.

          One small comment: there seems to be a bug in BackupRepositoryFactory

          BackupRepositoryFactory.java
          if (isDefault && (this.defaultBackupRepoPlugin != null)) {
              if (this.defaultBackupRepoPlugin != null) {
                  throw new SolrException(ErrorCode.SERVER_ERROR, "More than one backup repository is configured as default");
              }
              this.defaultBackupRepoPlugin = backupRepoPlugins[i];
          }
          

          Could some improvements to TestSolrXml catch this?
          I'd also add some more logging to this class, like logging.info("Adding BackupRepositoryFactory foo");. Configuration is pretty open to mistakes that are difficult to track.
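
          The intended logic is presumably the following (the outer null check dropped so that the duplicate-default error can actually fire):

              if (isDefault) {
                  if (this.defaultBackupRepoPlugin != null) {
                      throw new SolrException(ErrorCode.SERVER_ERROR, "More than one backup repository is configured as default");
                  }
                  this.defaultBackupRepoPlugin = backupRepoPlugins[i];
              }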

          Hrishikesh Gadre added a comment -

          Tomás Fernández Löbbe Thanks for the review. Let me submit an updated patch with your comments.

          Hrishikesh Gadre added a comment -

          Here is an updated patch which addresses the review comments from Tomás Fernández Löbbe.

          Mark Miller added a comment -

          Where are we at here? I'd really like to get this in so that SOLR-9055 can also be wrapped up. What do you think, @varunthacker1989?

          Hrishikesh Gadre added a comment -

          Varun Thacker could you please review the patch?

          Mark Miller added a comment -

          I'm going to steal this issue.

          Mark Miller added a comment -

          I'm trying to pinpoint whether this has destabilized TestReplicationHandlerBackup or whether it was failing before. Otherwise this is looking pretty good; I just want to do a little manual testing. Anything else would mostly be nitpicky, but perhaps more comments are coming.

          Hrishikesh Gadre added a comment -

          Mark Miller Let me know if you need anything from my side. If you could post the test params (for failure), I can take a look.

          Hrishikesh Gadre added a comment -

          Mark Miller Please find the updated patch which resolves the TestReplicationHandlerBackup failure.

          Mark Miller added a comment -

          Thanks, I'm pretty much ready with this.

          One comment: shouldn't we make BackupRepository an abstract class rather than an interface? We can only add default methods in Java 8, but branch6x is still Java 7, right? Backcompat will be easier with an abstract class for now, I think.

          Mark Miller added a comment -

          Just found the following failure. It does not seem to be reproducible by seed; it must be timing or something.

          [junit4] ERROR 0.72s J4 | TestReplicationHandlerBackup.doTestBackup <<<
          [junit4] > Throwable #1: java.util.NoSuchElementException
          [junit4] > at __randomizedtesting.SeedInfo.seed([2234E0A12F864773:63BFC0C40838B43C]:0)
          [junit4] > at sun.nio.fs.UnixDirectoryStream$UnixDirectoryIterator.next(UnixDirectoryStream.java:215)
          [junit4] > at sun.nio.fs.UnixDirectoryStream$UnixDirectoryIterator.next(UnixDirectoryStream.java:132)
          [junit4] > at org.apache.solr.handler.TestReplicationHandlerBackup.doTestBackup(TestReplicationHandlerBackup.java:174)
          [junit4] > at java.lang.Thread.run(Thread.java:745)
          [junit4] 2> 563888 INFO (SUITE-TestReplicationHandlerBackup-seed#[2234E0A12F864773]-worker) [ ] o.a.s.SolrTestCaseJ4 ###deleteCore

          NOTE: reproduce with: ant test -Dtestcase=TestReplicationHandlerBackup -Dtests.method=doTestBackup -Dtests.seed=2234E0A12F864773 -Dtests.slow=true -Dtests.locale=sk -Dtests.timezone=Asia/Calcutta -Dtests.asserts=true -Dtests.file.encoding=US-ASCII

          David Smiley added a comment -

          One comment: Shouldn't we make BackupRepository an abstract class rather than interface? We can only add default methods in Java 8, but branch6x is still Java 7 right?

          No; we're all Java 8 now – master & 6x. You must be thinking of 5x, which was Java 7.
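
          For context, the backcompat concern hinges on Java 8 interface default methods: a method added to an interface later can ship with a body, so existing implementations keep compiling. A minimal illustration (the verify method is hypothetical, not part of the actual BackupRepository API):

              import java.io.IOException;
              import java.net.URI;

              public interface BackupRepository {
                  // Hypothetical method added in a later release; the default body
                  // means existing implementations are not forced to override it.
                  default void verify(URI location) throws IOException {
                      // no-op by default
                  }
              }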

          Mark Miller added a comment -

          Looks like for the test failure we may just have to check the backup status and wait in a spot where we currently are not.

          Hrishikesh Gadre added a comment -

          Looking into this...

          Varun Thacker added a comment -

          Hi Hrishikesh,

          The patch is looking good!

          Here are my two main concerns:

          I think we need to deal with the "location" param better. Before this patch we used to read location as a query param. If the query param is empty, then we read it from the cluster property.
          With this patch we are adding the ability to specify "location" in the solr.xml file, but it will never be used? CollectionsHandler will bail out early today:

                  String location = req.getParams().get("location");
                  if (location == null) {
                    location = h.coreContainer.getZkController().getZkStateReader().getClusterProperty("location", (String) null);
                  }
                  if (location == null) {
                    throw new SolrException(ErrorCode.BAD_REQUEST, "'location' is not specified as a query parameter or set as a cluster property");
                  }
          

          One approach would be to deprecate the usage of the cluster prop and look at the query param followed by solr.xml? Looking at three places seems messy.
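
          A sketch of that lookup order, keeping the cluster property only as a deprecated fallback (the config helper here is hypothetical, not the actual Solr API):

              String location = req.getParams().get("location");
              if (location == null) {
                  // hypothetical helper reading the default location from solr.xml
                  location = repositoryConfig.getDefaultLocation();
              }
              if (location == null) {
                  // deprecated fallback, kept for back-compat until removed
                  location = zkStateReader.getClusterProperty("location", (String) null);
              }
              if (location == null) {
                  throw new SolrException(ErrorCode.BAD_REQUEST, "'location' is not specified");
              }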

          repo is the key used to specify the implementation. In solr.xml the tag is called repository. Should we just use repository throughout?

          Small changes:

          • Javadocs for BackupRepository - s/index/indexes
          • I think we should follow the if (condition) spacing convention? Some places don't have the space and some do.
          • In BackupRepositoryFactory: in these two log lines, can we mention the name as well? LOG.info("Default configuration for backup repository is with configuration params {}", defaultBackupRepoPlugin); and LOG.info("Added backup repository with configuration params {}", backupRepoPlugins[i]);
          • Can we reuse the "location" string via BackupRepository.DEFAULT_LOCATION_PROPERTY on lines 871/925 of CoreAdminOperation? Let's fix it in CollectionsHandler and OverseerCollectionMessageHandler as well.
          • In RestoreCore, do we need to deprecate the old RestoreCore ctor? Any reason why we can't remove it directly?
          Mark Miller added a comment -

          I have the following changes already in my local copy:

          • fixed formatting
          • use BackupRepository.DEFAULT_LOCATION_PROPERTY

          Let me take a look at the location param in solr.xml issue.

          Mark Miller added a comment -

          "location" in the solr.xml

          The baseLocation in the test solr.xml files just looks like dev cruft to me.

          Mark Miller added a comment -

          New patch.

          Mark Miller added a comment -

          deprecate the usage of cluster prop and look at query param followed by solr.xml

          It is a bit odd to have some config in solr.xml and then the default location as a cluster prop, but it's much nicer to be able to easily change the default location on the fly; solr.xml is a pain to change and requires a restart.

          Hrishikesh Gadre added a comment (edited) -

          Varun Thacker Mark Miller Thanks for the comments!

          I think we need to deal with the "location" param better. Before this patch we used to read location as a query param . If the query param is empty then we read it from the cluster property.

          With this patch we are adding the ability to specify "location" in the solr.xml file but it will never be used? CollectionsHandler will bail out early today .

          This is a partial patch handling ONLY core level changes. The collection level changes are being captured in the patch for SOLR-9055. I did this to keep the patch relatively short and easier to review. In the patch for SOLR-9055 - I have changed the CollectionsHandler implementation to read default location from solr.xml (instead of cluster property). Since this core level operation is "internal" - technically we don't have to handle the case for missing "location" param in this patch (i.e. we can keep the original behavior). I think I made this change to simplify unit testing.

          One approach would be to deprecate the usage of cluster prop and look at query param followed by solr.xml ? Looking at three places seems messy .

          [Mark] It is a bit odd to have some config in solr.xml and then default location as a cluster prop, but much nicer to be able to easily change the default location on the fly. solr.xml is a pain to change and requires a restart.

          I agree that the cluster property approach is more convenient compared to solr.xml. But since we allow users to configure multiple repositories in solr.xml, we cannot really use the current cluster property as-is, because a user may want to specify a different location for each file-system (or repository). Hence, at minimum, we need one cluster property per repository configuration (e.g. the name could be <repository-name>-location). But based on my understanding, the CLUSTERPROP API implementation requires fixed (or well-known) property names:

          https://github.com/apache/lucene-solr/blob/651499c82df482b493b0ed166c2ab7196af0a794/solr/solrj/src/java/org/apache/solr/common/cloud/ClusterProperties.java#L90

          We may have to relax this restriction for this work. On the other hand, specifying the default location in solr.xml is not so bad, since a user can always specify a location parameter to avoid restarting the Solr cluster. Thoughts?

          Can we reuse the "location" string with (BackupRepository.DEFAULT_LOCATION_PROPERTY) on line 871/925 of CoreAdminOperation? Let's fix it in CollectionsHandler and OverseerCollectionMessageHandler as well?

          Let me fix the CoreAdminOperation in this patch. I will defer the collection level changes to SOLR-9055.

          In RestoreCore do we need to deprecate the old RestoreCore ctor ? Any reason why we can't remove it directly?

          The deprecated constructor is used by the ReplicationHandler. The new constructor expects a BackupRepository reference, which can be obtained only via the CoreContainer. I couldn't find a way to get hold of the CoreContainer in ReplicationHandler, hence I didn't remove this constructor.

          repo is the key used to specify the implementation. In Solr xml the tag is called repository . Should we just use repository throughout?

          Sure, that makes sense. Let me fix this.

          Hrishikesh Gadre added a comment -

          Mark Miller Here is an updated patch.

          • Fixed trailing whitespace
          • Removed the unused and deprecated constructor from RestoreCore
          • Renamed the constant referring to the "location" property in the BackupRepository interface

          All the other review comments were already incorporated in your earlier patch.

          Varun Thacker added a comment -

          Hi Hrishikesh,

          Thanks for the updated patch! Looks good.

          While reviewing it, I didn't understand one thing:

          In ReplicationHandler#restore, why are we always using LocalFileSystemRepository and not reading any repository param?
          Currently, to backup/restore in standalone mode we use the replication handler (https://cwiki.apache.org/confluence/display/solr/Making+and+Restoring+Backups+of+SolrCores). I think we should read the repository param and initialize accordingly.

          The backup/restore operation that is in CoreAdminOperation was added as part of SOLR-5750 for the Overseer to be able to call backup/restore on individual cores.

          I don't think this set of comments is entirely true. The first version of backup, which had been there for ages, allowed an empty location param and resorted to the core's data directory. It had nothing to do with shared file systems. The backcompat part is true, though.

              // Note - This logic is only applicable to the usecase where a shared file-system is exposed via
              // local file-system interface (primarily for backwards compatibility). For other use-cases, users
              // will be required to specify "location" where the backup should be stored.
          

          Lastly, you mentioned this in a previous comment:

          This is a partial patch handling ONLY core level changes. The collection level changes are being captured in the patch for SOLR-9055.

          However, TestHdfsBackupRestore added in this patch is a SolrCloud test. What other work is left for supporting collection-level changes?
          I only briefly looked at SOLR-9055 and couldn't tell why we need ShardRequestProcessor etc.
          I would think the only work required would be in CollectionsHandler to deal with "location", and in OverseerCollectionMessageHandler the part where we read/write the meta information and ZK configs?

          Varun Thacker added a comment -

          Also, I found the ZkStateReader#BACKUP_LOCATION constant. We should merge those two constants, I guess.

          Hrishikesh Gadre added a comment (edited) -

          Varun Thacker Thanks for the comments.

          In ReplicationHandler#restore why are we always using LocalFileSystemRepository and not reading any repository param?

          The reason is that ReplicationHandler is not configured with the CoreContainer (which is required to fetch the repository configuration). Also, building two parallel implementations of the same functionality doesn't quite make sense. Can we instead make the core-level backup/restore APIs public (i.e. not limited to SolrCloud)? This allows users to keep using ReplicationHandler in the case of a local file-system, but if they need to integrate with other file-systems they can move to these core-level APIs. If feasible, we can even deprecate (and remove) the backup/restore APIs from ReplicationHandler in the future.

          However TestHdfsBackupRestore added in this patch is a solr cloud test . What other work is left for supporting collection level changes?

          I had to implement this test as a "cloud" test so as to enable testing these core-level operations (since these operations are enabled only in cloud mode). The collection-level changes include:

          • Backup/restore collection metadata
          • Check the version compatibility during restore
          • A strategy interface to define "how" the backup operation is performed (e.g. copying the index files vs. a file-system snapshot, etc.)

          I only briefly looked at SOLR-9055 and couldn't tell why we need ShardRequestProcessor etc.

          The main reason is to implement an index backup strategy. Also, in general, processing shard requests is such common functionality that embedding it in the OverseerCollectionMessageHandler doesn't quite seem right (from a modularity perspective).

          Also i found ZkStateReader#BACKUP_LOCATION constant. We should merge those two constants I guess

          Makes sense. Let me do that.

          Varun Thacker added a comment -

          Also building two parallel implementations for the same functionality doesn't quite make sense.

          Indeed. It's far from ideal right now. We document core-level backup/restore via the replication handler, as that's where it was supported.

          SOLR-5750 added a hook to Core Admin to leverage it. It was simply for convenience and not meant to be made public. Maybe we should fix it to leverage the ReplicationHandler instead. Or we could deprecate the usage via the ReplicationHandler, as it's more of a core admin operation anyway.

          But I think let's keep that to a separate Jira/discussion? For the scope of this Jira, can we just support it in ReplicationHandler as well?

          Hrishikesh Gadre added a comment (edited) -

          For the scope of this Jira can we just support it in ReplicationHandler as well ?

          Sure, I am working on this. It looks like we may not be able to provide identical behavior w.r.t. the core-level backup/restore API.
          Specifically, when the user does not specify the "location" parameter, the existing ReplicationHandler implementation uses a directory relative to the "data" directory, e.g.
          https://github.com/apache/lucene-solr/blob/a4455a4b14f2bf947db1136f9d5fc7d0d88d32ef/solr/core/src/java/org/apache/solr/handler/ReplicationHandler.java#L419
          https://github.com/apache/lucene-solr/blob/a4455a4b14f2bf947db1136f9d5fc7d0d88d32ef/solr/core/src/java/org/apache/solr/handler/SnapShooter.java#L67

          While this logic is OK on a local file-system, it would not work if the user is using a different file-system for backup/restore; e.g. consider a case where a user configures an HDFS repository without a default location (while using the local file-system for storing index files).

          Note: when only a single repository is configured, we use it as the "default". Now consider a case where a user invokes backup/restore without specifying the "location" and "repository" parameters: we don't want to use the "data" directory as the location, since it may not be valid on HDFS. So I am adding a constraint that if the "repository" parameter is specified, then the location must be specified either via the "location" parameter OR via a repository configuration in solr.xml.

          When the "repository" parameter is not specified, we default to "LocalFileSystem" instead of the configured default repository in solr.xml. This handles the use-case mentioned above and also helps maintain backwards compatibility with the existing API behavior.

          On the other hand, the core-level BACKUP API always fetches the "default" repository configuration from solr.xml and requires that the location be specified either via the "location" parameter OR via a repository configuration. I hope this small difference in API behavior is OK (since we should aim to retire one of the APIs).
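
          A condensed sketch of those rules; the factory and getConfigProperty helpers are illustrative, not the committed implementation:

              BackupRepository repo;
              String repoName = params.get("repository");
              if (repoName != null) {
                  repo = repoFactory.newInstance(loader, repoName);
                  // the location must come from the request or from the repository's solr.xml config
                  if (params.get("location") == null && repo.getConfigProperty("location") == null) {
                      throw new SolrException(ErrorCode.BAD_REQUEST,
                          "'location' is required when 'repository' is specified");
                  }
              } else {
                  // no "repository" param: default to the local file-system for back-compat;
                  // a missing "location" then falls back to a path under the data directory
                  repo = new LocalFileSystemRepository();
              }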

          Hrishikesh Gadre added a comment -

          For the scope of this Jira can we just support it in ReplicationHandler as well ?

          It looks like we are also creating a snapshot as part of the post-commit/optimize operation. I'm not sure which repository we should use for this. Would this require adding another config param to ReplicationHandler?

          https://github.com/apache/lucene-solr/blob/a4455a4b14f2bf947db1136f9d5fc7d0d88d32ef/solr/core/src/java/org/apache/solr/handler/ReplicationHandler.java#L1324

          Hrishikesh Gadre added a comment -

          Varun Thacker OK I think I have addressed all the review comments. Could you please take a look?

          Varun Thacker added a comment -

          Changes look good!

          It looks like we are also creating a snapshot as part of post commit/optimize operation. Not sure which repository should we use for this? Would this require adding another config param to ReplicationHandler?

          Let's stick to the current model of reading from the solr.xml file and defaulting to local if it's not defined?

          Mark Miller added a comment -

          Varun Thacker, do you want to take this back for the commit now that you're back from vacation?

          Hrishikesh Gadre added a comment (edited) -

          Varun Thacker

          Let's stick to the current model of reading from the solr.xml file and defaulting to local if its not defined ?

          I am not sure if you realize the problem here. This is the code snippet that I am wondering about:
          https://github.com/apache/lucene-solr/blob/a4455a4b14f2bf947db1136f9d5fc7d0d88d32ef/solr/core/src/java/org/apache/solr/handler/ReplicationHandler.java#L1324

          This particular code is invoked as a callback (and not directly as part of handling a user request).
          https://github.com/apache/lucene-solr/blob/a4455a4b14f2bf947db1136f9d5fc7d0d88d32ef/solr/core/src/java/org/apache/solr/handler/ReplicationHandler.java#L1182

          So when there are multiple repositories configured, we need a way to configure which repository to use. In the case of the backup/restore API, this configuration parameter was passed by the user. I wonder if we need to add a config param in the ReplicationHandler?

          Varun Thacker added a comment -

          Hi Hrishikesh,

          So in case there are multiple repositories configured, we need a way to configure which repository to use

          Alternatively, how do you feel about forcing a default repository if multiple repos are configured? Currently, if there is just one we make that the default, but when there are multiple repositories we don't necessarily require a default.

          Mark - sure, I can assign it to myself. I'd like your opinion and Hrishikesh's opinion on this comment, though:

          I had to implement this test as a "cloud" test so as to enable testing these core level operations (since these operations are enabled only in the cloud mode). The collection-level changes include,
          - Backup/restore collection metadata
          - Check the version compatibility during restore
          - Strategy interface to define "how" backup operation is performed (e.g. copying the index files vs. a file-system snapshot etc.)
          

          I feel the first point should be part of this Jira. We can then say this patch effectively makes the repository interface work for both standalone and cloud.

          Points 2/3 seem more like better guarantee checks and more options, and aren't cloud-related. We could tackle those in SOLR-9055.

          Hrishikesh Gadre added a comment -

          Varun Thacker

          Alternatively how do you feel about forcing a default repository if multiple repos are configured? Currently if there is just 1 we make that default but when there are multiple repositories we don't necessarily need to provide a default.

          Yes, we can force the user to have a default repository if multiple repos are configured. But it seems to me that the code under consideration was added as part of SOLR-561 (i.e. Solr replication). Do we need to incorporate these changes into the replication workflow? My gut feeling is that we shouldn't make any changes to this code as part of this JIRA.

          I feel the first point should be part of this Jira . We can then say this patch effectively solves makes the repository interface work for both standalone and cloud.

          Based on the JIRA description, I thought the requirement for this JIRA was to support core-level backup/restore, but I am OK with repurposing it to cover SolrCloud as well. It's just that the patch will grow in size; I generally prefer to keep patches short, which makes them easier to review and test. But if you are OK with a larger patch, I can resubmit it. Let me know.

          Varun Thacker added a comment -

          Do we need to incorporate these changes in the replication work-flow?

          I think having a mandatory default repository and using that for this workflow should be enough, right?

          Mark Miller added a comment -

          I don't see a problem with breaking off the full cloud work into another JIRA. Let's just relate them; then we can wrap this up and have a fresh canvas to discuss what has not already been reviewed.

          Which repo the replication handler should use seems a bit tricky. I don't like any of the current options that much. Still thinking about that a bit.

          Mark Miller added a comment -

          So I think we do want it configurable in the ReplicationHandler config, but if it's not configured, perhaps we could default to the first repo defined? And to local if no repos are defined? Forcing a default for this seems annoying, same as forcing config.

          Varun Thacker added a comment -

          FYI, I created SOLR-9239 for discussing the two approaches that we have for core-level backup/restore.

          Varun Thacker added a comment -

          Regarding the cloud-level changes required, I agree with you all - let's just do it in another Jira, as long as it's not SOLR-9055, as that one has other enhancements. I guess I got thrown off by this comment previously:

          The collection level changes are being captured in the patch for SOLR-9055

          Hrishikesh Gadre added a comment -

          Mark Miller

          So I think we do want it configurable in ReplicationHandler config, but if it's not config'd, perhaps we could default to the first repo defined? And local if no repos are defined? Forcing a default for this seems annoying, same as forcing config.

          Personally, I think the code under consideration is quite unrelated to the backup/restore functionality. It was added as part of SOLR-561 to implement replication. So my suggestion is to not consider it as part of this JIRA; we can file a separate JIRA for tracking. The backup/restore API in ReplicationHandler already accepts the repository parameter in my latest patch.

          Mark Miller added a comment -

          I have no strong preference on whether that needs to use a Repository here or not, but Varun Thacker has not seemed very amenable to ignoring it yet.

          varunthacker Varun Thacker added a comment -

          Yeah, we can tackle that in another Jira. Whether we enforce a default repository or not, it could warrant a param in the replication handler and hence can be tackled separately.

          For the patch, one thing I'd like to address: in TestHdfsBackupRestore, make runCoreAdminCommand use the replication handler instead, since that's the currently documented way of running core backups/restores.

          hgadre Hrishikesh Gadre added a comment -

          Varun Thacker Please find the latest patch.

          For the patch, one thing I'd like to address: in TestHdfsBackupRestore, make runCoreAdminCommand use the replication handler instead, since that's the currently documented way of running core backups/restores.

          Instead of replacing the core admin API usage with the replication handler, I just added another test which uses the replication handler. This way we can test both APIs.

          Mark Miller I made a small change in the HdfsDirectory class to define a constant for the buffer size. This way we can use the same value for both HdfsDirectory and HdfsBackupRepository.
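
          The change described is essentially a single shared constant; a tiny sketch of the pattern (the field name and value here are assumptions, not necessarily what the patch uses):

              // Sketch: one definition of the I/O buffer size, referenced by both
              // classes so the two values cannot drift apart.
              class HdfsDirectory {
                  static final int DEFAULT_BUFFER_SIZE = 4096;  // illustrative value
              }

              class HdfsBackupRepository {
                  private static final int BUFFER_SIZE = HdfsDirectory.DEFAULT_BUFFER_SIZE;
              }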

          hgadre Hrishikesh Gadre added a comment - - edited

          Varun Thacker I have filed SOLR-9242 to track the work required to enable this feature at the collection level backup/restore.

          varunthacker Varun Thacker added a comment -

          Hi Hrishikesh,

          I took your latest patch and made a few minor changes. I'll list them out as accurately as I can (I closed the window where I had noted down the changes while making them):

          1. In BackupRepository, removed some modifiers that were not needed.
          2. Created BackupRestoreUtils so the tests can share code. Methods like indexDocs and verifyDocs were moved there and are now reused in all our backup/restore tests.
          3. Made some modifications to the CHANGES entry.
          4. Merged the replication handler and core admin handler tests in TestHdfsBackupRestoreCore into one; we pick between the two randomly (see the sketch below).

          I've run the test suite once and it passes, along with precommit. Let me know if this looks good; otherwise I plan to go over it one more time tomorrow morning and commit it.
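
          As a reference for item 4, here is a sketch of the randomized selection pattern; the class and helper names are hypothetical stand-ins for the test's real methods (the actual test would use the seeded random() from the test framework rather than a fresh Random):

              import java.util.Random;

              // Sketch of the pattern: each run exercises exactly one of the two
              // backup APIs; over many runs both paths get covered.
              class RandomizedBackupTest {
                  private final Random random = new Random();

                  void testBackupRestore() {
                      if (random.nextBoolean()) {
                          runBackupViaReplicationHandler();   // /replication?command=backup
                      } else {
                          runBackupViaCoreAdmin();            // /admin/cores?action=BACKUPCORE
                      }
                      verifyDocs();                           // identical assertions either way
                  }

                  void runBackupViaReplicationHandler() { /* stub */ }
                  void runBackupViaCoreAdmin()          { /* stub */ }
                  void verifyDocs()                     { /* stub */ }
              }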

          hgadre Hrishikesh Gadre added a comment -

          Varun Thacker Thanks for the review. I think the changes look good. I have one question though.

          4. Merged the replication handler and core admin handler tests in TestHdfsBackupRestoreCore into one; we pick between the two randomly.

          Wouldn't it be better to test both ways every time? I see that this pattern is used extensively in other tests too. My main concern is that the probability of one of the tests not being executed across multiple iterations is > 0 (with a fair random choice, the chance that a given path is never exercised in n runs is (1/2)^n - small but nonzero). If someone makes subsequent changes to this code, the unit test doesn't provide strong assurance that those changes are correct, since one of the paths may not have been executed at all. So developers end up diagnosing and fixing such issues after committing the code, when they could have been caught before the first commit.

          I do think that random tests are very useful for testing Solr cloud behavior (e.g. the chaos monkey tests). But in this case, I am not sure we need randomness. Any thoughts?

          hgadre Hrishikesh Gadre added a comment -

          Varun Thacker Oh, BTW, let's not hold up the patch for this. This is more of a question than a review comment.

          markrmiller@gmail.com Mark Miller added a comment -

          So developers end up diagnosing and fixing such issues after committing the code

          This is a common issue, and usually it comes down to the developer's judgment at commit time. There is so much random testing going on that if we did it all in one run, the test run would take forever. At the same time, you want common paths to be fairly well tested on every run.

          At the end of the day, I generally decide based on how painful it is in extra test time versus how core the functionality being tested is.

          If you are writing or changing tests, you generally run them many times before committing (at least you probably should in many cases - there is a beasting target and scripts to help with this), and you will find most things that way. The many Jenkins machines that are running tests all the time will find them otherwise, and that is okay too.

          We have a fairly lax commit-then-review policy, and following up with a fix or two based on Jenkins random failures is common.

          I'll leave this one for you and Varun to figure out; just giving some context.

          hgadre Hrishikesh Gadre added a comment -

          Mark Miller Thanks for the insight. Varun Thacker I am ok with the current test configuration. Let me know if anything is needed from my side.

          varunthacker Varun Thacker added a comment -

          I've got precommit to pass. Running the test suite one more time and then committing it.

          I pondered marking BackupRepository as experimental in case we need to iron it out, but decided against it.

          Do you plan on tackling SOLR-9242 as well?

          hgadre Hrishikesh Gadre added a comment -

          Varun Thacker

          Do you plan on tackling SOLR-9242 as well?

          Yup. I already have a patch ready for submission. Just waiting for this patch to get committed to trunk.

          jira-bot ASF subversion and git services added a comment -

          Commit 07be2c42ba24fea7c4e84836aa4c3f8d059f71d6 in lucene-solr's branch refs/heads/master from Varun Thacker
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=07be2c4 ]

          SOLR-7374: Core level backup/restore now supports specifying a directory implementation

          varunthacker Varun Thacker added a comment -

          I was looking at http://jenkins.thetaphi.de/job/Lucene-Solr-master-Solaris/668/ , and from what I understand it doesn't seem related to this Jira. Unfortunately, it's also been failing for quite a while.

          jira-bot ASF subversion and git services added a comment -

          Commit 36183cad87dfc3fc8f0a1e0b0c210e8bd14a4ce0 in lucene-solr's branch refs/heads/master from Varun Thacker
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=36183ca ]

          SOLR-7374: Fixing test failures like build #3366. Index a minimum of 1 doc
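
          The kind of change this commit message describes - giving the random document count a floor of 1 - could look like the following sketch; the class and method names are hypothetical, and only TestUtil.nextInt is a real Lucene test utility:

              import java.util.Random;
              import org.apache.lucene.util.TestUtil;

              // Sketch: draw the number of documents from [1, 100] instead of [0, 100]
              // so the backup/restore test always has at least one doc to verify.
              // TestUtil.nextInt bounds are inclusive.
              class MinimumOneDoc {
                  static int pickNumDocs(Random random) {
                      return TestUtil.nextInt(random, 1, 100);
                  }
              }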

          jira-bot ASF subversion and git services added a comment -

          Commit eb071436331595cc453cd9c7d9c82d3269bb5e40 in lucene-solr's branch refs/heads/branch_6x from Varun Thacker
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=eb07143 ]

          SOLR-7374: Fixing test failures like build #3366. Index a minimum of 1 doc

          varunthacker Varun Thacker added a comment -

          Thanks Hrishikesh and Mark!

          mikemccand Michael McCandless added a comment -

          Bulk close resolved issues after 6.2.0 release.


            People

            • Assignee:
              Varun Thacker
            • Reporter:
              Varun Thacker
            • Votes:
              0
            • Watchers:
              10
