Solr / SOLR-1051

Support the merge of multiple indexes

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4
    • Component/s: update
    • Labels:
      None

      Description

      This is to support the merge of multiple indexes.

      1. SOLR-1051.patch
        4 kB
        Shalin Shekhar Mangar
      2. SOLR-1051.patch
        22 kB
        Shalin Shekhar Mangar
      3. SOLR-1051.patch
        20 kB
        Ning Li
      4. SOLR-1051.patch
        17 kB
        Ning Li

          Activity

          Ning Li added a comment -

          "AddIndexes" is exposed as a CoreAdmin command, as Shalin suggested.

          • As a CoreAdmin command, should "addIndexes" go through the update processor chain? If so, should it be exposed as an update command?
          • Should "commit" be called at the end of "addIndexes"? Note that "commit" is not a CoreAdmin command.
          Ning Li added a comment -

          Any comments? In the current patch, addIndexes goes to the update handler directly, so it does not go through the update processor chain, and commit is not called at the end.

          Yonik Seeley added a comment -

          The update processor has primarily been for documents, not whole indexes. Still, if an update processor were keeping track of changes to the index, or of what documents were in the index, then a merge event like this could be important.

          I think perhaps "merge" might be a better name than "add" though, since we may want to allow a core to have multiple lucene indexes in the future?

          Ning Li added a comment -

          Having "add/mergeIndexes" go through the update processor chain makes sense. But I'll keep it a CoreAdmin command since non-admin users probably shouldn't issue the command.

          I thought the difference between "merge" and "add" is that, e.g. given indexA and indexB, merge(indexA, indexB) vs. indexA.add(indexB). A new searcher on indexA will see indexB in case of "add", and won't in case of "merge". But if you think indexA.merge(indexB) is fine, I can rename it.

          Ning Li added a comment -

          Finally got back to this. Here is the new patch.

          • Rename "addIndexes" to "mergeIndexes".
          • "MergeIndexes" now goes through the update processor chain.
            Comments are welcome.
          Otis Gospodnetic added a comment -

          Shouldn't we have:
          indexC = merge(A, B), leaving A and B unmodified

          Isn't that the most flexible approach?

          Yonik Seeley added a comment -

          indexC = merge(A, B), leaving A and B unmodified

          If you just want to add one index to another though, this would be more expensive as both indexes would need to be copied.

          Ning Li added a comment -

          indexC = merge(A, B), leaving A and B unmodified

          Thanks for the comments! With the current approach, you can achieve this by creating an empty index C then merging A and B into C, no?
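
The two semantics discussed above can be compared with a toy model. This is only an illustrative sketch: an "index" is modeled as a plain list of document ids, and the names `newEmptyIndex` and `mergeInto` are hypothetical, not part of Solr or Lucene. It shows that an in-place merge into an empty target gives the `indexC = merge(A, B)` behavior while leaving the sources untouched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MergeSemanticsSketch {
    // Toy stand-in for an index: an ordered list of document ids (assumption, not a real index).
    static List<String> newEmptyIndex() {
        return new ArrayList<>();
    }

    // In-place merge: the target is modified, the sources are only read.
    static void mergeInto(List<String> target, List<List<String>> sources) {
        for (List<String> source : sources) {
            target.addAll(source);
        }
    }

    public static void main(String[] args) {
        List<String> a = new ArrayList<>(Arrays.asList("a1", "a2"));
        List<String> b = new ArrayList<>(Arrays.asList("b1"));

        // indexC = merge(A, B): start from an empty target; A and B stay unmodified.
        List<String> c = newEmptyIndex();
        mergeInto(c, Arrays.asList(a, b));
        System.out.println("C = " + c + ", A = " + a + ", B = " + b);

        // A.merge(B): only B is read into A, so A itself does not have to be copied.
        mergeInto(a, Arrays.asList(b));
        System.out.println("A after in-place merge = " + a);
    }
}
```

The in-place form covers both use cases, which is the trade-off raised in the thread: adding one index to another avoids copying the target, and the non-destructive variant costs one extra empty target.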

          Shalin Shekhar Mangar added a comment -

          Patch updated to trunk. The CoreAdminHandler refactoring in SOLR-1106 had broken this.

          The javadocs of IW.addIndexesNoOptimize say that an IW should not be opened on the source indexes. I guess the use-case behind this feature takes care of this? If opening an IW on the source indexes can lead to corruption of the target index, is there any way to avoid it?

          I think this patch is ready for commit. We'd need to record the above warning on the wiki when we add details about this command. If there are no objections, I'll commit in a day or two.

          Shalin Shekhar Mangar added a comment -

          Committed revision 779423.

          Thanks Ning!

          Koji Sekiguchi added a comment -

          I got an NPE when trying MERGEINDEXES:

          http://localhost:8983/solr/admin/cores?action=MERGEINDEXES&core=core0&indexDirs=indexname

          java.lang.NullPointerException
          at org.apache.solr.update.processor.RunUpdateProcessor.<init>(RunUpdateProcessorFactory.java:55)
          at org.apache.solr.update.processor.RunUpdateProcessorFactory.getInstance(RunUpdateProcessorFactory.java:43)
          at org.apache.solr.update.processor.UpdateRequestProcessorChain.createProcessor(UpdateRequestProcessorChain.java:55)
          at org.apache.solr.handler.admin.CoreAdminHandler.handleMergeAction(CoreAdminHandler.java:191)
          at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:151)
          at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
          at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:301)
          at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:174)
          at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
          at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
          at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
          at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
          at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
          at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
          at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
          at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
          at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
          at org.mortbay.jetty.Server.handle(Server.java:285)
          at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
          at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
          at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
          at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
          at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
          at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
          at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

          Solr was started by:

          $ cd example
          $ java -Dsolr.solr.home=./multicore -jar start.jar

          The cause of the NPE is that RunUpdateProcessor tries to get the UpdateHandler via SolrCore, but the core is null in req:

            public RunUpdateProcessor(SolrQueryRequest req, UpdateRequestProcessor next) {
              super( next );
              this.req = req;
              this.updateHandler = req.getCore().getUpdateHandler();
            }
          
          Shalin Shekhar Mangar added a comment - edited

          Since SOLR-1121, core admin commands do not get a core. So calling an UpdateProcessor through a core admin command cannot work.

          Shalin Shekhar Mangar added a comment -

          The test case passed because it was using an EmbeddedSolrServer which creates SolrQueryRequest objects with the current core. The logic in SolrDispatchFilter has been changed to not create a special AdminCore.

          I'm wondering if it makes sense to have the UpdateProcessor hooks at all. Even in the previous scheme, the first core was designated as the admin core. Therefore, keeping track of merges through an update processor would require setting up that update processor on the first core defined in solr.xml.

          How about we remove the UpdateProcessor hooks for merge command? Thoughts?

          Ning Li added a comment -

          In the current approach, mergeIndexes is an admin command and the target core should be online. I haven't looked into the SolrDispatchFilter logic change, but it seems with this change, the following are the two valid options:

          • mergeIndexes is an update command and the target core should be online
          • mergeIndexes is an admin command and the target core should be offline

          The first option is close to what we have now. I like it a bit more because you keep track of the merge by going through UpdateProcessor. But you seem to prefer the second option?

          Noble Paul added a comment - edited

          A few suggestions:

          • There should be a provision to merge one core with another. In my view, the most common use case would be to create a core, add docs to it, and then merge it into the main core that is serving requests. This way, the user will not need to touch the filesystem directly.
          • The indexDirs parameter should not take comma-separated values; an HTTP request can carry multiple values for the same parameter.
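
To illustrate the second suggestion, here is a minimal sketch of a client building a MERGEINDEXES request with one parameter occurrence per source directory instead of a single comma-separated value. The base URL and `action`/`core` parameters come from the request shown earlier in this thread; the repeated parameter name `indexDir`, the helper `buildMergeUrl`, and the example paths are assumptions made for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Arrays;
import java.util.List;

public class MergeIndexesUrlSketch {
    // URL-encode a single parameter value as UTF-8.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical helper: emits one "indexDir" parameter per source index directory.
    // The parameter name "indexDir" is an assumption, not confirmed by this thread.
    static String buildMergeUrl(String baseUrl, String core, List<String> indexDirs) {
        StringBuilder url = new StringBuilder(baseUrl)
                .append("/admin/cores?action=MERGEINDEXES")
                .append("&core=").append(encode(core));
        for (String dir : indexDirs) {
            url.append("&indexDir=").append(encode(dir));
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // Example paths are placeholders.
        System.out.println(buildMergeUrl(
                "http://localhost:8983/solr",
                "core0",
                Arrays.asList("/data/index1", "/data/index2")));
    }
}
```

Repeating the parameter avoids any ambiguity about directory names that themselves contain commas.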
          Shalin Shekhar Mangar added a comment -

          I like it a bit more because you keep track of the merge by going through UpdateProcessor.

          I see your point. I'll give a patch which passes the target core into the request.

          • There should be a provision to merge one core with another. In my view, the most common use case would be to create a core, add docs to it, and then merge it into the main core that is serving requests. This way, the user will not need to touch the filesystem directly.
          • The indexDirs parameter should not take comma-separated values; an HTTP request can carry multiple values for the same parameter.

          Agree on both. I'll commit the fix and #2 first since the feature in trunk is broken. Then we can work on adding #1 which requires more changes.

          Shalin Shekhar Mangar added a comment -
          1. Fix for NPE: Wrap the SolrQueryRequest, providing the target core
          2. Change comma-separated params to multiple params

          I'll commit this shortly.
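
The wrapping idea in item 1 can be sketched in isolation. The types below (`Core`, `UpdateHandler`, `Request`, `AdminRequest`, `TargetCoreRequest`) are hypothetical stand-ins for the Solr classes named in this thread, not the real API, and this is not the committed patch. The sketch only shows why a request without a core triggers the NPE and how a wrapper that supplies the target core avoids it.

```java
public class TargetCoreRequestSketch {
    // Hypothetical stand-ins; not the real Solr classes.
    static class UpdateHandler {}

    static class Core {
        private final UpdateHandler updateHandler = new UpdateHandler();
        UpdateHandler getUpdateHandler() { return updateHandler; }
    }

    interface Request {
        Core getCore();
    }

    // Models a core admin request as described in the thread: it carries no core.
    static class AdminRequest implements Request {
        public Core getCore() { return null; }
    }

    // Wrapper that keeps the original request but reports the target core.
    static class TargetCoreRequest implements Request {
        private final Request delegate; // other request data would be forwarded to this
        private final Core targetCore;

        TargetCoreRequest(Request delegate, Core targetCore) {
            this.delegate = delegate;
            this.targetCore = targetCore;
        }

        public Core getCore() { return targetCore; }
    }

    // Mirrors the lookup in the constructor quoted earlier: req.getCore().getUpdateHandler().
    static UpdateHandler resolveUpdateHandler(Request req) {
        return req.getCore().getUpdateHandler();
    }

    public static void main(String[] args) {
        Request admin = new AdminRequest();
        Core target = new Core();

        try {
            resolveUpdateHandler(admin);
        } catch (NullPointerException e) {
            System.out.println("unwrapped admin request: NullPointerException");
        }

        UpdateHandler handler = resolveUpdateHandler(new TargetCoreRequest(admin, target));
        System.out.println("wrapped request resolves target handler: "
                + (handler == target.getUpdateHandler()));
    }
}
```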

          Shalin Shekhar Mangar added a comment -

          Committed revision 781688.

          Grant Ingersoll added a comment -

          Can this be closed?

          Shalin Shekhar Mangar added a comment -

          Merging cores is the part which is left. I think it needs more thought/discussion before it can be implemented. I'll close this one and open another issue for 1.5 about merging cores.

          Shalin Shekhar Mangar added a comment -

          I've opened SOLR-1331 for the missing piece.

          Grant Ingersoll added a comment -

          Bulk close for Solr 1.4


            People

            • Assignee:
              Shalin Shekhar Mangar
              Reporter:
              Ning Li
            • Votes:
              6
              Watchers:
              4
