Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: 4.0-ALPHA
    • Component/s: None
    • Labels:
      None

      Description

      The update handler needs an overhaul.

      A few goals I think we might want to look at:

      1. Cleanup - drop DirectUpdateHandler(2) line - move to something like UpdateHandler, DefaultUpdateHandler
      2. Expose the SolrIndexWriter in the api or add the proper abstractions to get done what we now do with special casing:
      if (directupdatehandler2)
      success
      else
      failish
      3. Stop closing the IndexWriter and start using commit (still lazy IW init though).
      4. Drop iwAccess, iwCommit locks and sync mostly at the Lucene level.
      5. Keep NRT support in mind.
      6. Keep microsharding in mind (maintain logical index as multiple physical indexes)
      7. Address the current issues we face because multiple original/'reloaded' cores can have a different IndexWriter on the same index.

      1. SOLR-2193.patch
        50 kB
        Mark Miller
      2. SOLR-2193.patch
        79 kB
        Mark Miller
      3. SOLR-2193.patch
        90 kB
        Mark Miller
      4. SOLR-2193.patch
        103 kB
        Mark Miller
      5. SOLR-2193.patch
        104 kB
        Mark Miller
      6. SOLR-2193.patch
        105 kB
        Robert Muir

        Issue Links

          Activity

          Hide
          Mark Miller added a comment -

          I've been playing with a patch the keeps the IndexWriter open always (shares them across core reloads) and drops our internal update locks - so far, all tests pass, but there are still issues to deal with.

          I'll post the patch once I work a few more things out. Won't cover everything - just a start to explore different ideas.

          Show
          Mark Miller added a comment - I've been playing with a patch the keeps the IndexWriter open always (shares them across core reloads) and drops our internal update locks - so far, all tests pass, but there are still issues to deal with. I'll post the patch once I work a few more things out. Won't cover everything - just a start to explore different ideas.
          Hide
          Mark Miller added a comment -

          I've updated my patch to trunk.

          You can find the most recent work here: https://github.com/markrmiller/Lucene-Solr-Lab/commit/051995d50160e7081854febd471e72b9e81b8a3f

          I'll upload a patch after I start lazy creating the IndexWriter.

          This can be a fairly large win on time to visibility - you don't wait for background merges to finish before opening a new IndexReader now.

          It's also a start at solving our rather nasty reload a core, briefly different writers on the same index problem (usually avoided because the overlap is brief and the IndexWriter created lazily).

          It's still early work. Issues remain.

          Show
          Mark Miller added a comment - I've updated my patch to trunk. You can find the most recent work here: https://github.com/markrmiller/Lucene-Solr-Lab/commit/051995d50160e7081854febd471e72b9e81b8a3f I'll upload a patch after I start lazy creating the IndexWriter. This can be a fairly large win on time to visibility - you don't wait for background merges to finish before opening a new IndexReader now. It's also a start at solving our rather nasty reload a core, briefly different writers on the same index problem (usually avoided because the overlap is brief and the IndexWriter created lazily). It's still early work. Issues remain.
          Hide
          Simon Willnauer added a comment -

          I've updated my patch to trunk.

          hmm maybe I miss something but I don't see a patch on this issue.

          You can find the most recent work here: https://github.com/markrmiller/Lucene-Solr-Lab/commit/051995d50160e7081854febd471e72b9e81b8a3f

          this seems to be good stuff why don't you create a branch for it?

          simon

          Show
          Simon Willnauer added a comment - I've updated my patch to trunk. hmm maybe I miss something but I don't see a patch on this issue. You can find the most recent work here: https://github.com/markrmiller/Lucene-Solr-Lab/commit/051995d50160e7081854febd471e72b9e81b8a3f this seems to be good stuff why don't you create a branch for it? simon
          Hide
          Mark Miller added a comment -

          I'll upload a patch after I start lazy creating the IndexWriter.

          Show
          Mark Miller added a comment - I'll upload a patch after I start lazy creating the IndexWriter.
          Hide
          Mark Miller added a comment -

          here is my first patch

          Show
          Mark Miller added a comment - here is my first patch
          Hide
          Mark Miller added a comment -

          I've started experimenting with some simple Lucene near-realtime support, which lets you avoid the commit on reopen as well. Even if everything in Solr is not ready to fit in with near-realtime, we might as well start pushing down the path.

          We can work out the names and strategy, but for the moment I have added a 'softCommit' boolean to the commit cmd - if you commit with this set to true, the SolrCore will repen the indexreader using the indexwriter from the updatehandler.

          Show
          Mark Miller added a comment - I've started experimenting with some simple Lucene near-realtime support, which lets you avoid the commit on reopen as well. Even if everything in Solr is not ready to fit in with near-realtime, we might as well start pushing down the path. We can work out the names and strategy, but for the moment I have added a 'softCommit' boolean to the commit cmd - if you commit with this set to true, the SolrCore will repen the indexreader using the indexwriter from the updatehandler.
          Hide
          Yonik Seeley added a comment -

          NRT finally... sweet!

          I wonder how this should work with autocommit?
          Someone may want a soft/reopen autocommit once every x seconds, but still may want a hard "flush to stable storage in case I crash" commit at some other duration.

          The other thing that might be cool is a client-specified freshness per request. For example, when they pass in a query, they specify that they need data that's no more than 1 second old... and if it's too old that will trigger a reopen (and block that specific request until the new searcher can be used). The benefit here is that big bulk uploads won't be interrupted if there is no time sensitive query traffic. The downside is that a high latency may be exposed to those requests if they depend on stuff that can take a lot of time the first time (like faceting).

          Show
          Yonik Seeley added a comment - NRT finally... sweet! I wonder how this should work with autocommit? Someone may want a soft/reopen autocommit once every x seconds, but still may want a hard "flush to stable storage in case I crash" commit at some other duration. The other thing that might be cool is a client-specified freshness per request. For example, when they pass in a query, they specify that they need data that's no more than 1 second old... and if it's too old that will trigger a reopen (and block that specific request until the new searcher can be used). The benefit here is that big bulk uploads won't be interrupted if there is no time sensitive query traffic. The downside is that a high latency may be exposed to those requests if they depend on stuff that can take a lot of time the first time (like faceting).
          Hide
          Michael McCandless added a comment -

          Fabulous!! Elimination of the IW.close() on Solr's commit, and using Lucene's NRT!!

          Show
          Michael McCandless added a comment - Fabulous!! Elimination of the IW.close() on Solr's commit, and using Lucene's NRT!!
          Hide
          Mark Miller added a comment -

          I wonder how this should work with autocommit?
          Someone may want a soft/reopen autocommit once every x seconds, but still may want a hard "flush to stable storage in case I crash" commit at some other duration.

          Right - I agree. How about another simple start? Simply add another commitTracker that does soft commits - then you can schedule a mix of soft and hard commits.

          The other thing that might be cool is a client-specified freshness per request. For example, when they pass in a query, they specify that they need data that's no more than 1 second old... and if it's too old that will trigger a reopen (and block that specific request until the new searcher can be used). The benefit here is that big bulk uploads won't be interrupted if there is no time sensitive query traffic. The downside is that a high latency may be exposed to those requests if they depend on stuff that can take a lot of time the first time (like faceting).

          Yeah - I remember you mentioning this before - I def think this would be cool - perhaps as a follow on issue - though hopefully the affect on bulk updates will be minimized when Lucene takes care of the 'flush blocks the world' issue.

          Show
          Mark Miller added a comment - I wonder how this should work with autocommit? Someone may want a soft/reopen autocommit once every x seconds, but still may want a hard "flush to stable storage in case I crash" commit at some other duration. Right - I agree. How about another simple start? Simply add another commitTracker that does soft commits - then you can schedule a mix of soft and hard commits. The other thing that might be cool is a client-specified freshness per request. For example, when they pass in a query, they specify that they need data that's no more than 1 second old... and if it's too old that will trigger a reopen (and block that specific request until the new searcher can be used). The benefit here is that big bulk uploads won't be interrupted if there is no time sensitive query traffic. The downside is that a high latency may be exposed to those requests if they depend on stuff that can take a lot of time the first time (like faceting). Yeah - I remember you mentioning this before - I def think this would be cool - perhaps as a follow on issue - though hopefully the affect on bulk updates will be minimized when Lucene takes care of the 'flush blocks the world' issue.
          Hide
          Mark Miller added a comment -

          new patch

          • some refactoring - pull committracker and indexwriterprovider out of directupdatehandler2 - move creatIndexWriter from updatehandler to indexwriterprovider
          • naively add another committracker for soft commits - solrconfig is the same except that instead of autocommit its autosoftcommit

          I prefer gits branching and merging for keeping patches up to date with trunk, so code also available here initially:
          https://github.com/markrmiller/Lucene-Solr-Lab/commits/SOLR-2193

          Show
          Mark Miller added a comment - new patch some refactoring - pull committracker and indexwriterprovider out of directupdatehandler2 - move creatIndexWriter from updatehandler to indexwriterprovider naively add another committracker for soft commits - solrconfig is the same except that instead of autocommit its autosoftcommit I prefer gits branching and merging for keeping patches up to date with trunk, so code also available here initially: https://github.com/markrmiller/Lucene-Solr-Lab/commits/SOLR-2193
          Hide
          Mark Miller added a comment -

          New Patch:

          • more polish, comments, refactoring
          • i've tried to take/keep IndexWriter type out of UpdateHandler API
          • reenables an autocommit test that was disabled long ago
          • couple of minor bug fixes to previous work
          • remove the unused waitFlush cmd
          Show
          Mark Miller added a comment - New Patch: more polish, comments, refactoring i've tried to take/keep IndexWriter type out of UpdateHandler API reenables an autocommit test that was disabled long ago couple of minor bug fixes to previous work remove the unused waitFlush cmd
          Hide
          Mark Miller added a comment -

          So one issue to consider:

          If we start sharing the IndexWriter across SolrCores like this, on SolrCore reload, if you have changed IndexWriter settings, they will not take affect.

          My thinking here is that that is not really a big deal.

          I'm thinking perhaps we might do the following:
          1. On reload, print a log warning that if IndexWriter settings have changed, they have not been picked up?
          2. Add another param to the update/commit cmd that lets you force the opening of a new IndexWriter with the latest settings.

          Show
          Mark Miller added a comment - So one issue to consider: If we start sharing the IndexWriter across SolrCores like this, on SolrCore reload, if you have changed IndexWriter settings, they will not take affect. My thinking here is that that is not really a big deal. I'm thinking perhaps we might do the following: 1. On reload, print a log warning that if IndexWriter settings have changed, they have not been picked up? 2. Add another param to the update/commit cmd that lets you force the opening of a new IndexWriter with the latest settings.
          Hide
          Mark Miller added a comment -

          If we really wanted, we could also apply the 'live' setting changes on the fly rather than opening a new IndexWriter if we wanted to detect that case.

          Show
          Mark Miller added a comment - If we really wanted, we could also apply the 'live' setting changes on the fly rather than opening a new IndexWriter if we wanted to detect that case.
          Hide
          Mark Miller added a comment -

          Also need to consider how we should work in IndexReaderWarmer. Have not thought about it in depth yet, but it could optionally reuse the warming queries? Or have a separate conf section?

          Show
          Mark Miller added a comment - Also need to consider how we should work in IndexReaderWarmer. Have not thought about it in depth yet, but it could optionally reuse the warming queries? Or have a separate conf section?
          Hide
          Yonik Seeley added a comment -

          it could optionally reuse the warming queries?

          Off the top of my head, I don't see a reason to differentiate between hard/soft commits wrt warming in general.

          Show
          Yonik Seeley added a comment - it could optionally reuse the warming queries? Off the top of my head, I don't see a reason to differentiate between hard/soft commits wrt warming in general.
          Hide
          Mark Miller added a comment -

          Hmm...yeah - doesn't seem like we could get away with only warming the one segment in Solr...

          Show
          Mark Miller added a comment - Hmm...yeah - doesn't seem like we could get away with only warming the one segment in Solr...
          Hide
          Jayson Minard added a comment -

          Some of this was already solved in:
          https://issues.apache.org/jira/browse/SOLR-1155

          (locking and re-opening index writer were fixed)

          Show
          Jayson Minard added a comment - Some of this was already solved in: https://issues.apache.org/jira/browse/SOLR-1155 (locking and re-opening index writer were fixed)
          Hide
          Jayson Minard added a comment -

          Since SOLR-1155 is probably an easier change for Solr 3.1 due to its ancestry, so to get the same benefits I'll work to update it for that version, assuming this patch of yours is for 4.x onwards.

          Show
          Jayson Minard added a comment - Since SOLR-1155 is probably an easier change for Solr 3.1 due to its ancestry, so to get the same benefits I'll work to update it for that version, assuming this patch of yours is for 4.x onwards.
          Hide
          Mark Miller added a comment -

          Yes - my plan for this was 4.x.

          Show
          Mark Miller added a comment - Yes - my plan for this was 4.x.
          Hide
          Mark Miller added a comment -

          I've got some fixes for this, and I've started on some tests and other minor steps forward. I'll put it up before too long.

          Show
          Mark Miller added a comment - I've got some fixes for this, and I've started on some tests and other minor steps forward. I'll put it up before too long.
          Hide
          Mark Miller added a comment -

          Here is a new patch - couple tests, couple fixes, etc, etc. Still has no commitWithin type support for soft commits.

          Tested and made auto soft commit code work.

          I spent some time today firing documents rapidly at Solr with a soft commit max time of 1 second. Fantastic results at about 100 wikipedia documents per second. Didn't change any other example settings this time.

          Show
          Mark Miller added a comment - Here is a new patch - couple tests, couple fixes, etc, etc. Still has no commitWithin type support for soft commits. Tested and made auto soft commit code work. I spent some time today firing documents rapidly at Solr with a soft commit max time of 1 second. Fantastic results at about 100 wikipedia documents per second. Didn't change any other example settings this time.
          Hide
          Mark Miller added a comment -

          Next I need to look at the thread safety of CommitTracker under the new locking system.

          Show
          Mark Miller added a comment - Next I need to look at the thread safety of CommitTracker under the new locking system.
          Hide
          Mark Miller added a comment -

          If we go with this separate softAutoCommit as an option, still need to think about overlapping hard/soft commits.

          Eg you might want to do a soft commit every 4 seconds and a hardcommit every 16 seconds, but on the 16th second you don't necessarily want to do both types of commit (though not likely that big a deal). I've got logic to avoid this in the commit by doc case, but nothing for the time based auto commit.

          CommitWithin support is also still an interesting additional option - as well as Yonik's adea about specifying a staleness hint at query time.

          Show
          Mark Miller added a comment - If we go with this separate softAutoCommit as an option, still need to think about overlapping hard/soft commits. Eg you might want to do a soft commit every 4 seconds and a hardcommit every 16 seconds, but on the 16th second you don't necessarily want to do both types of commit (though not likely that big a deal). I've got logic to avoid this in the commit by doc case, but nothing for the time based auto commit. CommitWithin support is also still an interesting additional option - as well as Yonik's adea about specifying a staleness hint at query time.
          Hide
          Grant Ingersoll added a comment -

          Crazy idea: drop the notion of commits all together (or make it an expert thing for the hard core). Default it to 1 second. I wonder how all of this plays with warming/caching, etc. Do you even need those things in this type of setup?

          Show
          Grant Ingersoll added a comment - Crazy idea: drop the notion of commits all together (or make it an expert thing for the hard core). Default it to 1 second. I wonder how all of this plays with warming/caching, etc. Do you even need those things in this type of setup?
          Hide
          Yonik Seeley added a comment -

          Crazy idea: drop the notion of commits all together (or make it an expert thing for the hard core). Default it to 1 second.

          That should just be a matter of configuration after this patch... set a default of commitWithin=1000 in the (an) update request handler.

          I think that commitWithin should mean soft commit. Users of commitWithin care about when the changes become visible, not when they are guaranteed to be fsync'd.

          Show
          Yonik Seeley added a comment - Crazy idea: drop the notion of commits all together (or make it an expert thing for the hard core). Default it to 1 second. That should just be a matter of configuration after this patch... set a default of commitWithin=1000 in the (an) update request handler. I think that commitWithin should mean soft commit. Users of commitWithin care about when the changes become visible, not when they are guaranteed to be fsync'd.
          Hide
          Hoss Man added a comment -

          I think that commitWithin should mean soft commit. Users of commitWithin care about when the changes become visible, not when they are guaranteed to be fsync'd.

          while i don't know that we can assume that's the expectation of all existing commitWithin users, i think it probably is safe to assume that the users who do expect commitWithin to refer to fsync can be expected to pay attention enough if we add a new "hardCommitWithin=35" option and start using that if it's what they want.

          Alternately: add commitWithin.style=(hard|soft) where the default is "soft" and let people specify it as a default on their update request handlers if the behavior they really want is "hard"

          Or simplify things even further: eliminate "softCommit" as an explicit type of action and add a new "commit.type=(hard|soft)" param exists – "commit.type" would affect both explicitly requested commits, and commitWithin.

          Show
          Hoss Man added a comment - I think that commitWithin should mean soft commit. Users of commitWithin care about when the changes become visible, not when they are guaranteed to be fsync'd. while i don't know that we can assume that's the expectation of all existing commitWithin users, i think it probably is safe to assume that the users who do expect commitWithin to refer to fsync can be expected to pay attention enough if we add a new "hardCommitWithin=35" option and start using that if it's what they want. Alternately: add commitWithin.style=(hard|soft) where the default is "soft" and let people specify it as a default on their update request handlers if the behavior they really want is "hard" Or simplify things even further: eliminate "softCommit" as an explicit type of action and add a new "commit.type=(hard|soft)" param exists – "commit.type" would affect both explicitly requested commits, and commitWithin.
          Hide
          Mark Miller added a comment -

          There are a few things I'd like to do first, but I think we should likely commit this sooner rather than later, then we can iterate on trunk. I don't think I'll get to the few things I want to before buzzwords (and then I'm taking a week and a half semi vaca after that), but I plan to push for commit after that if I've hit anything I consider a blocker.

          Show
          Mark Miller added a comment - There are a few things I'd like to do first, but I think we should likely commit this sooner rather than later, then we can iterate on trunk. I don't think I'll get to the few things I want to before buzzwords (and then I'm taking a week and a half semi vaca after that), but I plan to push for commit after that if I've hit anything I consider a blocker.
          Hide
          Mark Miller added a comment -

          Reporting the current state regarding original goals:

          1. Cleanup - drop DirectUpdateHandler(2) line - move to something like UpdateHandler, DefaultUpdateHandler

          I want to do this, but it's just easier to handle after the first commit.

          2. Expose the SolrIndexWriter in the api or add the proper abstractions to get done what we now do with special casing:
          if (directupdatehandler2)
          success
          else
          failish

          There is no need for this anymore.

          3. Stop closing the IndexWriter and start using commit (still lazy IW init though).

          This is done.

          4. Drop iwAccess, iwCommit locks and sync mostly at the Lucene level.

          This is done.

          5. Keep NRT support in mind.

          Always This patch certainly won't complete the NRT work needed, but it's a large start, and a huge step forward for Solr. Much of what is left to do is in this area though.

          6. Keep microsharding in mind (maintain logical index as multiple physical indexes)

          Have not really addressed anything here - don't think I hurt anything either though. Worth considering the new IndexWriterProvider and how it might relate to this though.

          7. Address the current issues we face because multiple original/'reloaded' cores can have a different IndexWriter on the same index.

          I've made an attempt at this with the IndexWriterProvider and some simple ref counting that could use a once over. This changes how you should reload a Solrcore - rather than just opening a new one and closing the old one, you want to call a method on the SolrCore itself and get the new SolrCore.

          More to come. I've updated the patch to trunk as well - will attach shortly.

          Show
          Mark Miller added a comment - Reporting the current state regarding original goals: 1. Cleanup - drop DirectUpdateHandler(2) line - move to something like UpdateHandler, DefaultUpdateHandler I want to do this, but it's just easier to handle after the first commit. 2. Expose the SolrIndexWriter in the api or add the proper abstractions to get done what we now do with special casing: if (directupdatehandler2) success else failish There is no need for this anymore. 3. Stop closing the IndexWriter and start using commit (still lazy IW init though). This is done. 4. Drop iwAccess, iwCommit locks and sync mostly at the Lucene level. This is done. 5. Keep NRT support in mind. Always This patch certainly won't complete the NRT work needed, but it's a large start, and a huge step forward for Solr. Much of what is left to do is in this area though. 6. Keep microsharding in mind (maintain logical index as multiple physical indexes) Have not really addressed anything here - don't think I hurt anything either though. Worth considering the new IndexWriterProvider and how it might relate to this though. 7. Address the current issues we face because multiple original/'reloaded' cores can have a different IndexWriter on the same index. I've made an attempt at this with the IndexWriterProvider and some simple ref counting that could use a once over. This changes how you should reload a Solrcore - rather than just opening a new one and closing the old one, you want to call a method on the SolrCore itself and get the new SolrCore. More to come. I've updated the patch to trunk as well - will attach shortly.
          Hide
          Jason Rutherglen added a comment -

          I think the Solr ref counting code should go/exit, it's prone to pile up.
          Instead as with Twitter's system, a new reader is opened per query,
          because the readers are lightweight enough. I think that's a better path
          to pursue than monkey wrenching Solr's existing system which from the
          ground up, is not designed for NRT. If this patch isn't implementing NRT,
          what is the point?

          Show
          Jason Rutherglen added a comment - I think the Solr ref counting code should go/exit, it's prone to pile up. Instead as with Twitter's system, a new reader is opened per query, because the readers are lightweight enough. I think that's a better path to pursue than monkey wrenching Solr's existing system which from the ground up, is not designed for NRT. If this patch isn't implementing NRT, what is the point?
          Hide
          Mark Miller added a comment -

          Hi Jason - long time no see

          I think the Solr ref counting code should go/exit, it's prone to pile up.

          Can you elaborate? I'm def open to other approaches. I'm not yet sure why you are concerned about pile up? Whenever a new SolrCore is opening, the previous ref will be closing. Have you tried the patch and seen a problem here?

          Instead as with Twitter's system, a new reader is opened per query,

          because the readers are lightweight enough.

          Hmmm...opening a new IndexReader is not usually lightweight at all right now...which is why we use reopen.

          If this patch isn't implementing NRT, what is the point?

          Can you elaborate on why you don't think it's implementing NRT? I've tested basic indexing/searching using wikipedia documents at about 50-100 documents a second, opening a new reader every second. That felt pretty near-real-time to me, but the phrase is subjective. There are remaining issues - Solr does not currently do everything at a per segment level as you know - but per segment use cases will be very fast, and non per segment use cases will be much more NRT too.

          You are questioning my whole patch Which I am happy about - but I'm wondering if you have actually reviewed it - if not, I wish you would!

          Show
          Mark Miller added a comment - Hi Jason - long time no see I think the Solr ref counting code should go/exit, it's prone to pile up. Can you elaborate? I'm def open to other approaches. I'm not yet sure why you are concerned about pile up? Whenever a new SolrCore is opening, the previous ref will be closing. Have you tried the patch and seen a problem here? Instead as with Twitter's system, a new reader is opened per query, because the readers are lightweight enough. Hmmm...opening a new IndexReader is not usually lightweight at all right now...which is why we use reopen. If this patch isn't implementing NRT, what is the point? Can you elaborate on why you don't think it's implementing NRT? I've tested basic indexing/searching using wikipedia documents at about 50-100 documents a second, opening a new reader every second. That felt pretty near-real-time to me, but the phrase is subjective. There are remaining issues - Solr does not currently do everything at a per segment level as you know - but per segment use cases will be very fast, and non per segment use cases will be much more NRT too. You are questioning my whole patch Which I am happy about - but I'm wondering if you have actually reviewed it - if not, I wish you would!
          Hide
          Mark Miller added a comment -

          Also - if it helps clarify:

          This issue simply starts taking advantage of Lucene's NRT work. Much of that is being improved as some of the cool ideas from the Twitter work are being made more generally applicable and integrated into trunk Lucene. This issue does not address NRT at that low level - for that see the Lucene JIRA issues on this topic.

          Show
          Mark Miller added a comment - Also - if it helps clarify: This issue simply starts taking advantage of Lucene's NRT work. Much of that is being improved as some of the cool ideas from the Twitter work are being made more generally applicable and integrated into trunk Lucene. This issue does not address NRT at that low level - for that see the Lucene JIRA issues on this topic.
          Hide
          Jason Rutherglen added a comment -

          This patch certainly won't complete the NRT work needed

          Mark, I was reading this comment.

          You are questioning my whole patch

          I think it'll be easier to add what's needed for this patch into Lucene rather than retrofit Solr. I mentioned this a while back however there was pushback on re-architecting Solr. Making everything per-segment would be much more productive than allowing NRT at this stage. Ah, I think you're simply trying to avoid the stop the world Solr has right now? If so that should be more prevalent in the Jira.

          IndexWriter writer = ((DirectUpdateHandler2)core.getUpdateHandler()).getIndexWriterProvider().getIndexWriter();

          Ugly Solr style code?!

          The commit in X time can be simple contrib class for Lucene. It doesn't need to be Solr specific.

          Anyways I tried to do this 2 years ago for NRT, there was pushback just get the IndexWriter like the above code from the update handler. <political>Wow</political>

          Show
          Jason Rutherglen added a comment - This patch certainly won't complete the NRT work needed Mark, I was reading this comment. You are questioning my whole patch I think it'll be easier to add what's needed for this patch into Lucene rather than retrofit Solr. I mentioned this a while back however there was pushback on re-architecting Solr. Making everything per-segment would be much more productive than allowing NRT at this stage. Ah, I think you're simply trying to avoid the stop the world Solr has right now? If so that should be more prevalent in the Jira. IndexWriter writer = ((DirectUpdateHandler2)core.getUpdateHandler()).getIndexWriterProvider().getIndexWriter(); Ugly Solr style code?! The commit in X time can be simple contrib class for Lucene. It doesn't need to be Solr specific. Anyways I tried to do this 2 years ago for NRT, there was pushback just get the IndexWriter like the above code from the update handler. <political>Wow</political>
          Hide
          Mark Miller added a comment -

          I think it'll be easier to add what's needed for this patch into Lucene rather than retrofit Solr.

          This makes no sense See my above comment.

          Ugly Solr style code?!

          This is simply to detect the case where the update handler was changed on reload - not something we even necessarily have to support - but a case I did consider. If you can suggest better code to handle such a case, please do.

          The commit in X time can be simple contrib class for Lucene. It doesn't need to be Solr specific.

          Moving the location of the auto commit code would be a very different issue. Not one that makes sense to me right now either.

          Anyways I tried to do this 2 years ago for NRT, there was pushback just get the IndexWriter like the above code from the update handler. <political>Wow</political>

          I remember that issue - I tried to make some comments to help you out with it - you never managed to get it very far if memory serves though - so now I'm taking my own crack at it, along with addressing a few other pet peeves I have.

          Show
          Mark Miller added a comment - I think it'll be easier to add what's needed for this patch into Lucene rather than retrofit Solr. This makes no sense See my above comment. Ugly Solr style code?! This is simply to detect the case where the update handler was changed on reload - not something we even necessarily have to support - but a case I did consider. If you can suggest better code to handle such a case, please do. The commit in X time can be simple contrib class for Lucene. It doesn't need to be Solr specific. Moving the location of the auto commit code would be a very different issue. Not one that makes sense to me right now either. Anyways I tried to do this 2 years ago for NRT, there was pushback just get the IndexWriter like the above code from the update handler. <political>Wow</political> I remember that issue - I tried to make some comments to help you out with it - you never managed to get it very far if memory serves though - so now I'm taking my own crack at it, along with addressing a few other pet peeves I have.
          Hide
          Mark Miller added a comment - - edited

          P.S. While it may not be believed, there are no politics behind this issue I did not intentionally side step any other issues - nor did Lucid even initiate that I work on this.

          I wanted to fix the update handler - I didn't want to let any history or pre conceived notions get in my way. Was I successful at that? I don't know. But I approached this as blank slate as I could - if I could change the update handler, what would I do? I didn't look at what anyone else had done. So far, this is it.

          The community can take my effort or reject it, just like any others out there. The motivation of this issue, as far as I am concerned, is my own self interest in doing some interesting coding, and my good will towards Lucene/Solr. Any other motivations where so minor they don't even register.

          Take it or leave it - I'm not looking to cram this down anyones throat.

          Show
          Mark Miller added a comment - - edited P.S. While it may not be believed, there are no politics behind this issue I did not intentionally side step any other issues - nor did Lucid even initiate that I work on this. I wanted to fix the update handler - I didn't want to let any history or pre conceived notions get in my way. Was I successful at that? I don't know. But I approached this as blank slate as I could - if I could change the update handler, what would I do? I didn't look at what anyone else had done. So far, this is it. The community can take my effort or reject it, just like any others out there. The motivation of this issue, as far as I am concerned, is my own self interest in doing some interesting coding, and my good will towards Lucene/Solr. Any other motivations where so minor they don't even register. Take it or leave it - I'm not looking to cram this down anyones throat.
          Hide
          Mark Miller added a comment -

          Here is a to trunk patch - I'll try and get to something that should be fairly committable before to long as I said - to a large degree that just means a thorough review myself, as well as working with any additional feedback.

          Show
          Mark Miller added a comment - Here is a to trunk patch - I'll try and get to something that should be fairly committable before to long as I said - to a large degree that just means a thorough review myself, as well as working with any additional feedback.
          Hide
          Jason Rutherglen added a comment -

          IndexWriter writer =((DirectUpdateHandler2)core.getUpdateHandler()).getIndexWriterProvider().getIndexWriter();

          Why isn't IW a part of SolrCore? It's the main class running the show. How
          can there be a Solr core without an IW? I think IW never gets closed until
          the SolrCore is closed. The next move would be to place all of the caches
          at the segment level.

          It's been clear for quite a while that you folks at "Lucid" are trying to
          protect your golden goose, eg, Solr from changing much unless dictated by
          your staff or a paying customer. I think in politics those are called
          bribes? Hence a large part of the recent fracas regarding modularizing the
          goose, whose 'resolution' has resulted in no changes.

          It's astonishing the changes that are OK for Solr by some people, that are
          no OK from others. This is not a meritocracy. If you insist on driving,
          you should incorporate some of the feedback given. Solr was hacked
          together from the beginning and this is yet another ugly retrofit that is
          being steamrolled in. If you're confident in your abilities you're
          confident enough to make major changes. I've never seen that on the Solr
          side of the Lucene project.

          I remember that issue - I tried to make some comments to help you out with it

          No there was push back on something silly and simple, eg, getting the IW
          from the UpdateHandler, just as you have done here. What is the point in
          contributing when they are blocked for no reason?

          SOLR-1155

          What happened to this poor guys patch? Nothing.

          Show
          Jason Rutherglen added a comment - IndexWriter writer =((DirectUpdateHandler2)core.getUpdateHandler()).getIndexWriterProvider().getIndexWriter(); Why isn't IW a part of SolrCore? It's the main class running the show. How can there be a Solr core without an IW? I think IW never gets closed until the SolrCore is closed. The next move would be to place all of the caches at the segment level. It's been clear for quite a while that you folks at "Lucid" are trying to protect your golden goose, eg, Solr from changing much unless dictated by your staff or a paying customer. I think in politics those are called bribes? Hence a large part of the recent fracas regarding modularizing the goose, whose 'resolution' has resulted in no changes. It's astonishing the changes that are OK for Solr by some people, that are no OK from others. This is not a meritocracy. If you insist on driving, you should incorporate some of the feedback given. Solr was hacked together from the beginning and this is yet another ugly retrofit that is being steamrolled in. If you're confident in your abilities you're confident enough to make major changes. I've never seen that on the Solr side of the Lucene project. I remember that issue - I tried to make some comments to help you out with it No there was push back on something silly and simple, eg, getting the IW from the UpdateHandler, just as you have done here. What is the point in contributing when they are blocked for no reason? SOLR-1155 What happened to this poor guys patch? Nothing.
          Hide
          Mark Miller added a comment -

          Heh. Golden Goose. There are geese on the farm I grew up next too. They never needed much protected - they ran in a pack and charged around fearless.

          If I had a Golden Goose (and I'm still looking if you know anyone), I'd choose to protect it with like a knife or a gun rather than code

          This is a meritocracy. Those with merit impart it on the deserving. There are a wide variety of those with merit from a diverse set of companies. So others that deserve merit can be sure to get a fair shake over time. Those without merit...well...they will be what they be.

          Show
          Mark Miller added a comment - Heh. Golden Goose. There are geese on the farm I grew up next too. They never needed much protected - they ran in a pack and charged around fearless. If I had a Golden Goose (and I'm still looking if you know anyone), I'd choose to protect it with like a knife or a gun rather than code This is a meritocracy. Those with merit impart it on the deserving. There are a wide variety of those with merit from a diverse set of companies. So others that deserve merit can be sure to get a fair shake over time. Those without merit...well...they will be what they be.
          Hide
          Jason Rutherglen added a comment -

          Mark,

          That's an odd non-technical answer, and in the meritocracy of comedy, not funny either.

          Show
          Jason Rutherglen added a comment - Mark, That's an odd non-technical answer, and in the meritocracy of comedy, not funny either.
          Hide
          Mark Miller added a comment -

          Plus this is a holiday Do I really have to take unfounded lucid crap even in my free time?

          Well, I'm game. I've pulled out my bottle of wine, and I'm ready to answer your questions:

          Why isn't IW a part of SolrCore?...

          I leave this as an exercise for the reader.

          I think in politics those are called bribes?

          I want it always to be known - that I will in fact take bribes. In fact, I'd like to take my first one any time anyone is willing to pay me. As this is a legally gray area, you will have to understand that I cannot place any guarantees on my work. Also, I'm currently only accepting bitcoin.

          If you insist on driving, you should incorporate some of the feedback given.

          I don't need to drive. I love it when others drives. Some people are such good drivers! Trust me - I was never one to jump to drive. I can be fairly absent minded. My friends often preferred to drive. But by golly, there where some people even I wouldn't drive with...sometimes cause I was never needing to go anywhere when they where driving, and other times cause they where scary drivers!

          If you're confident in your abilities you're confident enough to make major changes. I've never seen that on the Solr side of the Lucene project.

          I'm sorry! I will try harder! My abilities are not insane. But I think they are good. I've never been great with self rating, but I think I'm a solid above average. I'm even better at 4v4 Warcraft3, all archer strategy. If not for Korea, that is how I'd make my money. It would be way easier to accumulate bitcoin that way.

          Solr was hacked together from the beginning and this is yet another ugly retrofit that is being steamrolled in.

          Steamrolled? You hurt me...I only just above I don't won't to steam roll this issue that I am working in my free time of my own volition? My you are a fun guy

          No there was push back on something silly and simple, eg, getting the IW
          from the UpdateHandler, just as you have done here. What is the point in
          contributing when they are blocked for no reason?

          Jason, that is in test code My approach is plenty different from what you are up to

          What happened to this poor guys patch? Nothing.

          I had never seen this guys patch. It's not my job to jump on anyones patch if I did. Since I have started my rearchitect issue, I have seen a couple issues out there that do a small piece of what I am up to. I have looked at a couple, and they did not work how I wanted. You can try it a different way. Honestly, if you write some code that I like, I will commit it for you. But I will only commit what I'm comfortable with. Lucky for you there are a bunch of us. Many not from Lucid. All you gotto do is convince one of them to take on your issue or tackle your code.

          Personally, I make my contributions lately out of my own time and will. If it bothers you so much, I'll be happy to hold off - those that voted for this issue can wait for someone else to fix this at the "lucene level like twitter".

          Show
          Mark Miller added a comment - Plus this is a holiday Do I really have to take unfounded lucid crap even in my free time? Well, I'm game. I've pulled out my bottle of wine, and I'm ready to answer your questions: Why isn't IW a part of SolrCore?... I leave this as an exercise for the reader. I think in politics those are called bribes? I want it always to be known - that I will in fact take bribes. In fact, I'd like to take my first one any time anyone is willing to pay me. As this is a legally gray area, you will have to understand that I cannot place any guarantees on my work. Also, I'm currently only accepting bitcoin. If you insist on driving, you should incorporate some of the feedback given. I don't need to drive. I love it when others drives. Some people are such good drivers! Trust me - I was never one to jump to drive. I can be fairly absent minded. My friends often preferred to drive. But by golly, there where some people even I wouldn't drive with...sometimes cause I was never needing to go anywhere when they where driving, and other times cause they where scary drivers! If you're confident in your abilities you're confident enough to make major changes. I've never seen that on the Solr side of the Lucene project. I'm sorry! I will try harder! My abilities are not insane. But I think they are good. I've never been great with self rating, but I think I'm a solid above average. I'm even better at 4v4 Warcraft3, all archer strategy. If not for Korea, that is how I'd make my money. It would be way easier to accumulate bitcoin that way. Solr was hacked together from the beginning and this is yet another ugly retrofit that is being steamrolled in. Steamrolled? You hurt me...I only just above I don't won't to steam roll this issue that I am working in my free time of my own volition? My you are a fun guy No there was push back on something silly and simple, eg, getting the IW from the UpdateHandler, just as you have done here. What is the point in contributing when they are blocked for no reason? Jason, that is in test code My approach is plenty different from what you are up to What happened to this poor guys patch? Nothing. I had never seen this guys patch. It's not my job to jump on anyones patch if I did. Since I have started my rearchitect issue, I have seen a couple issues out there that do a small piece of what I am up to. I have looked at a couple, and they did not work how I wanted. You can try it a different way. Honestly, if you write some code that I like, I will commit it for you. But I will only commit what I'm comfortable with. Lucky for you there are a bunch of us. Many not from Lucid. All you gotto do is convince one of them to take on your issue or tackle your code. Personally, I make my contributions lately out of my own time and will. If it bothers you so much, I'll be happy to hold off - those that voted for this issue can wait for someone else to fix this at the "lucene level like twitter".
          Hide
          Jason Rutherglen added a comment -

          Mark I think you're missing the point. If you're "committer" then it's implied you review patches and interact with the community, nicely. That's not happening with in this issue, or in Solr as noted by in fact many people.

          Show
          Jason Rutherglen added a comment - Mark I think you're missing the point. If you're "committer" then it's implied you review patches and interact with the community, nicely. That's not happening with in this issue, or in Solr as noted by in fact many people.
          Hide
          Mark Miller added a comment -

          Hey man - I can only do what I can do. Let god sort it out. I'm ready to be judged. I didn't do it all perfect. I didn't make all the right choices. But overall I'd say I was way more good than bad. I may have been a little sociopathic during that one short stint in my teens, but other than that, its worked out well I think.

          Show
          Mark Miller added a comment - Hey man - I can only do what I can do. Let god sort it out. I'm ready to be judged. I didn't do it all perfect. I didn't make all the right choices. But overall I'd say I was way more good than bad. I may have been a little sociopathic during that one short stint in my teens, but other than that, its worked out well I think.
          Hide
          Jason Rutherglen added a comment -

          -1 on the patch, I just reviewed again. IndexWriter should be a part of SolrCore (IW is canonical), as we should not be opening and closing IWs in the life of a Solr core.

          Show
          Jason Rutherglen added a comment - -1 on the patch, I just reviewed again. IndexWriter should be a part of SolrCore (IW is canonical), as we should not be opening and closing IWs in the life of a Solr core.
          Hide
          Chris Male added a comment -

          Guys, this has gotten beyond ridiculous. Lets stick to the goal of helping Mark complete this work. If there's concerns about issues being ignored or whatever, lets address that elsewhere without the accusations and sarcasm.

          Why isn't IW a part of SolrCore?...

          I leave this as an exercise for the reader.

          I don't know if this is immediately obvious from the code (and consequently doesn't need explaining), but it seems on the surface to be a reasonable question to ask. Are you able to say why this isn't a preferable direction to take? The answer might prove useful to others too.

          Show
          Chris Male added a comment - Guys, this has gotten beyond ridiculous. Lets stick to the goal of helping Mark complete this work. If there's concerns about issues being ignored or whatever, lets address that elsewhere without the accusations and sarcasm. Why isn't IW a part of SolrCore?... I leave this as an exercise for the reader. I don't know if this is immediately obvious from the code (and consequently doesn't need explaining), but it seems on the surface to be a reasonable question to ask. Are you able to say why this isn't a preferable direction to take? The answer might prove useful to others too.
          Hide
          Mark Miller added a comment -

          -1 on the patch, I just reviewed again. IndexWriter should be a part of SolrCore (IW is canonical), as we should not be opening and closing IWs in the life of a Solr core.

          Okay, -1 accepted. You win, good fight.

          Show
          Mark Miller added a comment - -1 on the patch, I just reviewed again. IndexWriter should be a part of SolrCore (IW is canonical), as we should not be opening and closing IWs in the life of a Solr core. Okay, -1 accepted. You win, good fight.
          Hide
          Jason Rutherglen added a comment -

          Okay, -1 accepted. You win, good fight

          Mark this was no fight, this is the open source Apache way.

          Show
          Jason Rutherglen added a comment - Okay, -1 accepted. You win, good fight Mark this was no fight, this is the open source Apache way.
          Hide
          Mark Miller added a comment -

          I meant in the figurative sparring sense, not negative fight sense. Honestly Jason, you made my night I enjoyed our dialogue honestly.

          Show
          Mark Miller added a comment - I meant in the figurative sparring sense, not negative fight sense. Honestly Jason, you made my night I enjoyed our dialogue honestly.
          Hide
          Jason Rutherglen added a comment -

          I enjoyed our dialogue honestly

          I'd prefer to simply get things done rather than banter with no results.

          Show
          Jason Rutherglen added a comment - I enjoyed our dialogue honestly I'd prefer to simply get things done rather than banter with no results.
          Hide
          Mark Miller added a comment -

          Heh - you approached me the wrong way then.

          Show
          Mark Miller added a comment - Heh - you approached me the wrong way then.
          Hide
          Robert Muir added a comment -

          clear javadocs warnings from the patch.

          Show
          Robert Muir added a comment - clear javadocs warnings from the patch.
          Hide
          Simon Willnauer added a comment -

          Jason, your action on this issue is not appropriate and with my PMC Hat on I can't tolerate this! I want remind you to move back to an appropriate tone and help out rather than randomly -1. I don't see any political issues here neither is the -1 appropriate. You have done lots of realtime work and mark seems to be pretty open to comments too so get back to technical discussion. If you have any general issues feel free to raise them on dev@l.a.o

          simon

          Show
          Simon Willnauer added a comment - Jason, your action on this issue is not appropriate and with my PMC Hat on I can't tolerate this! I want remind you to move back to an appropriate tone and help out rather than randomly -1. I don't see any political issues here neither is the -1 appropriate. You have done lots of realtime work and mark seems to be pretty open to comments too so get back to technical discussion. If you have any general issues feel free to raise them on dev@l.a.o simon
          Hide
          Ryan McKinley added a comment -

          Reading over the comments here and without knowing the details, it looks like a big improvement over what we currently have.

          Jason – is your concern that this is a fundamentally wrong direction, or that it does not go far enough to support NRT? Can this patch be improved? Can it be committed, then improved?

          Show
          Ryan McKinley added a comment - Reading over the comments here and without knowing the details, it looks like a big improvement over what we currently have. Jason – is your concern that this is a fundamentally wrong direction, or that it does not go far enough to support NRT? Can this patch be improved? Can it be committed, then improved?
          Hide
          Jason Rutherglen added a comment -

          As previously suggested, we need a new issue that refactors IndexWriter into SolrCore, instead of placing it into an UpdateHandler. Then we can iterate on re/factoring the NRT functionality.

          Show
          Jason Rutherglen added a comment - As previously suggested, we need a new issue that refactors IndexWriter into SolrCore, instead of placing it into an UpdateHandler. Then we can iterate on re/factoring the NRT functionality.
          Hide
          Jason Rutherglen added a comment -

          this is a fundamentally wrong direction

          Yes. The idea of adding NRT is good though.

          Show
          Jason Rutherglen added a comment - this is a fundamentally wrong direction Yes. The idea of adding NRT is good though.
          Hide
          Michael McCandless added a comment -

          I haven't looked that closely a this patch yet, but it already fixes a long standing problem in Solr, that a long running merge blocks a Solr commit, because it switches to IW.commit instead of closing/opening the writer.

          This seems like a great step forward?

          Sure, we should pursue moving IW to SolrCore, cutting over to IW's NRT's APIs, making all caching per-segment, etc., but these should not block committing this improvement; they can be pursued as followon/parallel issues: progress not perfection.

          Show
          Michael McCandless added a comment - I haven't looked that closely a this patch yet, but it already fixes a long standing problem in Solr, that a long running merge blocks a Solr commit, because it switches to IW.commit instead of closing/opening the writer. This seems like a great step forward? Sure, we should pursue moving IW to SolrCore, cutting over to IW's NRT's APIs, making all caching per-segment, etc., but these should not block committing this improvement; they can be pursued as followon/parallel issues: progress not perfection.
          Hide
          Jason Rutherglen added a comment -

          I haven't looked that closely a this patch yet, but it already fixes a long standing problem in Solr, that a long running merge blocks a Solr commit, because it switches to IW.commit instead of closing/opening the writer.

          Yes, that is/was not clear in the issue. Thank you for spelling it out. However I think the patch is creating new abstract classes, that would then go away? Why not spend a little more time trying to do a more overall design for future refactoring?

          Show
          Jason Rutherglen added a comment - I haven't looked that closely a this patch yet, but it already fixes a long standing problem in Solr, that a long running merge blocks a Solr commit, because it switches to IW.commit instead of closing/opening the writer. Yes, that is/was not clear in the issue. Thank you for spelling it out. However I think the patch is creating new abstract classes, that would then go away? Why not spend a little more time trying to do a more overall design for future refactoring?
          Hide
          Robert Muir added a comment -

          However I think the patch is creating new abstract classes, that would then go away?

          Just one interface right, IndexWriterProvider?

          Mark seems to describe the reasoning behind this pretty well:

          7. Address the current issues we face because multiple original/'reloaded' cores can have a different IndexWriter on the same index.

          I've made an attempt at this with the IndexWriterProvider and some simple ref counting that could use a once over. This changes how you should reload a Solrcore - rather than just opening a new one and closing the old one, you want to call a method on the SolrCore itself and get the new SolrCore.

          and in the original goals

          solving our rather nasty reload a core, briefly different writers on the same index problem (usually avoided because the overlap is brief and the IndexWriter created lazily).

          I'm at best a Solr dummy, but it seems to me to be a good tradeoff. We could toss @lucene.internal or @lucene.experimental on this so its clear that things are still in flux?

          Show
          Robert Muir added a comment - However I think the patch is creating new abstract classes, that would then go away? Just one interface right, IndexWriterProvider? Mark seems to describe the reasoning behind this pretty well: 7. Address the current issues we face because multiple original/'reloaded' cores can have a different IndexWriter on the same index. I've made an attempt at this with the IndexWriterProvider and some simple ref counting that could use a once over. This changes how you should reload a Solrcore - rather than just opening a new one and closing the old one, you want to call a method on the SolrCore itself and get the new SolrCore. and in the original goals solving our rather nasty reload a core, briefly different writers on the same index problem (usually avoided because the overlap is brief and the IndexWriter created lazily). I'm at best a Solr dummy, but it seems to me to be a good tradeoff. We could toss @lucene.internal or @lucene.experimental on this so its clear that things are still in flux?
          Hide
          Jason Rutherglen added a comment -

          solving our rather nasty reload a core, briefly different writers on the same index problem (usually avoided because the overlap is brief and the IndexWriter created lazily).

          Robert I fully agree, however then the title of the Jira is incorrect.

          Also the whole ref counted thing in Solr:

          RefCounted<SolrIndexSearcher> holder = core.getNewestSearcher(false);
          SolrIndexSearcher s = holder.get();
          holder.decref();
          // since there could be two commits in a row, don't test for a specific new searcher
          // just test that the old one has been replaced.
          

          Should not be needed anymore. We're also adding ref counting on IWs now as well? All of this is unnecessary. If we're modularizing, this isn't right path to go one.

          Show
          Jason Rutherglen added a comment - solving our rather nasty reload a core, briefly different writers on the same index problem (usually avoided because the overlap is brief and the IndexWriter created lazily). Robert I fully agree, however then the title of the Jira is incorrect. Also the whole ref counted thing in Solr: RefCounted<SolrIndexSearcher> holder = core.getNewestSearcher( false ); SolrIndexSearcher s = holder.get(); holder.decref(); // since there could be two commits in a row, don't test for a specific new searcher // just test that the old one has been replaced. Should not be needed anymore. We're also adding ref counting on IWs now as well? All of this is unnecessary. If we're modularizing, this isn't right path to go one.
          Hide
          Ryan McKinley added a comment -

          we need a new issue that refactors IndexWriter into SolrCore, instead of placing it into an UpdateHandler

          Can't that be done as an iteration/improvement? Nothing here blocks that possible change – and this patch sorts out a bunch of stuff that needs to be taken care of anyway.

          Recent dev on /trunk has been fast and iterative (very exciting!) – this patch seems like a good way to get the ball rolling and move forward.

          Lets mark anything as @lucene.internal that has viable alternatives and start smaller patches to move that along. I apologize that this stuff is out of my league, but the smaller the issue the easier it is to understand and actually get committed.

          Show
          Ryan McKinley added a comment - we need a new issue that refactors IndexWriter into SolrCore, instead of placing it into an UpdateHandler Can't that be done as an iteration/improvement? Nothing here blocks that possible change – and this patch sorts out a bunch of stuff that needs to be taken care of anyway. Recent dev on /trunk has been fast and iterative (very exciting!) – this patch seems like a good way to get the ball rolling and move forward. Lets mark anything as @lucene.internal that has viable alternatives and start smaller patches to move that along. I apologize that this stuff is out of my league, but the smaller the issue the easier it is to understand and actually get committed.
          Hide
          Robert Muir added a comment -

          Robert I fully agree, however then the title of the Jira is incorrect.

          How is it incorrect? The description of the issue is to overhaul the UpdateHandler, not to rearchitect Solr for NRT, move IW into SolrCore, or other things.

          We're also adding ref counting on IWs now as well? All of this is unnecessary. If we're modularizing, this isn't right path to go one.

          How do you propose supporting core reloads without it? It seems to me (again solr dummy) that some mechanism like this is needed.

          I guess personally, just looking at the big picture, this issue seems like a great win for Solr.

          I tested the patch and all tests pass... if there are technical concerns with test coverage I'm happy to try to help there to help see this thing committed.

          Show
          Robert Muir added a comment - Robert I fully agree, however then the title of the Jira is incorrect. How is it incorrect? The description of the issue is to overhaul the UpdateHandler, not to rearchitect Solr for NRT, move IW into SolrCore, or other things. We're also adding ref counting on IWs now as well? All of this is unnecessary. If we're modularizing, this isn't right path to go one. How do you propose supporting core reloads without it? It seems to me (again solr dummy) that some mechanism like this is needed. I guess personally, just looking at the big picture, this issue seems like a great win for Solr. I tested the patch and all tests pass... if there are technical concerns with test coverage I'm happy to try to help there to help see this thing committed.
          Hide
          Jason Rutherglen added a comment -

          I guess personally, just looking at the big picture, this issue seems like a great win for Solr.

          Yes the concept of not stopping the world is great. The concept of Solr continuing to be difficult to customize is not so great.

          -1 on the implementation which introduces even more awkward layers into Solr, which should be going in the direction of removing the old cruft.

          Show
          Jason Rutherglen added a comment - I guess personally, just looking at the big picture, this issue seems like a great win for Solr. Yes the concept of not stopping the world is great. The concept of Solr continuing to be difficult to customize is not so great. -1 on the implementation which introduces even more awkward layers into Solr, which should be going in the direction of removing the old cruft.
          Hide
          Shai Erera added a comment -

          The concept of Solr continuing to be difficult to customize is not so great.

          I haven't looked at the patch at all, as I'm not familiar w/ Solr code and cannot comment on the technical issues here. But I do would like to understand if this patch indeed gets us further from being able to customize Solr in the future or not.

          By customizing I mean (and I realize not all of it is handled in this issue):

          1. Control the IW instance through not only properties, but IW extensions (for e.g. overriding addDocument / commit so that other custom data structures can be updated at the same time as well).
          2. Control the replication process, for e.g., replicating additional files, not only IW files, that are required by the runtime logic.
          3. Control the IndexSearcher instance, or the query execution engine, in order to run queries at the low-level. i.e., being able to call IS.search(Q, Collector), and not from some high-level API where I can only submit a String and get back Result.

          These are important extension points, IMO, to Solr that we can offer applications. I admit, really advanced applications that know what they do and want, that just don't want to rewrite Solr entirely in order to execute queries differently.

          NOTE: I am not trying, nor do I want, to discuss here whether it should be the roadmap of Solr (i.e., provide all sorts of extensions that enable the above), but merely asking if that patch indeed gets us further from accomplishing that. I will not cast any +/-1 vote if it does, just (being a Solr dummy, not even "at best") trying to understand if that's really what's being proposed here.

          Show
          Shai Erera added a comment - The concept of Solr continuing to be difficult to customize is not so great. I haven't looked at the patch at all, as I'm not familiar w/ Solr code and cannot comment on the technical issues here. But I do would like to understand if this patch indeed gets us further from being able to customize Solr in the future or not. By customizing I mean (and I realize not all of it is handled in this issue): Control the IW instance through not only properties, but IW extensions (for e.g. overriding addDocument / commit so that other custom data structures can be updated at the same time as well). Control the replication process, for e.g., replicating additional files, not only IW files, that are required by the runtime logic. Control the IndexSearcher instance, or the query execution engine, in order to run queries at the low-level. i.e., being able to call IS.search(Q, Collector), and not from some high-level API where I can only submit a String and get back Result. These are important extension points, IMO, to Solr that we can offer applications. I admit, really advanced applications that know what they do and want, that just don't want to rewrite Solr entirely in order to execute queries differently. NOTE: I am not trying, nor do I want, to discuss here whether it should be the roadmap of Solr (i.e., provide all sorts of extensions that enable the above), but merely asking if that patch indeed gets us further from accomplishing that. I will not cast any +/-1 vote if it does, just (being a Solr dummy, not even "at best") trying to understand if that's really what's being proposed here.
          Hide
          Robert Muir added a comment -

          -1 on the implementation which introduces even more awkward layers into Solr, which should be going in the direction of removing the old cruft.

          Sorry Jason, you cannot simply say -1 without answering my questions about how these problems can be solved in an alternative way.

          I hate to be an asshole, but your vote is non-binding. I'm going to take this issue to the finish line now.

          Show
          Robert Muir added a comment - -1 on the implementation which introduces even more awkward layers into Solr, which should be going in the direction of removing the old cruft. Sorry Jason, you cannot simply say -1 without answering my questions about how these problems can be solved in an alternative way. I hate to be an asshole, but your vote is non-binding. I'm going to take this issue to the finish line now.
          Hide
          Jason Rutherglen added a comment -

          I'm curious if someone who doesn't work at Lucid can be involved in Solr design discussions. In any case, please autocratically continue.

          Show
          Jason Rutherglen added a comment - I'm curious if someone who doesn't work at Lucid can be involved in Solr design discussions. In any case, please autocratically continue.
          Hide
          Robert Muir added a comment -

          I committed this patch (with some additional ref counting checks in SolrTestCaseJ4) to https://svn.apache.org/repos/asf/lucene/dev/branches/solr2193

          Jason (or anyone else!): if you want to be involved, please contribute constructively in the form of useful ideas and patches against this branch.

          In the meantime, I will be trying to improve the tests for what exists now, as its the best way (not manual reviews) to increase my confidence.

          Show
          Robert Muir added a comment - I committed this patch (with some additional ref counting checks in SolrTestCaseJ4) to https://svn.apache.org/repos/asf/lucene/dev/branches/solr2193 Jason (or anyone else!): if you want to be involved, please contribute constructively in the form of useful ideas and patches against this branch. In the meantime, I will be trying to improve the tests for what exists now, as its the best way (not manual reviews) to increase my confidence.
          Hide
          Jason Rutherglen added a comment -

          This article is an indicator of the types of benchmarks to perform: http://engineering.socialcast.com/2011/05/realtime-search-solr-vs-elasticsearch/

          Show
          Jason Rutherglen added a comment - This article is an indicator of the types of benchmarks to perform: http://engineering.socialcast.com/2011/05/realtime-search-solr-vs-elasticsearch/
          Hide
          Robert Muir added a comment -

          Jason, this issue isn't intended to solve NRT.
          You might want to create a followup issue for improved NRT support.

          Show
          Robert Muir added a comment - Jason, this issue isn't intended to solve NRT. You might want to create a followup issue for improved NRT support.
          Hide
          Jason Rutherglen added a comment -

          Jason, this issue isn't intended to solve NRT

          What is this line doing?

          newReader = currentReader.reopen(indexWriterProvider.getIndexWriter(), true);
          
          Show
          Jason Rutherglen added a comment - Jason, this issue isn't intended to solve NRT What is this line doing? newReader = currentReader.reopen(indexWriterProvider.getIndexWriter(), true );
          Hide
          Jason Rutherglen added a comment -

          Also:

          https://issues.apache.org/jira/browse/SOLR-2193?focusedCommentId=13016875&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13016875

          And this comment:

          Can you elaborate on why you don't think it's implementing NRT? I've tested basic indexing/searching using wikipedia documents at about 50-100 documents a second, opening a new reader every second. That felt pretty near-real-time to me, but the phrase is subjective.

          from: https://issues.apache.org/jira/browse/SOLR-2193?focusedCommentId=13041268&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13041268

          Robert, your statement's confusing.

          Show
          Jason Rutherglen added a comment - Also: https://issues.apache.org/jira/browse/SOLR-2193?focusedCommentId=13016875&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13016875 And this comment: Can you elaborate on why you don't think it's implementing NRT? I've tested basic indexing/searching using wikipedia documents at about 50-100 documents a second, opening a new reader every second. That felt pretty near-real-time to me, but the phrase is subjective. from: https://issues.apache.org/jira/browse/SOLR-2193?focusedCommentId=13041268&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13041268 Robert, your statement's confusing.
          Hide
          Robert Muir added a comment -

          Its not confusing at all... its a step towards NRT, thats all.

          the main improvement to me, is to not close the indexwriter
          this patch doesn't need to fully make solr realtime.

          you can keep whining that it isn't, but this doesn't accomplish anything.

          Show
          Robert Muir added a comment - Its not confusing at all... its a step towards NRT, thats all. the main improvement to me, is to not close the indexwriter this patch doesn't need to fully make solr realtime. you can keep whining that it isn't, but this doesn't accomplish anything.
          Hide
          Simon Willnauer added a comment -

          I intend to close this issue and go on in SOLR-2565 / SOLR-2566 I hope these issues provide a more clear focus on what the issues trying to address. If nobody objects I am going to close this today.

          simon

          Show
          Simon Willnauer added a comment - I intend to close this issue and go on in SOLR-2565 / SOLR-2566 I hope these issues provide a more clear focus on what the issues trying to address. If nobody objects I am going to close this today. simon
          Hide
          Robert Muir added a comment -

          +1

          Show
          Robert Muir added a comment - +1
          Hide
          Ryan McKinley added a comment -

          +1

          this is a great step forward. better NRT support can come in later patches.

          Show
          Ryan McKinley added a comment - +1 this is a great step forward. better NRT support can come in later patches.
          Hide
          Simon Willnauer added a comment -

          further developments in SOLR-2565 and SOLR-2566

          Show
          Simon Willnauer added a comment - further developments in SOLR-2565 and SOLR-2566
          Hide
          Jason Rutherglen added a comment -

          Simon, thanks for opening new issues.

          Show
          Jason Rutherglen added a comment - Simon, thanks for opening new issues.
          Hide
          David Smiley added a comment -

          Curious; why is the resolution status "invalid"? If work gets committed to a feature-branch instead of trunk, does that mean the issue gets resolved this way? Perhaps it shouldn't be closed until it lands on trunk.

          Show
          David Smiley added a comment - Curious; why is the resolution status "invalid"? If work gets committed to a feature-branch instead of trunk, does that mean the issue gets resolved this way? Perhaps it shouldn't be closed until it lands on trunk.
          Hide
          Simon Willnauer added a comment -

          Curious; why is the resolution status "invalid"?

          well we decided to cut this into two new issues and close this one. see:

          further developments in SOLR-2565 and SOLR-2566

          there have been discussions about the focus here so we made it more clear.

          Show
          Simon Willnauer added a comment - Curious; why is the resolution status "invalid"? well we decided to cut this into two new issues and close this one. see: further developments in SOLR-2565 and SOLR-2566 there have been discussions about the focus here so we made it more clear.
          Hide
          Mark Miller added a comment -

          This issue is superceded by: SOLR-2565 Prevent IW#close and cut over to IW#commit

          Show
          Mark Miller added a comment - This issue is superceded by: SOLR-2565 Prevent IW#close and cut over to IW#commit
          Hide
          Mark Miller added a comment -

          Curious; why is the resolution status "invalid"?

          Dunno - it's not invalid. I've re-resolved as "duplicate"

          Show
          Mark Miller added a comment - Curious; why is the resolution status "invalid"? Dunno - it's not invalid. I've re-resolved as "duplicate"

            People

            • Assignee:
              Mark Miller
              Reporter:
              Mark Miller
            • Votes:
              7 Vote for this issue
              Watchers:
              18 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development