Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0-ALPHA
    • Fix Version/s: 4.0-ALPHA
    • Component/s: search
    • Labels:
      None

      Description

      We'll abstract CommComponent in this issue.

      1. SOLR-1431.patch (27 kB, Jason Rutherglen)
      2. SOLR-1431.patch (31 kB, Jason Rutherglen)
      3. SOLR-1431.patch (31 kB, Jason Rutherglen)
      4. SOLR-1431.patch (32 kB, Jason Rutherglen)
      5. SOLR-1431.patch (20 kB, Noble Paul)
      6. SOLR-1431.patch (21 kB, Noble Paul)
      7. SOLR-1431.patch (24 kB, Noble Paul)
      8. SOLR-1431.patch (23 kB, Jason Rutherglen)
      9. SOLR-1431.patch (27 kB, Jason Rutherglen)
      10. SOLR-1431.patch (28 kB, Noble Paul)
      11. SOLR-1431.patch (39 kB, Noble Paul)
      12. SOLR-1431.patch (37 kB, Noble Paul)

          Activity

          Greg Bowyer added a comment -

          @Noble Paul

          I cooked up something resembling a backport here: SOLR-3079

          Noble Paul added a comment -

          Yes, we can. Can you post a patch, @jason?

          Jason Rutherglen added a comment -

          Can we look at backporting this one to 3.x, given 4.x is a little ways off?

          Noble Paul added a comment -

          I have committed it to trunk. We may need more iterations to clean it up.

          Noble Paul added a comment -

          This time, the patch uses a factory to create the ShardHandler:

          <requestHandler name="standard" class="solr.SearchHandler" default="true">
              <!-- other params go here -->

              <shardHandlerFactory class="HttpShardHandlerFactory">
                  <int name="socketTimeOut">1000</int>
                  <int name="connTimeOut">5000</int>
              </shardHandlerFactory>
          </requestHandler>
          
          Jason Rutherglen added a comment -

          Noble, the Jira issue is HBASE-3529, where much of the code is offline on Git because of the different pieces involved. That being said, I've linked the various Lucene and Solr Jira issues that are required to implement Solr in HBase, e.g., LUCENE-2919 and SOLR-2563.

          Mark Miller added a comment -

          Got a 3-day weekend, so I likely won't look at Noble's patch more till next week - I def will still take a peek and weigh in, but this is simple enough that I don't mind if we just commit and iterate on trunk, if necessary, in further issues.

          Noble Paul added a comment -

          Jason, open an issue and I will be glad to pitch in.

          Jason Rutherglen added a comment -

          @Noble I agree, I don't think committing this patch should hold things up. That was just a little note.

          I've been looking at implementing Solr in HBase and am [somewhat] worried about the ZK libraries. HBase + Solr can help with the massive-scale, near-realtime systems you've described; e.g., HBase implements splitting, partitioning, a fast write-ahead log, etc. Facebook has implemented the index directly in HBase, which probably offers degraded indexing and search performance.

          "We badly need the cloud features now"

          Right, many users are going with Elastic Search for the reasons mentioned.

          Noble Paul added a comment -

          Jason, yeah, it would be ideal. But we need to get things moving fast enough so that users can get the benefit ASAP. We badly need the cloud features now. I'm sure there are others too. We have clusters with 1000s of Solr hosts which are managed with ad-hoc tools.

          Mark Miller added a comment -

          "I think it could conflict with other uses of Zookeeper when the library versions are different."

          Yeah - always a problem with dependencies like this. It's hard to say what direction we go right now though - some have argued that even non-ZooKeeper mode should be a single-install ZooKeeper mode instead. It has its advantages and disadvantages, I think. For me, I can really only take it an issue at a time, and while I hope to drive some more things around SolrCloud soon, it's obviously been a while. Others have some issues open, but more ideas are always good.

          I certainly agree that CoreContainer could be modularized better - would help for testing too. I have an issue to do this for the persistence code (baby steps), but feel free to open further issues.

          I somewhat took the easy route in integrating ZooKeeper - there are certainly lots of improvements that could be made overall. And TODOs to finish - I think a couple of guys have done a few from the wiki in various issues, and I know Loggly has privately implemented a couple from their talk at Revolution (would be cool to see that come back, but I know they are busy guys). I love TODOs - minimal effort, but when you put one at a future pain point, your code doesn't look so stupid even when it's not perfect yet.

          We should discuss in other issues though.

          Jason Rutherglen added a comment -

          Seems to be fine. It'd be great to modularize Zookeeper references into a separate abstract interface (like what's done here), and not tie it to CoreContainer. I think it could conflict with other uses of Zookeeper when the library versions are different.
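
As a rough illustration of that suggestion (not code from any patch on this issue), the ZooKeeper dependency could sit behind a small interface that the container receives from outside. All names here (`ClusterStateSource`, `StaticClusterState`, `Container`) are hypothetical stand-ins, and the sketch assumes the only thing callers need is the list of live shards:

```java
import java.util.List;

// Hypothetical sketch: none of these types exist in Solr under these names.
class ClusterStateAbstractionSketch {

    /** The seam: callers depend on this interface, never on ZooKeeper classes directly. */
    interface ClusterStateSource {
        List<String> liveShards();
    }

    /** In-memory implementation, usable in tests or in a setup without ZooKeeper. */
    static final class StaticClusterState implements ClusterStateSource {
        private final List<String> shards;

        StaticClusterState(List<String> shards) {
            this.shards = List.copyOf(shards);
        }

        @Override
        public List<String> liveShards() {
            return shards;
        }
    }

    /** Stand-in for a container that is handed the abstraction instead of creating a ZK client. */
    static final class Container {
        private final ClusterStateSource state;

        Container(ClusterStateSource state) {
            this.state = state;
        }

        String describe() {
            return "live shards: " + state.liveShards();
        }
    }

    public static void main(String[] args) {
        Container c = new Container(new StaticClusterState(List.of("shard1", "shard2")));
        System.out.println(c.describe());
    }
}
```

A ZooKeeper-backed implementation of the same interface could then live in its own module with its own library version, which is the conflict the comment is worried about.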

          Mark Miller added a comment -

          I can look at this latest patch soon Noble. We should also give Jason a fair amount of time to weigh in.

          Noble Paul added a comment -

          This might need some more cleanup, but I think it is close to a state where it can be checked in.

          Noble Paul added a comment -

          Even the checkDistributed() method is abstracted out to ShardHandler. The current HttpShardHandler (the default) also takes care of ZooKeeper.

          Noble Paul added a comment -

          What are the concerns with the latest patch? I can work on them. I guess this is the optimal way to resolve SOLR-2592.

          Noble Paul added a comment -

          Same as the previous patch w/ standard configuration

          Noble Paul added a comment -

          Jason,

          The configuration I have specified lets you do ShardHandler-specific configuration. It fits well with the general Solr configuration:

          <requestHandler name="standard" class="solr.SearchHandler" default="true">
              <!-- other params go here -->

              <shardHandler class="HttpShardHandler">
                  <!-- To be implemented -->
                  <int name="httpReadTimeOut">1000</int>
                  <int name="httpConnTimeOut">5000</int>
              </shardHandler>
          </requestHandler>
          

          Creating a new instance per request is not wise.
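
One way to avoid rebuilding configuration for every request, sketched here under assumptions and not taken from the patch, is to read the settings once into a factory and let it hand out lightweight handlers. The class names (`SketchShardHandler`, `SketchShardHandlerFactory`) are hypothetical; the map keys simply mirror the `<int name="...">` entries in the configuration above:

```java
import java.util.Map;

// Hypothetical stand-ins for illustration; these are not Solr's actual classes.
class ShardHandlerFactorySketch {

    /** Minimal handler that only remembers the timeouts it was created with. */
    static final class SketchShardHandler {
        final int readTimeoutMs;
        final int connTimeoutMs;

        SketchShardHandler(int readTimeoutMs, int connTimeoutMs) {
            this.readTimeoutMs = readTimeoutMs;
            this.connTimeoutMs = connTimeoutMs;
        }
    }

    /** Factory that parses its arguments once and then creates handlers on demand. */
    static final class SketchShardHandlerFactory {
        private final int readTimeoutMs;
        private final int connTimeoutMs;

        SketchShardHandlerFactory(Map<String, Integer> args) {
            // Missing keys fall back to 0 in this sketch; a real default is a design decision.
            this.readTimeoutMs = args.getOrDefault("httpReadTimeOut", 0);
            this.connTimeoutMs = args.getOrDefault("httpConnTimeOut", 0);
        }

        SketchShardHandler getShardHandler() {
            return new SketchShardHandler(readTimeoutMs, connTimeoutMs);
        }
    }

    public static void main(String[] args) {
        SketchShardHandlerFactory factory = new SketchShardHandlerFactory(
            Map.of("httpReadTimeOut", 1000, "httpConnTimeOut", 5000));
        SketchShardHandler handler = factory.getShardHandler();
        System.out.println(handler.readTimeoutMs + " " + handler.connTimeoutMs);
    }
}
```

Anything expensive to build (connection pools, clients) would belong in the factory in such a design, so that the per-request object stays cheap.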

          Mark Miller added a comment -

          Does this patch incorporate any of Noble's feedback/patches? Any reason we want to create a new ShardHandler every request?

          Mark Miller added a comment -

          Hang on - might have gotten bitten by JIRA's new patch sorting bs - it used to just do it right, and I prob had it sorting wrong or something. Just gave it one last go and the patch applied cleanly.

          Mark Miller added a comment -

          Can you update your patch to apply without the hunk failures? Tests will not pass for me locally with the current patch.

          Jason Rutherglen added a comment -

          I just downloaded http://svn.apache.org/repos/asf/lucene/dev/trunk and applied the patch, and test-core passed. However the patch command mentioned specific hunks, though there was no .rej file.

          Mark Miller added a comment -

          So my bad - looks like this patch is for 3.x - need to do it for 4 and port back.

          Mark Miller added a comment -

          I've got to look a little closer here - there was a conflict on trunk - I naively just fixed it to compile, and now I'm getting errors that are perhaps IPv6-related? Need to investigate.

          java.lang.IllegalArgumentException: Invalid uri 'http://[::1]:33332/solr|localhost:53574/solr/select': escaped absolute path not valid
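
The address in that message contains two entries joined by `|`. The sketch below does not diagnose the failure; it only illustrates, with `java.net.URI` from the JDK, that the joined string is not a parseable URI while each entry is once it is split out. It assumes the `|` separates alternative addresses for one shard and that a missing scheme means `http://`; the helper names (`selectUris`, `parses`) are made up for the example:

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Illustration only: helper names and the http:// default are assumptions.
class ShardAddressSketch {

    /** Splits "a|b" alternatives and builds one /select URI per entry. */
    static List<URI> selectUris(String shardSpec) {
        List<URI> out = new ArrayList<>();
        for (String replica : shardSpec.split("\\|")) {
            String base = replica.startsWith("http://") ? replica : "http://" + replica;
            out.add(URI.create(base + "/select"));
        }
        return out;
    }

    /** Returns true if java.net.URI accepts the string as-is. */
    static boolean parses(String s) {
        try {
            URI.create(s);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String spec = "http://[::1]:33332/solr|localhost:53574/solr";
        System.out.println("joined string parses: " + parses(spec + "/select"));
        for (URI u : selectUris(spec)) {
            System.out.println(u.getHost() + " port " + u.getPort());
        }
    }
}
```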

          Simon Willnauer added a comment -

          I think this patch looks good. Mark, I think we should commit this soon.

          simon

          Jason Rutherglen added a comment -

          No worries mate!

          Mark Miller added a comment -

          I'm guessing Noble doesn't have a lot of free time for Solr these days based on how much he has popped up recently.

          I'm headed to Germany for a while, but I'd be happy to look at this issue as soon as I get a chance. Might even be able to start later today if my final Buzzwords slides start coming together.

          Jason Rutherglen added a comment -

          Methods moved up into abstract class ShardHandler. All tests pass.

          Jason Rutherglen added a comment -

          Here's a patch updated to trunk.

          Jason Rutherglen added a comment -

          What's the status of this one?

          Hoss Man added a comment -

          Bulk updating 240 Solr issues to set the Fix Version to "next" per the process outlined in this email...

          http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E

          Selection criteria was "Unresolved" with a Fix Version of 1.5, 1.6, 3.1, or 4.0. email notifications were suppressed.

          A unique token for finding these 240 issues in the future: hossversioncleanup20100527

          Noble Paul added a comment -

          Still needs to eliminate the ResponseBuilder#shards field. We have no means of knowing the number of shards available in the prepare phase. The ShardHandler decides just in time the number of shards available at any given point in time.
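
To make the "just in time" point concrete, here is a minimal sketch (an assumption-laden illustration, not the patch's code) in which the handler asks a supplier for the current shard list only when a request is submitted. `LateBindingShardHandler` and its `submit` signature are hypothetical:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical sketch: the handler resolves shards at submit time, not at prepare time.
class JustInTimeShardsSketch {

    static final class LateBindingShardHandler {
        private final Supplier<List<String>> liveShards;

        LateBindingShardHandler(Supplier<List<String>> liveShards) {
            this.liveShards = liveShards;
        }

        /** Looks up the shard list now and returns how many shards the request would go to. */
        int submit(String query) {
            List<String> shards = liveShards.get();
            // A real handler would send `query` to every entry in `shards` here.
            return shards.size();
        }
    }

    public static void main(String[] args) {
        AtomicReference<List<String>> view = new AtomicReference<>(List.of("shard1", "shard2"));
        LateBindingShardHandler handler = new LateBindingShardHandler(view::get);
        System.out.println(handler.submit("q=solr"));

        // A shard joins between two requests; the handler picks it up without being rebuilt.
        view.set(List.of("shard1", "shard2", "shard3"));
        System.out.println(handler.submit("q=solr"));
    }
}
```

Because the count is only known inside `submit`, nothing computed earlier in the request can safely cache it, which is the reason given for dropping the shared shards field.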

          Noble Paul added a comment -

          Search component is agnostic of the actual shards now. Some more work to be done to remove dependency on the ResponseBuilder#shards field by components.

          Noble Paul added a comment -

          Sample of setting up a new ShardHandler:

          <requestHandler name="standard" class="solr.SearchHandler" default="true">
              <!-- other params go here -->

              <shardHandler class="HttpShardHandler">
                  <!-- To be implemented -->
                  <int name="httpReadTimeOut">1000</int>
                  <int name="httpConnTimeOut">5000</int>
              </shardHandler>
          </requestHandler>
          
          Noble Paul added a comment - - edited

          The MultiShardHandler interface should handle the shards automatically, and the search handler should be agnostic of the shard names.

          The MultiShardHandler interface can be simplified to this:

          public interface ShardHandler {

             // Fans the request out to the shards; the handler, not the caller, decides which ones.
             public List<Callable<ShardResponse>> submit(ShardRequest sreq, ModifiableSolrParams params);

          }
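
A self-contained sketch of how a caller might use an interface of that shape follows. It is an illustration under assumptions, not the committed design: `ShardRequest` and `ShardResponse` are reduced to tiny stand-in classes, the `params` argument is dropped, and `FixedShardHandler` and `runAll` are invented for the example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-ins; Solr's real ShardRequest/ShardResponse carry far more state.
class ShardHandlerInterfaceSketch {

    static final class ShardRequest {
        final String query;
        ShardRequest(String query) { this.query = query; }
    }

    static final class ShardResponse {
        final String shard;
        final String body;
        ShardResponse(String shard, String body) { this.shard = shard; this.body = body; }
    }

    /** Same shape as the proposed interface, minus the params argument. */
    interface ShardHandler {
        List<Callable<ShardResponse>> submit(ShardRequest sreq);
    }

    /** Chooses its own shards, so the caller never handles shard names up front. */
    static final class FixedShardHandler implements ShardHandler {
        private final List<String> shards;

        FixedShardHandler(List<String> shards) { this.shards = List.copyOf(shards); }

        @Override
        public List<Callable<ShardResponse>> submit(ShardRequest sreq) {
            List<Callable<ShardResponse>> tasks = new ArrayList<>();
            for (String shard : shards) {
                // A real handler would make the remote call here; this one just echoes.
                tasks.add(() -> new ShardResponse(shard, "results for " + sreq.query));
            }
            return tasks;
        }
    }

    /** Runs every task on a small pool and collects responses in submission order. */
    static List<ShardResponse> runAll(ShardHandler handler, ShardRequest sreq) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<ShardResponse> out = new ArrayList<>();
            for (Future<ShardResponse> f : pool.invokeAll(handler.submit(sreq))) {
                out.add(f.get());
            }
            return out;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        ShardHandler handler = new FixedShardHandler(List.of("shard1", "shard2"));
        for (ShardResponse r : runAll(handler, new ShardRequest("q=solr"))) {
            System.out.println(r.shard + ": " + r.body);
        }
    }
}
```

Returning `Callable`s leaves the threading policy to the caller, which is one plausible reading of why the interface returns tasks and not responses.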
          
          Jason Rutherglen added a comment -

          Same as the previous

          Jason Rutherglen added a comment -

          Changed SearchHandler.getCommComponent to getMultiSearchHandler

          Noble Paul added a comment -

          Should we have a separate plugin in solrconfig.xml, like this:

          <shardHandler class="ZkHadoopRpcHandler" name="zk">
                  <str name="zkServer">http://foo</str>
          </shardHandler>
          

          and the SearchHandler configuration could then hold a reference to it:

          <str name="shardHandler">zk</str>
          

          This will let the handler have its own configuration.
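
The name-based indirection can be sketched as a small registry: plugins are registered under their `name` attribute and a search handler resolves its `shardHandler` string against it. This is a hypothetical illustration of the proposal, not Solr's plugin loading code; `ShardHandlerConfig`, `register`, and `resolve` are made-up names:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of resolving a handler by the name used in the config.
class NamedShardHandlerRegistrySketch {

    /** Holds what one <shardHandler> element declares: its class and its own arguments. */
    static final class ShardHandlerConfig {
        final String className;
        final Map<String, String> args;

        ShardHandlerConfig(String className, Map<String, String> args) {
            this.className = className;
            this.args = Map.copyOf(args);
        }
    }

    private final Map<String, ShardHandlerConfig> byName = new HashMap<>();

    void register(String name, ShardHandlerConfig config) {
        byName.put(name, config);
    }

    /** Looks up the plugin a search handler refers to; fails loudly on an unknown name. */
    ShardHandlerConfig resolve(String name) {
        ShardHandlerConfig config = byName.get(name);
        if (config == null) {
            throw new IllegalArgumentException("No shardHandler named '" + name + "'");
        }
        return config;
    }

    public static void main(String[] args) {
        NamedShardHandlerRegistrySketch registry = new NamedShardHandlerRegistrySketch();
        registry.register("zk",
            new ShardHandlerConfig("ZkHadoopRpcHandler", Map.of("zkServer", "http://foo")));

        ShardHandlerConfig resolved = registry.resolve("zk");
        System.out.println(resolved.className + " " + resolved.args.get("zkServer"));
    }
}
```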

          Jason Rutherglen added a comment -
          • Changed the class names to the MultiShardHandler theme
          • Added Apache license headers
          Jason Rutherglen added a comment -

          Well, it's not always used in distributed mode (see MultiEmbeddedSearchHandler where we're querying multiple local cores), so DistributedCommComponent wouldn't work either. Maybe MultiShardHandler?

          Noble Paul added a comment -

          Is it the best name? It somehow does not suggest that it is used for distributed search.

          Jason Rutherglen added a comment -

          All tests pass (except the unrelated DirectUpdateHandlerTest).

          Jason Rutherglen added a comment -

          This code was originally created when integrating Katta.


            People

            • Assignee:
              Noble Paul
              Reporter:
              Jason Rutherglen
            • Votes:
              3
              Watchers:
              5
