Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4
    • Fix Version/s: 4.0-ALPHA
    • Component/s: None
    • Labels: None

      Description

      See http://wiki.apache.org/solr/SolrCloud

      This is a real hassle - I didn't merge up to trunk before all the svn scrambling, so integrating cloud is now a bit difficult. I'm running through and just preparing a commit by hand though (applying changes/handling conflicts a file at a time).

      1. SOLR-1873.patch
        267 kB
        Mark Miller
      2. SOLR-1873.patch
        267 kB
        Mark Miller
      3. TEST-org.apache.solr.cloud.ZkSolrClientTest.txt
        85 kB
        Yonik Seeley
      4. SOLR-1873.patch
        263 kB
        Mark Miller
      5. SOLR-1873.patch
        265 kB
        Mark Miller
      6. SOLR-1873.patch
        266 kB
        Mark Miller
      7. SOLR-1873.patch
        207 kB
        Mark Miller
      8. zookeeper-3.3.1.jar
        988 kB
        Mark Miller
      9. SOLR-1873.patch
        259 kB
        Mark Miller
      10. SOLR-1873.patch
        258 kB
        Mark Miller
      11. SOLR-1873.patch
        261 kB
        Mark Miller
      12. SOLR-1873.patch
        248 kB
        Mark Miller
      13. ASF.LICENSE.NOT.GRANTED--SOLR-1873.patch
        251 kB
        Mark Miller
      14. ASF.LICENSE.NOT.GRANTED--SOLR-1873.patch
        249 kB
        Mark Miller
      15. ASF.LICENSE.NOT.GRANTED--SOLR-1873.patch
        248 kB
        Mark Miller
      16. ASF.LICENSE.NOT.GRANTED--log4j-over-slf4j-1.5.5.jar
        9 kB
        Mark Miller
      17. ASF.LICENSE.NOT.GRANTED--zookeeper-3.2.2.jar
        894 kB
        Mark Miller
      18. ASF.LICENSE.NOT.GRANTED--SOLR-1873.patch
        244 kB
        Mark Miller

        Activity

        Mark Miller added a comment -

        Still a lot to do, but after a couple days work, getting pretty close.

        left:

        a bunch of nocommits I've added
        Distrib spellcheck and Distrib TermsComponent do not pass tests

        Mark Miller added a comment -

        fixes the distrib spellcheck failure

        Yonik Seeley added a comment -

        I just checked in a fix to the terms component on the cloud branch

        Mark Miller added a comment -

        Sweet - all tests now pass - down to just the nocommits.

        Mark Miller added a comment - - edited

        Up to trunk and clean up some tests...all tests pass but nocommits remain.

        Mark Miller added a comment -

        I had missed merging necessary changes to QueryElevation component - this patch adds that piece as well as resolves a good chunk of nocommit issues.

        Mark Miller added a comment -

        As I wrap up the remaining work here, one issue looms: We are going to need to move Hudson to Java 6 before this can be committed.

        Yonik Seeley added a comment -

        As I wrap up the remaining work here, one issue looms: We are going to need to move Hudson to Java 6 before this can be committed.

        In most respects, I think that would be a positive anyway. Java6 is now the primary production deployment platform for new projects (and it's new projects that will be using new lucene and/or solr). With respect to keeping Lucene Java5 compatible, we can always run the tests with Java5 before commits (that's what I did in the past when Lucene was on Java1.4)

        Mark Miller added a comment -

        Latest patch -

        2 nocommits left

        An issue with tests involving data dir setting precedence and another issue involving how we should now set a different solr.xml for tests.

        Mark Miller added a comment -

        Getting close - Yonik noticed I missed merging his zookeeper.jsp page, so that will be in the next patch. Took care of one of the test nocommits and the other is being handled here: SOLR-1897

        Uwe has graciously upgraded Hudson to Java 1.6 for this as well - so almost there.

        Mark Miller added a comment -

        To trunk.

        Suppose I will commit this soon - just been afraid of rmuirs wrath if I slow down the tests. Generally they are not any slower for me, but the speed of some of the cloud tests could still use some tweaking - unfortunately they rely on a couple nasty pauses - but parallel hides that well for now.

        Robert Muir added a comment -

        just been afraid of rmuirs wrath if I slow down the tests

        not really worried about that, but I tried your patch (put the libs on this issue in lib/ etc), and the tests don't pass (i get could not connect to zookeeper errors).

        So thats my only concern, i just think the tests shouldn't error out

        Simon Willnauer added a comment -

        Mark, are you in a hurry or should I give it a try and review - could take a day or two...

        simon

        Mark Miller added a comment -

        I'm not in a hurry - I actually don't plan to commit before I get back from vaca next week - just warnings.

        I haven't pulled it over to my laptop yet, so working over slow internet VNC with it.

        Hoss Man added a comment -

        Bulk updating 240 Solr issues to set the Fix Version to "next" per the process outlined in this email...

        http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E

        Selection criteria was "Unresolved" with a Fix Version of 1.5, 1.6, 3.1, or 4.0. email notifications were suppressed.

        A unique token for finding these 240 issues in the future: hossversioncleanup20100527

        Mark Miller added a comment - - edited

        Just merged this to trunk and changed the tests to use a host of 127.0.0.1 rather than trying to get the local address (has always supported an override, trying it to see if that's part of Robert's problem).

        Have to figure out a new issue first though - one of the tests fails due to a CommonsHttpSolrServer timeout - very odd, haven't seen it before.
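
For readers following along, here is a minimal sketch of the "local address with an override" idea described above. It is not the code from the patch: the class name, the method and the `example.test.host` property are all made up for illustration.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class TestHostResolver {
    // Hypothetical property name, chosen for this sketch only.
    static final String HOST_OVERRIDE_PROPERTY = "example.test.host";

    /**
     * Returns the override when one is given; otherwise tries to detect the
     * local address and falls back to loopback if detection fails.
     */
    static String resolveHost(String override) {
        if (override != null && !override.trim().isEmpty()) {
            return override.trim();
        }
        try {
            return InetAddress.getLocalHost().getHostAddress();
        } catch (UnknownHostException e) {
            return "127.0.0.1";
        }
    }

    public static void main(String[] args) {
        // Default to loopback unless the (hypothetical) property says otherwise.
        String host = resolveHost(System.getProperty(HOST_OVERRIDE_PROPERTY, "127.0.0.1"));
        System.out.println("cloud tests would use host: " + host);
    }
}
```

Pinning the tests to loopback takes local address detection out of the picture, which is useful when trying to rule it out as a cause of connection failures.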

        Mark Miller added a comment -

        Okay, here is the latest up to date patch.

        This uses 127.0.0.1 for cloud tests - I think we def want to use localhost in the end, but I really don't know what's causing Robert's failures, so I'm doing a bit of straw grasping. These patches have passed tests on Windows XP for me, and Yonik has run older versions on Windows as well.

        Could you try the latest patch, Robert, and report back if you have the same timeouts?

        Robert Muir added a comment -

        i only got two test fails (both in ZkSolrClientTest):

           [junit] Testsuite: org.apache.solr.cloud.ZkSolrClientTest
           [junit] Testcase: testReconnect(org.apache.solr.cloud.ZkSolrClientTest):    Caused an ERROR
           [junit] Could not connect to ZooKeeper 127.0.0.1:51748/solr within 5000 ms
           [junit] java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 127.0.0.1:51748/solr within 5000 ms
           [junit]     at org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:124)
           [junit]     at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:121)
           [junit]     at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:85)
           [junit]     at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:65)
           [junit]     at org.apache.solr.cloud.ZkSolrClientTest.testReconnect(ZkSolrClientTest.java:80)
           [junit]
           [junit]
           [junit] Testcase: testWatchChildren(org.apache.solr.cloud.ZkSolrClientTest):        Caused an ERROR
           [junit] Could not connect to ZooKeeper 127.0.0.1:51783/solr within 5000 ms
           [junit] java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 127.0.0.1:51783/solr within 5000 ms
           [junit]     at org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:124)
           [junit]     at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:121)
           [junit]     at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:85)
           [junit]     at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:65)
           [junit]     at org.apache.solr.cloud.ZkSolrClientTest.testWatchChildren(ZkSolrClientTest.java:157)
           [junit]
        
        Mark Miller added a comment -

        So I think I've got a workaround for the test failures on Windows - something to get us started. Still have to figure out why one test occasionally takes 20 min on Windows though. Generally it takes under a minute, but every few runs...

        Mark Miller added a comment -

        Starting to think the long test was due to running out of disk space - Solr is not great about cleaning up its tests on Windows yet.

        Merging up to trunk has brought me some new issues to fix though. Still cranking along as I can - sorry about the wait.

        Mark Miller added a comment -

        New patch updated to trunk. Some test improvements. Fixes an encoding issue with storing config files in zookeeper.

        Hopefully should pass the tests on Windows more often.

        Mark Miller added a comment -

        I've been meaning to upload this for a couple days - here is my latest:

        I've updated to trunk.
        I've updated to the latest ZooKeeper so that tests can properly shut down and not leave large tmp dirs on Windows.
        I've cleaned up and improved some of the cloud tests.

        The main issue I see at the moment is that the shutdown is not completely clean and so some errors get pumped to std err that should be ignored - this is more of a visual issue at the moment.
        I'd like to take care of that if I can. However, first I want to see how this fares on Robert's Windows machine. Have to figure out if the reconnect is still a problem.

        - Mark
        Robert Muir added a comment -

        5kb patch, man you really shrunk this thing down

        Mark Miller added a comment -

        Gah - sorry! Let me try that again.

        Robert Muir added a comment -

        the first thing i noticed was the tmp.io.tmpdir in build.xml... but it doesnt seem like you use it.

        did you see the patch i added to SOLR-2011 ? I think its a good fix to this situation to cleanup leftover files, but ill try your patch anyway and ignore that for now

        Robert Muir added a comment -

        hmm i applied your patch, but there doesnt seem to be any cloud tests in the patch... did you forget to svn add?

        Mark Miller added a comment -

        the first thing i noticed was the tmp.io.tmpdir in build.xml... but it doesnt seem like you use it.

        Ignore that - it's just left over from my playing around trying to get those tmp files off a small drive.

        hmm i applied your patch, but there doesnt seem to be any cloud tests in the patch... did you forget to svn add?

        No, it's a weird problem I ran into before as well where svn|eclipse|subclipse starts thinking the directory has been added to svn, but it hasn't. Trying to remember how I fixed that last time...annoyingly, it silently makes the patch without those files. I'll try and fix it.

        Robert Muir added a comment -

        No, it's a weird problem I ran into before as well where svn|eclipse|subclipse starts thinking the directory has been added to svn, but it hasn't. Trying to remember how I fixed that last time...annoyingly, it silently makes the patch without those files. I'll try and fix it.

        I had those problems too, but they went away when i started doing svn info, svn status, svn diff for all patches instead of doing anything from eclipse

        Mark Miller added a comment -

        Here is that patch with the tests - it may still time out on reconnect for you, but we will see.

        Sam Pullara added a comment -

        Is there a branch that is still being maintained with the patch? I am getting errors when trying to apply it to trunk.

        Mark Miller added a comment -

        Hey Sam - trunk is a moving target - here is another patch updated again.

        Robert Muir added a comment -

        with this version i still get the same two fails in ZkSolrClientTest... here is the latest output:

            [junit] Testsuite: org.apache.solr.cloud.ZkSolrClientTest
            [junit] Testcase: testReconnect(org.apache.solr.cloud.ZkSolrClientTest):    Caused an ERROR
            [junit] Could not connect to ZooKeeper 127.0.0.1:65048/solr within 30000 ms
            [junit] java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 127.0.0.1:65048/solr within 30000 ms
            [junit]     at org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:124)
            [junit]     at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:122)
            [junit]     at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:85)
            [junit]     at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:65)
            [junit]     at org.apache.solr.cloud.ZkSolrClientTest.testReconnect(ZkSolrClientTest.java:78)
            [junit]     at org.apache.lucene.util.LuceneTestCase.runBare(LuceneTestCase.java:328)
            [junit]
            [junit]
            [junit] Testcase: testWatchChildren(org.apache.solr.cloud.ZkSolrClientTest):        Caused an ERROR
            [junit] Could not connect to ZooKeeper 127.0.0.1:65149/solr within 30000 ms
            [junit] java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 127.0.0.1:65149/solr within 30000 ms
            [junit]     at org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:124)
            [junit]     at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:122)
            [junit]     at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:85)
            [junit]     at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:65)
            [junit]     at org.apache.solr.cloud.ZkSolrClientTest.testWatchChildren(ZkSolrClientTest.java:170)
            [junit]     at org.apache.lucene.util.LuceneTestCase.runBare(LuceneTestCase.java:328)
            [junit]
            [junit]
            [junit] Tests run: 4, Failures: 0, Errors: 2, Time elapsed: 71.285 sec
            [junit]
            [junit] ------------- Standard Output ---------------
            [junit] NOTE: random codec of testcase 'testReconnect' was: Standard
            [junit] NOTE: random codec of testcase 'testWatchChildren' was: Sep
            [junit] ------------- ---------------- ---------------
            [junit] ------------- Standard Error -----------------
            [junit] !!!! WARNING: best effort to remove C:\Users\rmuir\AppData\Local\Temp\org.apache.solr.cloud.ZkSolrClientTest-1280743589385\zookeeper\server1\data\version-2\log.1 FAILED !!!!!
            [junit] !!!! WARNING: best effort to remove C:\Users\rmuir\AppData\Local\Temp\org.apache.solr.cloud.ZkSolrClientTest-1280743589385\zookeeper\server1\data\version-2 FAILED !!!!!
            [junit] !!!! WARNING: best effort to remove C:\Users\rmuir\AppData\Local\Temp\org.apache.solr.cloud.ZkSolrClientTest-1280743589385\zookeeper\server1\data FAILED !!!!!
            [junit] !!!! WARNING: best effort to remove C:\Users\rmuir\AppData\Local\Temp\org.apache.solr.cloud.ZkSolrClientTest-1280743589385\zookeeper\server1 FAILED !!!!!
            [junit] !!!! WARNING: best effort to remove C:\Users\rmuir\AppData\Local\Temp\org.apache.solr.cloud.ZkSolrClientTest-1280743589385\zookeeper FAILED !!!!!
            [junit] !!!! WARNING: best effort to remove C:\Users\rmuir\AppData\Local\Temp\org.apache.solr.cloud.ZkSolrClientTest-1280743589385 FAILED !!!!!
            [junit] ------------- ---------------- ---------------
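
The "best effort to remove ... FAILED" warnings above list the deepest path first and the test's root directory last, which is the order a children-before-parents recursive delete would produce. The sketch below shows that general technique only; it is not the test framework's actual cleanup code, and the class and method names are invented. On Windows, a file that some process still holds open usually cannot be deleted, so every directory above it fails to delete as well.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

public class BestEffortDelete {
    /**
     * Deletes children before parents and keeps going on failure.
     * Returns every path that could not be removed, deepest first.
     */
    static List<File> deleteRecursively(File root) {
        List<File> failed = new ArrayList<>();
        File[] children = root.listFiles(); // null for plain files and unreadable dirs
        if (children != null) {
            for (File child : children) {
                failed.addAll(deleteRecursively(child));
            }
        }
        if (root.exists() && !root.delete()) {
            failed.add(root);
        }
        return failed;
    }

    public static void main(String[] args) throws IOException {
        // Build a small tree that loosely resembles the layout in the log.
        File dir = Files.createTempDirectory("best-effort-demo").toFile();
        File nested = new File(dir, "data/version-2");
        nested.mkdirs();
        new File(nested, "log.1").createNewFile();

        List<File> failed = deleteRecursively(dir);
        for (File f : failed) {
            System.err.println("WARNING: could not remove " + f);
        }
        System.out.println("paths left behind: " + failed.size());
    }
}
```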
        
        Mark Miller added a comment -

        I have not been able to duplicate this after getting cloud set up on Windows Vista - all tests pass for me - though I did find another one or two that occasionally fail on Vista and I have strengthened them (added or lengthened retries). Robert says his issue happens on every test run though. Anyone out there able to try these tests on Windows Vista and/or Windows 7 to help out tracking this down?
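
To make the failure mode concrete, here is a small sketch of a "wait until connected or time out" helper of the kind the stack traces above point at. It is an illustration written for this thread, not the Solr `ConnectionManager` source: the class, its methods and the example address are assumptions, and only the wording of the exception message is borrowed from the log.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ConnectionWaiter {
    private final CountDownLatch connected = new CountDownLatch(1);

    /** Called by whatever callback learns that the connection is up. */
    public void markConnected() {
        connected.countDown();
    }

    /** Blocks until markConnected() has been called or the timeout elapses. */
    public void waitForConnected(String address, long timeoutMs)
            throws InterruptedException, TimeoutException {
        if (!connected.await(timeoutMs, TimeUnit.MILLISECONDS)) {
            throw new TimeoutException(
                "Could not connect to " + address + " within " + timeoutMs + " ms");
        }
    }

    public static void main(String[] args) throws Exception {
        ConnectionWaiter waiter = new ConnectionWaiter();
        // Simulate the "connected" event arriving shortly after startup.
        Thread event = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            waiter.markConnected();
        });
        event.start();
        waiter.waitForConnected("127.0.0.1:2181/solr", 5000);
        System.out.println("connected");
    }
}
```

If the connected event never arrives, a longer timeout only delays the same exception. That would be consistent with the reports above, where the failure looks the same at 5000 ms and at 30000 ms.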

        Yonik Seeley added a comment -

        If you can upload a patch w/o the funny double-spaced lines in some of the files,
        I'll try it out (I haven't been successful applying the patch with command line patch or TortoiseSVN)

        Mark Miller added a comment -

        I've got a new patch for trunk - it's updated to trunk and has a fix for a problem with the zk client port. Unfortunately, I've still got the weird double lines for new files in this patch - so I have to figure out why that is happening.

        Mark Miller added a comment -

        Latest patch - I think the funny extra line spaces are fixed - appears to be something weird TextWrangler on my mac was doing to the patch file. Hopefully it's good now.

        Yonik Seeley added a comment -

        I'm getting the same errors as Robert (after commenting out the bad host, which my DNS provider returns a fake entry for) on my Win7-64 box. My Ubuntu box passes fine though.

        Attaching the test output.

        Kevin Dana added a comment - - edited

        In ZkStateReader, the Threads created by updateCloudExecutor are preventing a clean shutdown under Tomcat.
        To correct this, the following code changes the declaration of updateCloudExecutor to use a ThreadFactory to set the Threads to "daemon":

         
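          // Thread factory that marks every thread it creates as a daemon,
          // so the executor's threads cannot block a clean shutdown.
          private static class ZKTF implements ThreadFactory {
            private static ThreadGroup tg = new ThreadGroup("ZkStateReader");
            @Override
            public Thread newThread(Runnable r) {
              Thread td = new Thread(tg, r);
              td.setDaemon(true);
              return td;
            }
          }
          // Single-threaded scheduler whose worker thread comes from the daemon factory above.
          private ScheduledExecutorService updateCloudExecutor = Executors.newScheduledThreadPool(1, new ZKTF());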
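
        For anyone who wants to check the daemon behavior outside of Solr, here is a minimal standalone sketch of the same idea. The class and method names (DaemonExecutorDemo, DaemonThreadFactory, workerIsDaemon) are made up for illustration and are not part of the patch; it only uses java.util.concurrent.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

public class DaemonExecutorDemo {

  /** Illustrative stand-in for the ZKTF factory: every thread it creates is a daemon. */
  static class DaemonThreadFactory implements ThreadFactory {
    private final ThreadGroup group = new ThreadGroup("demo-daemon");

    @Override
    public Thread newThread(Runnable r) {
      Thread t = new Thread(group, r);
      t.setDaemon(true); // daemon threads do not keep the JVM alive on shutdown
      return t;
    }
  }

  /** Runs one scheduled task and reports whether the worker thread was a daemon. */
  static boolean workerIsDaemon() {
    ScheduledExecutorService executor =
        Executors.newScheduledThreadPool(1, new DaemonThreadFactory());
    try {
      Callable<Boolean> task = () -> Thread.currentThread().isDaemon();
      return executor.schedule(task, 10, TimeUnit.MILLISECONDS).get();
    } catch (Exception e) {
      throw new RuntimeException(e);
    } finally {
      executor.shutdownNow(); // still shut down explicitly when you can
    }
  }

  public static void main(String[] args) {
    System.out.println("worker is daemon: " + workerIsDaemon());
  }
}
```

        Daemon threads only stop the executor from blocking JVM exit; calling shutdown on the executor when the owning object is closed is still the cleaner option where that hook exists.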
        
        Mark Miller added a comment -

        This does nothing to address the Windows test issue, but a new patch is attached:

        • updates to trunk: r1021515
        • fixes a null pointer bug that was introduced by the last zk jars upgrade in the built-in Solr zk server
        • incorporates the daemon thread fix above
        • improves the wait code for one of the tests just a bit
        Mark Miller added a comment -

        Thanks for the contribution Kevin!

        Mark Miller added a comment -

        Tests should now pass on all versions of Windows (knock on wood) with this patch, thanks to Robert - he took a closer look and saw that the test was using a zk connect timeout of 15ms in two places - much, much too low. Changing to the correct default timeout of 10-15s that is used elsewhere appears to have fixed the issue.
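
        To illustrate why a 15ms connect timeout fails where a 10s one passes, here is a rough sketch that does not touch ZooKeeper at all. It simulates a connection that takes a few hundred milliseconds to come up and waits for it with each timeout. The class name, the connectsWithin helper and the 300ms delay are assumptions made up for the example, not values taken from the tests.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class ConnectTimeoutDemo {

  /**
   * Simulates a client waiting for a connection that takes connectDelayMs to
   * establish. Returns true if the connection arrived within timeoutMs.
   */
  static boolean connectsWithin(long connectDelayMs, long timeoutMs) {
    CountDownLatch connected = new CountDownLatch(1);

    // Background thread standing in for the server side of the handshake.
    Thread server = new Thread(() -> {
      try {
        Thread.sleep(connectDelayMs);
        connected.countDown();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    server.setDaemon(true);
    server.start();

    try {
      // The unit matters: 15 here means 15 milliseconds, not 15 seconds.
      return connected.await(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("timeout 15 ms:    " + connectsWithin(300, 15));
    System.out.println("timeout 10000 ms: " + connectsWithin(300, 10_000));
  }
}
```

        On a fast local machine a real connection may happen to finish inside 15ms, which would explain why the failure showed up on some boxes and not others.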

        Mark Miller added a comment -

        Okay - I'd still like to push these tests to be quicker - but I'd like to commit this soon if there are no objections - getting this in trunk is going to make things a lot easier for a few people (including me - as fun as merging up to trunk always is) - and now that I know a couple people are using it (at least one in production), I feel pretty good about getting this in soon.

        This is our base from which I hope a lot of further cool cloud stuff comes.

        Robert Muir added a comment -

        +1 to commit, I have no problems since the timeout has changed.

        Additionally the tests don't cause a significant slowdown on my computer, and there is an issue open
        already to speed them up. I think it's better to have the code in trunk at this point so we can spend
        time actually improving it and not merging.

        Mark Miller added a comment -

        committed r1022188

        Grant Ingersoll added a comment -

        I don't understand why this puts all the check shards logic into the QueryComponent. We have many paths that don't go through the QueryComponent that could use this. Seems like we should either make something like a ShardsComponent, or abstract it up to the RequestHandlerBase, so that everyone can take advantage of it if they want to.

        Noble Paul added a comment -

        How can we make the logic for identifying the shards pluggable? If I have per-user data stored in a given shard, the search should be performed only there. Is there an issue to track this or shall I open one?
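
        As a starting point for discussion, a pluggable hook could look roughly like the sketch below. Everything in it is hypothetical: the ShardSelector interface, the "user" request parameter and the hash-based routing are invented for illustration and do not exist in Solr or in this patch.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class ShardSelectorDemo {

  /** Hypothetical plug-in point: decides which shards a request is sent to. */
  interface ShardSelector {
    List<String> selectShards(Map<String, String> requestParams, List<String> allShards);
  }

  /** Default behavior: search every known shard. */
  static class AllShardsSelector implements ShardSelector {
    @Override
    public List<String> selectShards(Map<String, String> requestParams, List<String> allShards) {
      return allShards;
    }
  }

  /** Routes a request to a single shard based on a hypothetical "user" parameter. */
  static class PerUserShardSelector implements ShardSelector {
    private final ShardSelector fallback = new AllShardsSelector();

    @Override
    public List<String> selectShards(Map<String, String> requestParams, List<String> allShards) {
      String user = requestParams.get("user");
      if (user == null || allShards.isEmpty()) {
        return fallback.selectShards(requestParams, allShards);
      }
      // floorMod keeps the index non-negative even for negative hash codes.
      int index = Math.floorMod(user.hashCode(), allShards.size());
      return Collections.singletonList(allShards.get(index));
    }
  }

  public static void main(String[] args) {
    List<String> shards = java.util.Arrays.asList("shard1", "shard2", "shard3");
    ShardSelector selector = new PerUserShardSelector();
    System.out.println(selector.selectShards(Collections.singletonMap("user", "alice"), shards));
    System.out.println(selector.selectShards(Collections.<String, String>emptyMap(), shards));
  }
}
```

        The same routing function would also have to be used at index time so that a user's documents actually land on the shard the selector picks.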

        Mark Miller added a comment -

        If I remember right (been a long time since I talked about it with Jon), I think Loggly had to do some small custom hack for this type of thing as well - no issue that I know of - let's make a new issue.


          People

          • Assignee:
            Mark Miller
            Reporter:
            Mark Miller
          • Votes:
            3
            Watchers:
            20

              Development