SOLR-3932

SolrCmdDistributorTest either takes 3 seconds or 3 minutes.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.1, 6.0
    • Component/s: SolrCloud, Tests
    • Labels:
      None

      Description

      I've looked into this a little in the past, but had not come to a conclusion. It really bugs me because it doubles my test run time from 3 minutes to 6 minutes when it happens.

      I've been looking into it today and I think I've tracked the problem down to mostly test bugs. One real bug around distrib commit ordering was also uncovered.

      Attachment: stack.txt (90 kB, Mark Miller)

          Activity

          Mark Miller added a comment -

          Hmm...not sure if the fixes around the test I made just made this less likely to trigger or what...but once I also did a fix for SOLR-3933 I started seeing the long hangs again.

          It seems mainly taking a long time in socket read calls...

          A sample stack trace attached.
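One way to confirm that threads are stuck in socket reads without reading a full dump by hand is to scan live stack traces for the read frame. The following is a minimal sketch, not code from Solr or from the attached trace. The class name `SocketReadScan` and its helpers are hypothetical, and the frame name `java.net.SocketInputStream.socketRead0` is an assumption that holds on older JDKs; newer JDKs use a different socket implementation class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SocketReadScan {

    /** True if any frame belongs to className and its method name starts with methodPrefix. */
    static boolean hasFrame(StackTraceElement[] stack, String className, String methodPrefix) {
        for (StackTraceElement frame : stack) {
            if (frame.getClassName().equals(className)
                    && frame.getMethodName().startsWith(methodPrefix)) {
                return true;
            }
        }
        return false;
    }

    /** Names of live threads that currently have a matching frame on their stack. */
    static List<String> threadsIn(String className, String methodPrefix) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            if (hasFrame(e.getValue(), className, methodPrefix)) {
                names.add(e.getKey().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Assumed frame name for a blocking socket read on older JDKs.
        System.out.println(threadsIn("java.net.SocketInputStream", "socketRead"));
    }
}
```

Calling `threadsIn` periodically from inside the test JVM during a slow run would show whether the same threads stay parked in the read call.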

          Mark Miller added a comment -

          I'm not sure what the heck is on the other end of these socket read hangs - the stack trace indicates it could only be jetty?

          I've tried a variety of settings in both jetty and httpclient to see how this changes - I did find that if I drop max connections per host in httpclient to 2 or 3, I don't seem to see the issue.

          Also, only allowing about 2 httpclients to fire at a time (rather than the default of 8 per host) seemed to make the issue go away or happen much less frequently.

          Both options would seem to simply decrease the load...

          I'm not sure what is up yet - I've also tried upgrading httpclient/components - but no luck.
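The throttling experiment described in this comment (letting only a couple of requests fire at once) can be illustrated with a plain `java.util.concurrent.Semaphore`. This is a sketch under stated assumptions, not the change made in Solr: `ThrottledSender`, `simulate`, and the 5 ms simulated request are all hypothetical, and no real HTTP client is involved.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ThrottledSender {
    private final Semaphore permits;
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    ThrottledSender(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    /** Runs the request while holding a permit, so at most maxConcurrent run at once. */
    void send(Runnable request) throws InterruptedException {
        permits.acquire();
        try {
            int now = inFlight.incrementAndGet();
            peak.accumulateAndGet(now, Math::max); // record the highest concurrency seen
            request.run();
        } finally {
            inFlight.decrementAndGet();
            permits.release();
        }
    }

    int peakConcurrency() {
        return peak.get();
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Fires simulated requests from a pool of worker threads and returns the observed peak. */
    static int simulate(int maxConcurrent, int threads, int requests) {
        ThrottledSender sender = new ThrottledSender(maxConcurrent);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < requests; i++) {
            pool.execute(() -> {
                try {
                    sender.send(() -> sleepQuietly(5)); // stand-in for one HTTP request
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sender.peakConcurrency();
    }

    public static void main(String[] args) {
        // 8 worker threads, but never more than 2 simulated requests in flight.
        System.out.println("peak in flight: " + simulate(2, 8, 40));
    }
}
```

As the comment notes, a cap like this only reduces load on the server side; it would hide the hang rather than explain it.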

          Mark Miller added a comment -

          So doing other things to relieve the stress also hides the issue - e.g. raising our update buffer size...

          I pinged Yonik about this to see if he had any ideas and he suggested that I try the NIO jetty Connector impl. For various reasons we don't currently default to it.

          That actually seems to work much better. We have advanced a couple Jetty versions since the Connector impl was chosen, so perhaps it's time to come back to this choice.

          Mark Miller added a comment -

          There is still more to investigate, but I created SOLR-3935 to track looking into changing our default Connector impl.

          Commit Tag Bot added a comment -

          [branch_4x commit] Mark Robert Miller
          http://svn.apache.org/viewvc?view=revision&revision=1402362

          SOLR-3933: Distributed commits are not guaranteed to be ordered within a request.

          SOLR-3939: An empty or just replicated index cannot become the leader of a shard after a leader goes down.

          SOLR-3971: A collection that is created with numShards=1 turns into a numShards=2 collection after starting up a second core and not specifying numShards.

          SOLR-3932: SolrCmdDistributorTest either takes 3 seconds or 3 minutes.

          Commit Tag Bot added a comment -

          [branch_4x commit] Mark Robert Miller
          http://svn.apache.org/viewvc?view=revision&revision=1402361

          SOLR-3933: Distributed commits are not guaranteed to be ordered within a request.

          SOLR-3939: An empty or just replicated index cannot become the leader of a shard after a leader goes down.

          SOLR-3971: A collection that is created with numShards=1 turns into a numShards=2 collection after starting up a second core and not specifying numShards.

          SOLR-3932: SolrCmdDistributorTest either takes 3 seconds or 3 minutes.


            People

            • Assignee:
              Mark Miller
              Reporter:
              Mark Miller