SOLR-7493

Requests aren't distributed evenly if the collection isn't present locally

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 5.0
    • Fix Version/s: 5.2.1, 5.3, 6.0
    • Component/s: SolrCloud
    • Labels:
      None

      Description

      I had a SolrCloud cluster where every node is behind a simple round-robin load balancer.
      This cluster had two collections (A, B), and the slices of each were partitioned such that one collection (A) used two thirds of the nodes, and the other collection (B) used the remaining third of the nodes.

      I observed that every request for collection B that the load balancer sent to a node with (only) slices for collection A got proxied to one specific node hosting a slice for collection B. This node started running pretty hot, for obvious reasons.

      This meant that one specific node was handling the fan-out for slightly more than two-thirds of the requests against collection B.
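The "slightly more than two-thirds" figure can be checked with a little arithmetic. The sketch below assumes a hypothetical 9-node cluster with 3 nodes hosting collection B (the report gives no node counts) and a load balancer that spreads requests uniformly. Under those assumptions the hot node coordinates 6/9 proxied plus 1/9 direct = 7/9 of the requests, whereas an even spread of the proxied requests would give each B node 1/3. The class and method names are illustrative only.

```java
final class HotNodeShare {
    /**
     * Fraction of collection-B requests coordinated by the single node that
     * receives every proxied request, assuming the load balancer spreads
     * requests uniformly over all nodes.
     */
    static double hotNodeShare(int totalNodes, int nodesWithB) {
        // Requests landing on nodes without B are all proxied to one node.
        double proxied = (double) (totalNodes - nodesWithB) / totalNodes;
        // The hot node also receives its own round-robin share directly.
        double direct = 1.0 / totalNodes;
        return proxied + direct;
    }

    /** Share per B node if the proxied requests were spread evenly instead. */
    static double evenShare(int totalNodes, int nodesWithB) {
        double proxied = (double) (totalNodes - nodesWithB) / totalNodes;
        return proxied / nodesWithB + 1.0 / totalNodes;
    }

    public static void main(String[] args) {
        // Hypothetical cluster: 9 nodes, 3 of which host collection B.
        System.out.println("hot node share: " + hotNodeShare(9, 3));
        System.out.println("even share:     " + evenShare(9, 3));
    }
}
```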

      1. SOLR-7493.patch
        7 kB
        Shalin Shekhar Mangar
      2. SOLR-7493.patch
        2 kB
        Jeff Wartes

        Activity

        Jeff Wartes added a comment -

        It looks like this happens because SolrDispatchFilter's getRemoteCoreURL eventually takes the first viable entry from a HashMap.values list of cores.

        The iteration order of HashMap.values is always the same if you load the HashMap with the same data in the same order. So if the list from ZK is presented in the same order on every node, every node will pick the same first entry on every request.

        There might be a better solution, but this patch would randomize that ordering per-request.
        My environment is a bit messed up at the moment, so I haven't done much more than verify this compiles.
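The behavior described in this comment can be illustrated with a small standalone sketch. This is not the code from SolrDispatchFilter or from the attached patch: the class, the method names, the core names, and the URLs are all made up for illustration, and the "viable entry" check (skipping cores that are down, for example) is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

final class ReplicaPicker {
    /** Mimics the reported behavior: always take the first entry of values(). */
    static String pickFirst(Map<String, String> coreUrls) {
        return coreUrls.values().iterator().next();
    }

    /** Mimics the proposed idea: copy, shuffle per request, then take the first. */
    static String pickShuffled(Map<String, String> coreUrls, Random random) {
        List<String> urls = new ArrayList<>(coreUrls.values());
        Collections.shuffle(urls, random);
        return urls.get(0);
    }

    /** Hypothetical core list, loaded in the same order on every call. */
    static Map<String, String> sampleCores() {
        Map<String, String> cores = new HashMap<>();
        cores.put("core_node1", "http://node1:8983/solr/B_shard1_replica1");
        cores.put("core_node2", "http://node2:8983/solr/B_shard2_replica1");
        cores.put("core_node3", "http://node3:8983/solr/B_shard3_replica1");
        return cores;
    }

    public static void main(String[] args) {
        Map<String, String> cores = sampleCores();
        Random random = new Random();
        for (int i = 0; i < 5; i++) {
            System.out.println("first: " + pickFirst(cores)
                + "  shuffled: " + pickShuffled(cores, random));
        }
    }
}
```

Two maps built from the same data in the same order yield the same pickFirst result, which is why every node forwards to the same target. Shuffling a copy per request leaves the shared map untouched while spreading the forwarded requests.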

        Mark Miller added a comment -

        +1, thanks Jeff!

        Shalin Shekhar Mangar added a comment -
        • Uses a Random seeded with tests.seed or System.currentTimeMillis for shuffling
        • Added a simple test which creates 3 jettys and 2 collections A, B such that A has replicas on node1, node2 and collection B has a replica on node3. The test fires 10 search requests to node3 intended for collection A and asserts that not all requests go to the same replica of collection A.
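A minimal sketch of the seeding idea in the first bullet, assuming the seed arrives as a string system property named tests.seed. Deriving the numeric seed from the string's hashCode is an assumption made for this example, as are the class and method names; the committed code may differ.

```java
import java.util.Random;

final class SeededRandom {
    /**
     * Returns a reproducible Random when a test seed is supplied, and a
     * time-seeded one otherwise. Using hashCode() of the seed string is an
     * assumption of this sketch.
     */
    static Random create(String testsSeed) {
        if (testsSeed == null) {
            return new Random(System.currentTimeMillis());
        }
        return new Random(testsSeed.hashCode());
    }

    /** Reads the seed from the tests.seed system property, if set. */
    static Random fromSystemProperty() {
        return create(System.getProperty("tests.seed"));
    }

    public static void main(String[] args) {
        Random first = create("DEADBEEF");
        Random second = create("DEADBEEF");
        // Same seed string, same sequence: shuffles are reproducible in tests.
        System.out.println(first.nextInt(100) == second.nextInt(100));
    }
}
```

Seeding from the test seed keeps a failing run reproducible, while production requests still see a different shuffle order over time.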
        ASF subversion and git services added a comment -

        Commit 1683946 from shalin@apache.org in branch 'dev/trunk'
        [ https://svn.apache.org/r1683946 ]

        SOLR-7493: Requests aren't distributed evenly if the collection isn't present locally

        ASF subversion and git services added a comment -

        Commit 1683948 from shalin@apache.org in branch 'dev/trunk'
        [ https://svn.apache.org/r1683948 ]

        SOLR-7493: Initialize random correctly

        ASF subversion and git services added a comment -

        Commit 1683950 from shalin@apache.org in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1683950 ]

        SOLR-7493: Requests aren't distributed evenly if the collection isn't present locally. Merges r1683946 and r1683948 from trunk.

        Shalin Shekhar Mangar added a comment -

        Thanks Jeff!

        Shalin Shekhar Mangar added a comment -

        Reopening to backport to 5.2.1

        ASF subversion and git services added a comment -

        Commit 1684674 from shalin@apache.org in branch 'dev/branches/lucene_solr_5_2'
        [ https://svn.apache.org/r1684674 ]

        SOLR-7493: Requests aren't distributed evenly if the collection isn't present locally. Merging r1683950 from branch_5x

        ASF subversion and git services added a comment -

        Commit 1684675 from shalin@apache.org in branch 'dev/branches/lucene_solr_5_2'
        [ https://svn.apache.org/r1684675 ]

        SOLR-7493: Fix compile issue after backport


          People

          • Assignee:
            Shalin Shekhar Mangar
            Reporter:
            Jeff Wartes
          • Votes:
            0
            Watchers:
            3