Details

    • Type: New Feature New Feature
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Fix Version/s: 0.5
    • Component/s: None
    • Labels:
      None

      Description

      We need to be able to spread load evenly across a cluster to mitigate keys not being uniformly distributed as well as heterogeneous nodes in a cluster. The former is particularly likely to be a problem when using the OrderPreservingPartitioner, since the keys are not randomized by a hash function.

      Avinash suggested three papers on load balancing in this thread: http://groups.google.com/group/cassandra-dev/msg/b3d67acf35801c41

      Of these, the useful ones are
      http://www.iptps.org/papers-2004/karger-load-balance.pdf (Simple Efficient Load Balancing Algorithms for Peer-to-Peer Systems by David R. Karger and Matthias Ruhl)
      http://iptps03.cs.berkeley.edu/final-papers/load_balancing.ps (Load Balancing in Structured P2P Systems by Ananth Rao et al)

      The third,
      http://iptps03.cs.berkeley.edu/final-papers/simple_load_balancing.ps (Simple Load Balancing for Distributed Hash Tables by John Byers et al) is not applicable to Cassandra's design. ("First, we suggest the direct application of the 'power of two choices' paradigm, whereby an item is stored at the less loaded of two (or more) random alternatives. We then consider how associating a small constant number of hash values with a key can naturally be extended to support other load balancing strategies.")

      1. 192.patch
        7 kB
        Jonathan Ellis

        Issue Links

          Activity

          Aleksey Yeschenko made changes -
          Component/s Core [ 12312978 ]
          Gavin made changes -
          Workflow patch-available, re-open possible [ 12748938 ] reopen-resolved, no closed status, patch-avail, testing [ 12756741 ]
          Gavin made changes -
          Workflow no-reopen-closed, patch-avail [ 12464044 ] patch-available, re-open possible [ 12748938 ]
          Jonathan Ellis made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Jonathan Ellis made changes -
          Attachment 192.patch [ 12424791 ]
          Jonathan Ellis made changes -
          Attachment 192.patch [ 12424859 ]
          Jonathan Ellis made changes -
          Attachment 192.patch [ 12424757 ]
          Jonathan Ellis made changes -
          Attachment 192.patch [ 12424791 ]
          Jonathan Ellis made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Jonathan Ellis made changes -
          Attachment 192.patch [ 12424757 ]
          Jonathan Ellis made changes -
          Assignee Jonathan Ellis [ jbellis ]
          Jonathan Ellis made changes -
          Link This issue is blocked by CASSANDRA-435 [ CASSANDRA-435 ]
          Sandeep Tata made changes -
          Assignee Sandeep Tata [ sandeep_tata ]
          Sandeep Tata made changes -
          Assignee Sandeep Tata [ sandeep_tata ]
          Jonathan Ellis made changes -
          Component/s Core [ 12312978 ]
          Stu Hood made changes -
          Link This issue is blocked by CASSANDRA-242 [ CASSANDRA-242 ]
          Jonathan Ellis made changes -
          Fix Version/s 0.5 [ 12314040 ]
          Fix Version/s 0.4 [ 12313862 ]
          Jonathan Ellis made changes -
          Description We need to be able to spread load evenly across a cluster to mitigate keys not being uniformly distributed as well as heterogeneous nodes in a cluster. The former is particularly likely to be a problem when using the OrderPreservingPartitioner, since the keys are not randomized by a hash function.

          Avinash suggested three papers on load balancing in this thread: http://groups.google.com/group/cassandra-dev/msg/b3d67acf35801c41

          Of these, the useful ones are
           http://www.iptps.org/papers-2004/karger-load-balance.pdf (Simple Efficient Load Balancing Algorithms for Peer-to-Peer Systems by David R. Karger and Matthias Ruhl)
           http://iptps03.cs.berkeley.edu/final-papers/load_balancing.ps (Load Balancing in Structured P2P Systems by Ananth Rao et al)

          The third,
          http://iptps03.cs.berkeley.edu/final-papers/simple_load_balancing.ps (Simple Load Balancing for Distributed Hash Tables by John Byers et al) is not applicable to Cassandra's design. ('First, we suggest the direct application of the lsquolsquopower of two choicesrsquorsquo paradigm, whereby an item is stored at the less loaded of two (or more) random alternatives. We then consider how associating a small constant number of hash values with a key can naturally be extended to support other load balancing strategies.')
          We need to be able to spread load evenly across a cluster to mitigate keys not being uniformly distributed as well as heterogeneous nodes in a cluster. The former is particularly likely to be a problem when using the OrderPreservingPartitioner, since the keys are not randomized by a hash function.

          Avinash suggested three papers on load balancing in this thread: http://groups.google.com/group/cassandra-dev/msg/b3d67acf35801c41

          Of these, the useful ones are
           http://www.iptps.org/papers-2004/karger-load-balance.pdf (Simple Efficient Load Balancing Algorithms for Peer-to-Peer Systems by David R. Karger and Matthias Ruhl)
           http://iptps03.cs.berkeley.edu/final-papers/load_balancing.ps (Load Balancing in Structured P2P Systems by Ananth Rao et al)

          The third,
          http://iptps03.cs.berkeley.edu/final-papers/simple_load_balancing.ps (Simple Load Balancing for Distributed Hash Tables by John Byers et al) is not applicable to Cassandra's design. ("First, we suggest the direct application of the 'power of two choices' paradigm, whereby an item is stored at the less loaded of two (or more) random alternatives. We then consider how associating a small constant number of hash values with a key can naturally be extended to support other load balancing strategies.")
          Jonathan Ellis made changes -
          Field Original Value New Value
          Description We need to be able to spread load evenly across a cluster to mitigate keys not being uniformly distributed as well as heterogeneous nodes in a cluster. The former is particularly likely to be a problem when using the OrderPreservingPartitioner, since the keys are not randomized by a hash function.

          Avinash suggested three papers on load balancing in this thread: http://groups.google.com/group/cassandra-dev/msg/b3d67acf35801c41

          Of these, the useful ones are
           http://www.iptps.org/papers-2004/karger-load-balance.pdf (Simple Efficient Load Balancing Algorithms for Peer-to-Peer Systems by David R. Karger and Matthias Ruhl)
           http://iptps03.cs.berkeley.edu/final-papers/load_balancing.ps (Load Balancing in Structured P2P Systems by Ananth Rao et al)

          The third,
          http://iptps03.cs.berkeley.edu/final-papers/simple_load_balancing.ps (Simple Load Balancing for Distributed Hash Tables by John Byers et al) is not applicable to Cassandra's design. ("First, we suggest the direct application of the lsquolsquopower of two choicesrsquorsquo paradigm, whereby an item is stored at the less loaded of two (or more) random alternatives. We then consider how associating a small constant number of hash values with a key can naturally be extended to support other load balancing strategies.")
          We need to be able to spread load evenly across a cluster to mitigate keys not being uniformly distributed as well as heterogeneous nodes in a cluster. The former is particularly likely to be a problem when using the OrderPreservingPartitioner, since the keys are not randomized by a hash function.

          Avinash suggested three papers on load balancing in this thread: http://groups.google.com/group/cassandra-dev/msg/b3d67acf35801c41

          Of these, the useful ones are
           http://www.iptps.org/papers-2004/karger-load-balance.pdf (Simple Efficient Load Balancing Algorithms for Peer-to-Peer Systems by David R. Karger and Matthias Ruhl)
           http://iptps03.cs.berkeley.edu/final-papers/load_balancing.ps (Load Balancing in Structured P2P Systems by Ananth Rao et al)

          The third,
          http://iptps03.cs.berkeley.edu/final-papers/simple_load_balancing.ps (Simple Load Balancing for Distributed Hash Tables by John Byers et al) is not applicable to Cassandra's design. ('First, we suggest the direct application of the lsquolsquopower of two choicesrsquorsquo paradigm, whereby an item is stored at the less loaded of two (or more) random alternatives. We then consider how associating a small constant number of hash values with a key can naturally be extended to support other load balancing strategies.')
          Jonathan Ellis created issue -

            People

            • Assignee:
              Jonathan Ellis
              Reporter:
              Jonathan Ellis
            • Votes:
              1 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development