Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 1.2.0 beta 2
    • Component/s: Core
    • Labels:

      Description

      As mentioned on CASSANDRA-4127, for upgrades we need a 'shuffle' command to split up the contiguous ranges.

      List discussion: http://thread.gmane.org/gmane.comp.db.cassandra.devel/6799

      Edit0: Linked in mailing list discussion.
      Edit1: Linked in patch information.
      Edit2: Updated patch links.

      Patches

      Compare              Raw diff                   Description
      060_shuffle_utility  060_shuffle_utility.patch  shuffle util to randomly remap ranges

      Note: These are branches managed with TopGit. If you are applying the patch output manually, you will either need to filter the TopGit metadata files (i.e. wget -O - <url> | filterdiff -x*.topdeps -x*.topmsg | patch -p1), or remove them afterward (rm .topmsg .topdeps).

          Activity

          Jonathan Ellis added a comment -

          Eric has some notes up at http://wiki.apache.org/cassandra/VirtualNodes/Balance on implementation but a wiki is a lousy discussion medium so I'll post here:

          shuffle should operate on the entire cluster, at least by default. Shuffling one node at a time means that for each node i in 0..N-1 (where N is the cluster size), i/N of the ranges shuffled will, on average, have been shuffled at least once already. So it's substantially less efficient than shuffling once, then assigning the vnodes out in one cluster-wide pass.
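
For illustration only (this is not the code in the patch): a minimal Python sketch of the "shuffle once, then assign in one cluster-wide pass" idea, plus the arithmetic behind the i/N estimate. The function names, the node labels, and the use of plain integers to stand in for token ranges are all assumptions made for this example; the round-robin deal is just one simple way to hand the shuffled ranges out.

```python
import random


def cluster_wide_shuffle(ranges, nodes, seed=None):
    """Shuffle every range exactly once, then deal them out round-robin.

    `ranges` and `nodes` are hypothetical stand-ins: any list of range
    identifiers and any list of node names will do.
    """
    rng = random.Random(seed)
    shuffled = list(ranges)      # copy so the caller's list is untouched
    rng.shuffle(shuffled)        # the single cluster-wide shuffle
    assignment = {node: [] for node in nodes}
    for idx, token_range in enumerate(shuffled):
        # Deal the shuffled ranges to nodes in turn.
        assignment[nodes[idx % len(nodes)]].append(token_range)
    return assignment


def mean_reshuffled_fraction(n):
    """Average of i/n over nodes i = 0..n-1, taking the i/N estimate
    from the comment above at face value. Equals (n - 1) / (2 * n)."""
    return sum(i / n for i in range(n)) / n


if __name__ == "__main__":
    nodes = ["node0", "node1", "node2", "node3"]
    plan = cluster_wide_shuffle(list(range(16)), nodes, seed=42)
    for node, owned in plan.items():
        print(node, sorted(owned))
    print(mean_reshuffled_fraction(len(nodes)))
```

Under that estimate, the node-at-a-time approach re-moves a fraction of ranges that approaches one half as the cluster grows, whereas the single pass above touches each range once.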

          Eric Evans added a comment -

          > Eric has some notes up at http://wiki.apache.org/cassandra/VirtualNodes/Balance on implementation but a wiki is a lousy discussion medium so I'll post here:

          It is, but so is Jira.

          I'm not trying to start a discussion on the wiki; I just wanted to get down the general requirements and brainstorm a bit so that any subsequent discussion (if needed) might be more productive.

          > shuffle should operate on the entire cluster, at least by default. Shuffling one node at a time means that for each node i in 0..N-1 (where N is the cluster size), i/N of the ranges shuffled will, on average, have been shuffled at least once already. So it's substantially less efficient than shuffling once, then assigning the vnodes out in one cluster-wide pass.

          Yeah, this is exactly the sort of thing I was trying to capture (and I've added it).

          Jonathan Ellis added a comment -

          Thanks!

          Eric Evans added a comment -

          Once CASSANDRA-4664 is resolved, this should be ready for review.

          Eric Evans added a comment -

          CASSANDRA-4664 is resolved, and the shuffle patch is freshly rebased (as of now, anyway); review can resume at any time.

          Brandon Williams added a comment -

          Hmm, I'm seeing this when trying to create:

          Exception in thread "main" java.lang.RuntimeException: InvalidRequestException(why:no keyspace has been specified)
                  at org.apache.cassandra.tools.Shuffle.executeCqlQuery(Shuffle.java:532)
                  at org.apache.cassandra.tools.Shuffle.shuffle(Shuffle.java:375)
                  at org.apache.cassandra.tools.Shuffle.main(Shuffle.java:693)
          Caused by: InvalidRequestException(why:no keyspace has been specified)
                  at org.apache.cassandra.thrift.Cassandra$execute_cql_query_result.read(Cassandra.java:36625)
                  at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
                  at org.apache.cassandra.thrift.Cassandra$Client.recv_execute_cql_query(Cassandra.java:1525)
                  at org.apache.cassandra.thrift.Cassandra$Client.execute_cql_query(Cassandra.java:1511)
                  at org.apache.cassandra.tools.CassandraClient.execute_cql_query(Shuffle.java:748)
                  at org.apache.cassandra.tools.Shuffle.executeCqlQuery(Shuffle.java:518)
                  ... 2 more
          
          Eric Evans added a comment -

          This has been fixed.

          Brandon Williams added a comment -

          +1

          Eric Evans added a comment -

          Awesome, thanks for all the help; committed.


            People

            • Assignee: Eric Evans
            • Reporter: Brandon Williams
            • Reviewer: Brandon Williams
            • Votes: 0
            • Watchers: 4
