Cassandra / CASSANDRA-3137

Implement wrapping intersections for ConfigHelper's InputKeyRange

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Fix Version/s: 0.8.7, 1.0.0
    • Component/s: None
    • Labels: None

      Description

      Previously there was no support for multiple intersections between a split's range and the job's configured range.
      Since CASSANDRA-3108 this is now possible.
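To illustrate why a wrapping range can produce *multiple* intersections, here is a minimal sketch in Java. The `Range` record, `unwrap`, and `intersect` helpers are illustrative only, not Cassandra's actual `ColumnFamilyInputFormat` code: a wrapping range (left >= right) is split into its two non-wrapping halves, and each half is intersected with the job's range, so the result may hold zero, one, or two pieces.

```java
import java.util.ArrayList;
import java.util.List;

public class WrappingIntersection {
    // A token range (left, right]; if left >= right the range wraps
    // around the ring's maximum token back to its minimum.
    record Range(long left, long right) {
        boolean wraps() { return left >= right; }
    }

    static final long MIN = Long.MIN_VALUE, MAX = Long.MAX_VALUE;

    // Unwrap a wrapping range into its two non-wrapping halves.
    static List<Range> unwrap(Range r) {
        List<Range> out = new ArrayList<>();
        if (r.wraps()) {
            out.add(new Range(r.left(), MAX));
            out.add(new Range(MIN, r.right()));
        } else {
            out.add(r);
        }
        return out;
    }

    // Intersect two ranges, either of which may wrap; the result can
    // hold up to two ranges -- the "multiple intersections" the
    // ticket's description refers to.
    static List<Range> intersect(Range a, Range b) {
        List<Range> out = new ArrayList<>();
        for (Range x : unwrap(a))
            for (Range y : unwrap(b)) {
                long left = Math.max(x.left(), y.left());
                long right = Math.min(x.right(), y.right());
                if (left < right)
                    out.add(new Range(left, right));
            }
        return out;
    }

    public static void main(String[] args) {
        // A wrapping split (80, 20] intersected with job range (10, 90]
        // yields two pieces: (80, 90] and (10, 20].
        System.out.println(intersect(new Range(80, 20), new Range(10, 90)));
    }
}
```

In the example, the split (80, 20] covers both the top and the bottom of the ring, so the job range (10, 90] meets it twice.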

        Activity

        mck added a comment -

        Haven't tested this (with real data) yet.

        But the code looks pretty simple and straightforward here...

        mck added a comment -

        new patch w/ better formatting

        Jonathan Ellis added a comment -

        Wrapping (key) ranges was the source of a ton of bugs in the 0.6 era. Thus I'm leery of adding wrapped range support to Hadoop just "because we can." Is there a compelling motivation otherwise?

        mck added a comment - edited

        Indeed. I could be using this asap.

        The use case is...
        We're using a ByteOrderedPartitioner because we run incremental Hadoop jobs over one of our column families where "events" initially come in. This cf has RF=1 and time-based UUID keys that are manipulated so that their byte ordering is time-ordered (the timestamp put up front). Each column has a TTL of 3 months.
        After 3 months of data we saw all the data on one node. Now I understand why: the token range is the timestamp range, which runs from 1970 to 2270, so of course our 3-month period fell on one node (with a 3-node cluster even 100 years would fall on one node).

        To properly manage this cf we need to either continuously move nodes around, a cumbersome operation, or change the key so it's prefixed with timestamp % 3months. This would allow 3 months of data to cycle over the whole cluster and wrap around again. Obviously we're leaning towards the latter solution, as it simplifies operations. But it does require this patch.

        (When CFIF supports IndexClause everything changes: we change our cluster to RandomPartitioner, use secondary indexes, and never look back...)
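The key scheme mck describes can be sketched as follows. This is an illustrative reconstruction, not code from the ticket: the class name, the 90-day period, and the key layout (8-byte big-endian `timestamp % period` prefix followed by the time-based UUID bytes) are all assumptions. Because the prefix cycles every period, byte-ordered keys wrap around the cluster instead of marching through one node's range.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

public class CyclingKey {
    // Assumed cycle length: roughly 3 months, matching the column TTL.
    static final long PERIOD_MS = 90L * 24 * 60 * 60 * 1000;

    // Build a byte-ordered row key: an 8-byte big-endian
    // (timestamp % period) prefix, then the original time-based UUID.
    // The modulo result is non-negative for positive timestamps, so
    // big-endian byte order matches numeric order within a cycle.
    static byte[] key(long timestampMs, UUID id) {
        ByteBuffer buf = ByteBuffer.allocate(8 + 16);
        buf.putLong(timestampMs % PERIOD_MS); // cycles every ~3 months
        buf.putLong(id.getMostSignificantBits());
        buf.putLong(id.getLeastSignificantBits());
        return buf.array();
    }
}
```

Two events exactly one period apart get the same prefix, which is precisely why a job's configured key range can wrap, and why this patch is needed.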

        Jonathan Ellis added a comment -

        Sounds reasonable. Patch looks okay to me. I'll commit it after you've tested it.

        mck added a comment -

        This is tested in production now.

        Jonathan Ellis added a comment -

        Thanks, committed.

        Hudson added a comment -

        Integrated in Cassandra-0.8 #356 (See https://builds.apache.org/job/Cassandra-0.8/356/)
        allow wrapping ranges in Hadoop queries
        patch by Mck SembWever; reviewed by jbellis for CASSANDRA-3137

        jbellis : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1178551
        Files :

        • /cassandra/branches/cassandra-0.8/CHANGES.txt
        • /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/hadoop/ColumnFamilyInputFormat.java

          People

          • Assignee: mck
          • Reporter: mck
          • Reviewer: Jonathan Ellis
          • Votes: 0
          • Watchers: 1

            Dates

            • Created:
              Updated:
              Resolved:
