Cassandra
  1. Cassandra
  2. CASSANDRA-4851

CQL3: improve support for paginating over composites

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Fixed
    • Fix Version/s: 2.0.6
    • Component/s: API
    • Labels:
      None

      Description

      Consider the following table:

      CREATE TABLE test (
          k int,
          c1 int,
          c2 int,
          PRIMARY KEY (k, c1, c2)
      )
      

      with the following data:

      k | c1 | c2
      ------------
      0 | 0  | 0
      0 | 0  | 1
      0 | 1  | 0
      0 | 1  | 1
      

      Currently, CQL3 allows to slice over either c1 or c2:

      SELECT * FROM test WHERE k = 0 AND c1 > 0 AND c1 < 2
      SELECT * FROM test WHERE k = 0 AND c1 = 1 AND c2 > 0 AND c2 < 2
      

      but you cannot express a query that return the 3 last records. Indeed, for that you would need to do a query like say:

      SELECT * FROM test WHERE k = 0 AND ((c1 = 0 AND c2 > 0) OR c2 > 0)
      

      but we don't support that.

      This can make it hard to paginate over say all records for k = 0 (I'm saying "can" because if the value for c2 cannot be very large, an easy workaround could be to paginate by entire value of c1, which you can do).

      For the case where you only paginate to avoid OOMing on a query, CASSANDRA-4415 will that and is probably the best solution. However, there may be case where the pagination is say user (as in, the user of your application) triggered.

      I note that one solution would be to add the OR support at least in case like the one above. That's definitively doable but on the other side, we won't be able to support full-blown OR, so it may not be very natural that we support seemingly random combination of OR and not others.

      Another solution would be to allow the following syntax:

      SELECT * FROM test WHERE k = 0 AND (c1, c2) > (0, 0)
      

      which would literally mean that you want records where the values of c1 and c2 taken as a tuple is lexicographically greater than the tuple (0, 0). This is less SQL-like (though maybe some SQL store have that, it's a fairly thing to have imo?), but would be much simpler to implement and probably to use too.

      1. 4851.txt
        25 kB
        Sylvain Lebresne

        Issue Links

          Activity

          Sylvain Lebresne created issue -
          Gavin made changes -
          Field Original Value New Value
          Workflow no-reopen-closed, patch-avail [ 12731080 ] patch-available, re-open possible [ 12753448 ]
          Gavin made changes -
          Workflow patch-available, re-open possible [ 12753448 ] reopen-resolved, no closed status, patch-avail, testing [ 12758738 ]
          DOAN DuyHai made changes -
          Comment [ Big +10 for this feature

          Right now I am preparing some slides for a talk and tutorial on Cassandra to convince people switching from Thrift to CQL3. However I am facing serious issue because of the limitation of CQL3 not being able to allow inequality on more than 1 clustered component at a time.

           My example is quite trivial

          {code:sql}
           CREATE TABLE comment_index_by_rating
           (
              songId uuid,
              rating int, // rating is integer from 1 to 10
              date uuid, // date of the comment
              comment text, //comment message itself
              userLogin text, //login of the user who posts the comment
              PRIMARY KEY (songId,rating,date)
           )
          {code}


           I would like to paginate over comment so the first query would be

          {code:sql}
           SELECT * FROM comment_index_by_rating WHERE songId = .... ORDER BY rating DESC LIMIT 10; // fetch first 10 comments
          {code}

           The following queries would be:

          {code:sql}
           SELECT * FROM comment_index_by_rating WHERE songId = .... AND rating <= {rating_of_last_comment_of_previous_batch} AND date <= {date_of_last_comment_of_previous_batch}
          {code}

           Right now it is just IMPOSSIBLE to paginate like this, which is PITA.

           I know that there is already jira https://issues.apache.org/jira/browse/CASSANDRA-4415 which is a really good idea but the issue raised above is *beyond the scope of just paging data*.

           People are using more and more compound primary keys to model with Cassandra and they should be able to do slice queries with inequality from all compound components.

           There are lots of use cases where such usage is required

          For example indexing daily metrics
          {code:sql}
          CREATE TABLE daily_metrics
          (
            day int, // day in YYYYMMDD format
            hour int,
            minute int,
            second int,
            metrics blob,
            PRIMARY KEY (day, hour, minute, second)
          )
          {code}

           I should be able to grep all metrics from a range of date

            // select all metrics from 8:30am to 10am
           {code:sql}
           SELECT metrics FROM daily_metrics WHERE day = 20130828 AND hour >= 8 AND minute >= 30 and hour <= 10
           {code}

          ]
          DOAN DuyHai made changes -
          Comment [ Big +10 for this issue

          Right now I am preparing some slides for a talk and tutorial on Cassandra to convince people switching from Thrift to CQL3. However I am facing serious issue because of the limitation of CQL3 not being able to allow inequality on more than 1 clustered component at a time.

           My example is quite trivial. Let's consider a rating and comment wide row for songs

          {code:sql}
           CREATE TABLE comment_index_by_rating
           (
              songId uuid,
              rating int, // rating is integer from 1 to 10
              date uuid, // date of the comment
              comment text, //comment message itself
              userLogin text, //login of the user who posts the comment
              PRIMARY KEY (songId,rating,date)
           )
          {code}


           I would like to paginate over comment so the first query would be

          {code:sql}
           SELECT * FROM comment_index_by_rating WHERE songId = .... ORDER BY rating DESC LIMIT 10; // fetch first 10 comments
          {code}

           The following queries would be:

          {code:sql}
           SELECT * FROM comment_index_by_rating WHERE songId = .... AND rating <= {rating_of_last_comment_of_previous_batch} AND date <= {date_of_last_comment_of_previous_batch}
          {code}

           Right now it is just IMPOSSIBLE to paginate like this, which is PITA.

           I know that there is already jira [CASSANDRA-4415|https://issues.apache.org/jira/browse/CASSANDRA-4415] which is a really good idea but the issue raised above is *beyond the scope of just paging data*.

           People are using more and more compound primary keys to model with Cassandra and they should be able to do slice queries with inequality from all compound components.

           There are lots of use cases where such usage is required

          For example indexing daily metrics
          {code:sql}
          CREATE TABLE daily_metrics
          (
            day int, // day in YYYYMMDD format
            hour int,
            minute int,
            second int,
            metrics blob,
            PRIMARY KEY (day, hour, minute, second)
          )
          {code}

           I should be able to grep all metrics from a range of date

            // select all metrics from 8:30am to 10am
           {code:sql}
           SELECT metrics FROM daily_metrics WHERE day = 20130828 AND hour >= 8 AND minute >= 30 and hour <= 10
           {code}
          ]
          Sylvain Lebresne made changes -
          Link This issue depends upon CASSANDRA-5956 [ CASSANDRA-5956 ]
          Sylvain Lebresne made changes -
          Link This issue depends upon CASSANDRA-5956 [ CASSANDRA-5956 ]
          Sylvain Lebresne made changes -
          Link This issue is duplicated by CASSANDRA-5956 [ CASSANDRA-5956 ]
          Brandon Williams made changes -
          Fix Version/s 2.0.5 [ 12325761 ]
          Sylvain Lebresne made changes -
          Fix Version/s 2.0.6 [ 12326170 ]
          Fix Version/s 2.0.5 [ 12325761 ]
          Sylvain Lebresne made changes -
          Attachment 4851.txt [ 12626662 ]
          Sylvain Lebresne made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Assignee Sylvain Lebresne [ slebresne ]
          Jonathan Ellis made changes -
          Component/s API [ 12313742 ]
          Reviewer Aleksey Yeschenko [ iamaleksey ]
          Sylvain Lebresne made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]

            People

            • Assignee:
              Sylvain Lebresne
              Reporter:
              Sylvain Lebresne
              Reviewer:
              Aleksey Yeschenko
            • Votes:
              4 Vote for this issue
              Watchers:
              11 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development