Cassandra / CASSANDRA-4989

Expose new SliceQueryFilter features through Thrift interface

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Fix Version/s: None
    • Component/s: API
    • Labels: None

      Description

      SliceQueryFilter has some very useful new features, such as the ability to specify a composite column prefix to group by and a limit on the number of groups to return.

      This is very useful if, for example, I have a wide row with columns prefixed by timestamp and I want to retrieve the latest columns without knowing the column names. Say I have a row
      row -> (t1, c1), (t1, c2)... (t1, cn) ... (t0,c1) ... etc

      Query slice range (t1,) group by prefix (1) limit (1)
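
What this request amounts to can be sketched in plain Python. This is an illustrative in-memory model, not Cassandra code: `slice_grouped`, the tuple-shaped column names and the sample row are all hypothetical, timestamps are modelled as ascending integers, and deleted columns (which the real SliceQueryFilter counting has to handle) are ignored.

```python
from itertools import groupby, islice


def slice_grouped(columns, start, prefix_len, group_limit):
    """Return columns at or after `start`, stopping after `group_limit` prefix groups.

    `columns` is a list of (composite_name, value) pairs already sorted by name;
    composite names are modelled as tuples such as (timestamp, column).
    """
    in_range = (c for c in columns if c[0] >= start)
    # Consecutive columns sharing the first `prefix_len` components form one group.
    groups = groupby(in_range, key=lambda c: c[0][:prefix_len])
    result = []
    for _, group in islice(groups, group_limit):
        result.extend(group)
    return result


# Hypothetical wide row: timestamps 1, 2 and 3, several columns per timestamp.
row = [
    ((1, "c1"), "a"),
    ((1, "c2"), "b"),
    ((2, "c1"), "c"),
    ((2, "c2"), "d"),
    ((3, "c1"), "e"),
]

# "slice range (2,) group by prefix (1) limit (1)" in this model:
one_group = slice_grouped(row, (2,), 1, 1)
```

Here `one_group` holds both columns stamped 2 and nothing else, even though the caller never said how many columns that group contains.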

      As a more general question, is the Thrift interface going to be kept up to date with feature changes, or will it be left behind (a mistake, IMO)?

        Activity

        Jonathan Ellis made changes - 20/Nov/13 00:04
        Status: Open → Resolved (after 361d 9h 33m in Open)
        Resolution: Won't Fix
        Jonathan Ellis added a comment -

        Not against this feature on its merits but I don't know anyone who is prioritizing the Thrift API these days. Feel free to reopen if you have a patch.

        Cristian added a comment -

        Yes, Sylvain is correct. This is essentially an optimization to avoid "iterating" through the columns and just get the latest group that has a common prefix. I noticed this can be done with the new SliceQueryFilter so it would be useful if it can be exposed.

        If I'm allowed to go off on a tangent here (I know, not the best place) having more pluggable behaviour would be an interesting direction to take with Cassandra. Same way it's possible to have custom column comparators, maybe we could have pluggable row level indexes, pluggable queries to use them, pluggable notification systems, etc. I know this has been discussed before, just wanted to add my vote here.

        Thanks

        Sylvain Lebresne added a comment -

        If I understand what Cristian is saying, he has composite columns whose first component is a timestamp and he wants the 'most recent group of columns whose first component timestamp is before time T'.

        Jonathan Ellis added a comment -

        Isn't 'most recent columns before time T' just a reverse-slice operation?
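
A reverse slice can be sketched as follows, to show where it differs from the grouped query. This is a plain-Python model under stated assumptions, not the Thrift API: `reverse_slice`, `latest_group_before` and the sample names are hypothetical, and column names are modelled as sorted (timestamp, column) tuples.

```python
from bisect import bisect_left


def reverse_slice(names, before, count):
    """Walk backwards from the first name >= `before`, returning at most `count` names."""
    end = bisect_left(names, before)
    return names[max(0, end - count):end][::-1]


def latest_group_before(names, before, count):
    """Client-side trim: keep only the names sharing the newest timestamp."""
    cols = reverse_slice(names, before, count)
    if not cols:
        return []
    newest = cols[0][0]
    return [n for n in cols if n[0] == newest]


# Hypothetical sorted column names: two columns at time 1, three at time 2.
names = [(1, "c1"), (1, "c2"), (2, "c1"), (2, "c2"), (2, "c3")]

too_few = reverse_slice(names, (3,), 2)        # cuts the time-2 group short
too_many = reverse_slice(names, (3,), 4)       # spills into the time-1 group
trimmed = latest_group_before(names, (3,), 4)  # exact, but only after over-fetching
```

In this model the reverse slice does find the most recent columns before T, but the caller has to guess `count`: too small truncates the latest group, too large fetches older columns that must then be discarded.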

        Cristian added a comment -

        Sorry if I wasn't clear. What I'd like is to run that query efficiently when I don't know t1 precisely; I just want to get the latest columns before a time T.

        That can be done currently with Thrift, but it will return all columns with time t < T, whereas this way I can efficiently get just the latest.

        Note that "as of" queries are very common in financial applications, for example, so this use case is worth considering.

        I'm not sure about the handling of deleted keys, but maybe we can find a way to generalize and expose this? I would have asked for a feature like this anyway; it just so happens that, looking at the code, I see this has been done to support CQL limits.

        Since I have an object serialization client API on top of Thrift, CQL is not much use to me...

        Sylvain Lebresne added a comment -

        I'm not sure I understand your example. The way I read your query, slice range (t1,) group by prefix (1) limit (1), you are literally asking for every column starting with t1. You can absolutely do that with the Thrift API currently (CompositeType has an end-of-component feature for this).

        Besides, the group by prefix is a CQL3 implementation detail and does more than just grouping by prefix when counting (it also skips deleted records under some conditions that only make sense for CQL3, for instance). The exact semantics of that counting may change depending on the needs of CQL3 and were never meant to be exposed directly.
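
For readers unfamiliar with the end-of-component feature, the slice bounds for "every column starting with t1" can be sketched like this. The sketch assumes the CompositeType wire layout of a 2-byte big-endian length, the component bytes, then one end-of-component byte (0 in stored names, 1 allowed in an upper query bound); check this against the CompositeType documentation for your Cassandra version. `encode_component` and `prefix_slice_bounds` are hypothetical helper names, and encoding the timestamp as an 8-byte long is also an assumption.

```python
import struct


def encode_component(value: bytes, eoc: int = 0) -> bytes:
    """Encode one composite component: 2-byte length, value bytes, end-of-component byte."""
    return struct.pack(">H", len(value)) + value + struct.pack(">b", eoc)


def prefix_slice_bounds(prefix: bytes):
    """Return (start, finish) bounds covering every name whose first component is `prefix`."""
    start = encode_component(prefix, 0)   # sorts before any longer name with this prefix
    finish = encode_component(prefix, 1)  # sorts after every name with this prefix
    return start, finish


# Assumed encoding of the timestamp component as an 8-byte big-endian long.
t1 = struct.pack(">q", 42)
start, finish = prefix_slice_bounds(t1)
```

Under these assumptions, `start` and `finish` are the values a client would place in the start and finish fields of a Thrift SliceRange to fetch the whole t1 group. This only works when t1 is known exactly, which is the gap the reporter describes.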

        Cristian created issue -

          People

          • Assignee: Unassigned
          • Reporter: Cristian
          • Votes: 0
          • Watchers: 3
