Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Fix Version/s: None
    • Component/s: Tools
    • Labels:

      Description

      Doing

      SELECT count(*) from <cf>;
      

      will max out at 10,000, because that is the default limit for CQL queries.
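The cap exists because count(*) is computed over the rows a single query returns, and that query is bounded by the default limit. Counting past it means paging: fetch a page of keys, remember the last key seen, and request the next page starting after it, stopping when a short page comes back. The following is a minimal sketch of that idea in plain Python, assuming an in-memory sorted list of keys whose order stands in for token order. `fetch_keys`, `naive_count` and `paged_count` are hypothetical helpers for illustration, not cqlsh or Cassandra APIs.

```python
from bisect import bisect_right

DEFAULT_LIMIT = 10_000  # the default CQL row limit described in this issue


def fetch_keys(sorted_keys, start_after=None, limit=DEFAULT_LIMIT):
    """Hypothetical range query: up to `limit` keys strictly after `start_after`.

    In this simulation, key order plays the role of token order.
    """
    start = 0 if start_after is None else bisect_right(sorted_keys, start_after)
    return sorted_keys[start:start + limit]


def naive_count(sorted_keys):
    # One query, one result set: the count can never exceed the limit.
    return len(fetch_keys(sorted_keys))


def paged_count(sorted_keys, page_size=DEFAULT_LIMIT):
    # Resume after the last key of each full page until a short page arrives.
    total, last = 0, None
    while True:
        page = fetch_keys(sorted_keys, start_after=last, limit=page_size)
        total += len(page)
        if len(page) < page_size:
            return total
        last = page[-1]
```

With 25,000 keys, `naive_count` reports 10,000 while `paged_count` walks three pages (10,000 + 10,000 + 5,000) and reports 25,000.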

        Activity

        Transition                   | Time In Source Status | Execution Times | Last Executer   | Last Execution Date
        Open -> Patch Available      | 133d 21h 47m          | 1               | Sam Tunnicliffe | 18/May/12 17:26
        Patch Available -> Resolved  | 65d 22h 26m           | 1               | Jonathan Ellis  | 23/Jul/12 15:53
        Gavin made changes -
        Workflow patch-available, re-open possible [ 12748753 ] reopen-resolved, no closed status, patch-avail, testing [ 12753876 ]
        Gavin made changes -
        Workflow no-reopen-closed, patch-avail [ 12648094 ] patch-available, re-open possible [ 12748753 ]
        Jonathan Ellis made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Assignee Sam Tunnicliffe [ beobal ]
        Fix Version/s 1.1.3 [ 12321881 ]
        Resolution Duplicate [ 3 ]
        Jonathan Ellis added a comment -

        closing in favor of the more general CASSANDRA-4415

        Sylvain Lebresne made changes -
        Fix Version/s 1.1.3 [ 12321881 ]
        Fix Version/s 1.1.2 [ 12321445 ]
        Sam Tunnicliffe added a comment (edited) -

        I didn't think overriding the key bounds would cause a problem, as queries of the form
        SELECT COUNT FROM <cf> WHERE key='foo' and SELECT COUNT FROM <cf> WHERE key in ('foo', 'bar')
        aren't treated as key range queries and so go down the branch that uses getSlice(). However, I'd overlooked the token() function
        (and, I imagine, the behaviour when using OrderPreservingPartitioner), where a key range can be specified.

        Am I right that the issue with not paging within internal rows is that fetching/materialising PAGE_COUNT_SIZE wide rows could still
        blow up memory usage? If so, I could rework the patch to add paging within the rows (and fix the token()/range queries), or, given your comment re: CASSANDRA-2478, do you think it'd be better to just abandon this patch?
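The memory concern can be illustrated with a toy model: paging only across partitions still materialises every column of each wide partition in a page, whereas paging within a partition holds at most one slice of columns at a time. The sketch below assumes, for simplicity, that each counted CQL3 row is one column of a wide internal row, and uses a hypothetical `PAGE_COUNT_SIZE` of 1,000 (the patch's actual value is not shown here). `page_after`, `count_cells_in_partition` and `count_all_cells` are illustrative helpers, not Cassandra internals.

```python
from bisect import bisect_right

PAGE_COUNT_SIZE = 1_000  # hypothetical page size, chosen only for illustration


def page_after(sorted_items, start_after, limit):
    """Up to `limit` items strictly after `start_after` (None means from the start)."""
    start = 0 if start_after is None else bisect_right(sorted_items, start_after)
    return sorted_items[start:start + limit]


def count_cells_in_partition(columns, page_size=PAGE_COUNT_SIZE):
    # Page *within* one wide partition so at most `page_size` columns are held at once.
    total, last = 0, None
    while True:
        page = page_after(columns, last, page_size)
        total += len(page)
        if len(page) < page_size:
            return total
        last = page[-1]


def count_all_cells(table, page_size=PAGE_COUNT_SIZE):
    """`table` maps partition key -> sorted list of column names (a toy column family)."""
    keys = sorted(table)
    total, last_key = 0, None
    while True:
        # Page across partitions, then page inside each partition of this key page.
        key_page = page_after(keys, last_key, page_size)
        for key in key_page:
            total += count_cells_in_partition(table[key], page_size)
        if len(key_page) < page_size:
            return total
        last_key = key_page[-1]
```

The design choice is two nested cursors, one over partition keys and one over column names, so the working set is bounded by the page size on both axes rather than by the width of the widest row.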

        Sylvain Lebresne added a comment -

        The patch overrides the row key bounds, so, for instance, it won't work correctly for SELECT count FROM <cf> WHERE key='foo'. It also doesn't take a user-provided LIMIT into account (I reckon it's dumb to give a LIMIT when counting rows, but we still need to support it correctly). Also, it only pages across internal rows, not within internal rows (and makeSlicePredicate creates a predicate with a count of -1, so that won't work with a getRangeSlice call that counts keys instead of columns).

        I think that, unfortunately, to correctly page count() you need to be able to correctly page CQL3 queries in general, and that is a bit more work (since we basically need to handle queries by slices, queries by names and range queries underneath). I've (just) started work on handling such paging for CASSANDRA-2478, and I really think there is no point in doing something specific for count.

        Getting back to the description of the issue, I note that, imo, we should stop using a default limit of 10,000 for CQL queries, as that value is totally arbitrary.
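Two of the requirements raised in this comment can be shown with a small model: a paged count must start and stop at the caller's own key bounds instead of overriding them, and it must stop once a user-supplied LIMIT is reached, never requesting more rows than remain under that limit. The sketch below assumes inclusive bounds over an in-memory sorted key list; `fetch_range` and `bounded_paged_count` are hypothetical helpers, not part of SelectStatement or any Cassandra API.

```python
from bisect import bisect_left, bisect_right


def fetch_range(sorted_keys, lower, upper, start_after, limit):
    """Hypothetical range query over [lower, upper], resuming after `start_after`."""
    if start_after is None:
        lo = bisect_left(sorted_keys, lower)    # first page: honour the caller's lower bound
    else:
        lo = bisect_right(sorted_keys, start_after)
    hi = bisect_right(sorted_keys, upper)       # never read past the caller's upper bound
    return sorted_keys[lo:min(hi, lo + limit)]  # empty slice if lo >= hi


def bounded_paged_count(sorted_keys, lower, upper, user_limit=None, page_size=10_000):
    total, last = 0, None
    while user_limit is None or total < user_limit:
        # Ask only for what the user LIMIT still allows.
        want = page_size if user_limit is None else min(page_size, user_limit - total)
        page = fetch_range(sorted_keys, lower, upper, last, want)
        total += len(page)
        if len(page) < want:  # short page: the range is exhausted
            break
        last = page[-1]
    return total
```

A single-key bound such as lower = upper = 42 yields a count of 1, and a LIMIT of 15,000 over a 20,000-key range stops after the second page, which asks for only 5,000 keys.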

        Jonathan Ellis made changes -
        Labels lhf cql3 lhf
        Reviewer slebresne
        Sam Tunnicliffe made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Assignee Sam Tunnicliffe [ beobal ]
        Sam Tunnicliffe added a comment -

        patch adds paging to count queries in cql3 SelectStatement

        Sam Tunnicliffe made changes -
        Jonathan Ellis made changes -
        Fix Version/s 1.1.2 [ 12321445 ]
        Fix Version/s 1.1.1 [ 12319857 ]
        Jonathan Ellis made changes -
        Labels lhf
        nish gowda added a comment -

        Requesting a fix for this at the earliest opportunity.

        Thanks!

        Jonathan Ellis made changes -
        Parent CASSANDRA-3761 [ 12539160 ]
        Issue Type Bug [ 1 ] Sub-task [ 7 ]
        Sylvain Lebresne made changes -
        Fix Version/s 1.1.1 [ 12319857 ]
        Fix Version/s 1.1 [ 12317615 ]
        Jonathan Ellis added a comment -

        You could take an approach similar to the one CASSANDRA-2894 takes.

        nish gowda added a comment -
        nish gowda added a comment -

        Could this be fixed?

        Nick Bailey made changes -
        Field: Description
          Original Value: Doing 'SELECT count(*) from <cf>;' will max out at 10,000 because that is the default limit for cql queries.
          New Value: Doing

        {noformat}
        SELECT count(*) from <cf>;
        {noformat}

        will max out at 10,000 because that is the default limit for cql queries.
        Field: Component/s
          New Value: Tools [ 12312979 ]
        Nick Bailey created issue -

          People

          • Assignee:
            Unassigned
            Reporter:
            Nick Bailey
            Reviewer:
            Sylvain Lebresne
          • Votes:
            1
            Watchers:
            5

            Dates

            • Created:
              Updated:
              Resolved:
