Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Fix Version/s: None
    • Component/s: Tools
    • Labels:

      Description

      Doing

      SELECT count(*) from <cf>;
      

      will max out at 10,000, because that is the default limit for CQL queries.
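As a rough illustration of the behaviour described above, here is a toy model of a count that is silently capped by a default limit. It is only a sketch: `count_rows` and the in-memory `table` are hypothetical stand-ins, not Cassandra or cqlsh code; only the 10,000 figure comes from the description.

```python
DEFAULT_LIMIT = 10_000  # the default limit named in the description


def count_rows(rows, limit=None):
    """Count rows the way a limited query would: stop once the limit is reached.

    Hypothetical helper for illustration; not part of any Cassandra API.
    """
    effective = DEFAULT_LIMIT if limit is None else limit
    count = 0
    for _ in rows:
        if count >= effective:
            break
        count += 1
    return count


table = range(25_000)  # hypothetical column family holding 25,000 rows

capped = count_rows(table)                     # no LIMIT given: default applies
explicit = count_rows(table, limit=1_000_000)  # caller supplies a larger LIMIT
```

In this model the first call reports the default limit rather than the true row count, which is the symptom reported here.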

        Activity

        nish gowda added a comment -

        Could this be fixed?

        Jonathan Ellis added a comment -

        You could take an approach similar to the one in CASSANDRA-2894.

        nish gowda added a comment -

        Requesting a fix for this at the earliest opportunity.

        Thanks!

        Sam Tunnicliffe added a comment -

        The patch adds paging to count queries in the CQL3 SelectStatement.

        Sylvain Lebresne added a comment -

        The patch overrides the row key bounds, so it won't work correctly for, say, SELECT count(*) FROM <cf> WHERE key='foo'. It also doesn't take a user-provided LIMIT into account (I reckon it's dumb to give a LIMIT when counting rows, but we still need to support it correctly). Also, it only pages across internal rows, not within them (and makeSlicePredicate creates a predicate with a count of -1, so that won't work with a getRangeSlice call that counts keys instead of columns).

        I think that unfortunately, to correctly page count(), you need to be able to correctly page CQL3 queries in general, and that is a little bit more work (since we basically need to handle both query by slices, query by names and range queries underneath). I've (just) started work on handling such paging for CASSANDRA-2478, and I really think there is no point in doing something specific for count.

        Getting back to the description of the issue, I note that imo we should stop using a default limit of 10,000 for CQL queries as this is totally random.
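The review points above (respect the caller's key bounds, honour a user LIMIT, page instead of fetching everything) can be sketched with a toy model. This is not the patch under review and not Cassandra code: `fetch_page`, `paged_count`, the `page_size` default, and the sorted in-memory key list are all hypothetical stand-ins for a range query.

```python
from bisect import bisect_left, bisect_right


def fetch_page(keys, start, end, page_size, start_inclusive=True):
    """Stand-in for one range query over a sorted key list (hypothetical).

    Returns at most page_size keys between start and end; end is inclusive.
    """
    if start is None:
        lo = 0
    elif start_inclusive:
        lo = bisect_left(keys, start)
    else:
        lo = bisect_right(keys, start)
    hi = len(keys) if end is None else bisect_right(keys, end)
    return keys[lo:min(hi, lo + page_size)]


def paged_count(keys, start=None, end=None, limit=None, page_size=3):
    """Count keys page by page, keeping the caller's bounds and LIMIT."""
    total = 0
    cursor, inclusive = start, True
    while limit is None or total < limit:
        # Never ask for more than the remaining LIMIT allows.
        want = page_size if limit is None else min(page_size, limit - total)
        page = fetch_page(keys, cursor, end, want, inclusive)
        total += len(page)
        if len(page) < want:
            break  # short page: the requested range is exhausted
        # Resume strictly after the last key seen; the end bound is unchanged.
        cursor, inclusive = page[-1], False
    return total


keys = ["a", "b", "c", "d", "e", "f", "g", "h"]  # must be sorted
```

The point of the sketch is that paging only moves the start of each request forward; it never replaces the bounds the caller asked for, and it stops as soon as the LIMIT is reached.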

        Sam Tunnicliffe added a comment - edited

        I didn't think overriding the key bounds would cause a problem, as queries of the form
        SELECT COUNT(*) FROM <cf> WHERE key='foo' and SELECT COUNT(*) FROM <cf> WHERE key in ('foo', 'bar')
        aren't treated as key range queries and so go down the branch that uses getSlice(). However, I'd overlooked the token() function
        (and, I imagine, the behaviour when using OrderPreservingPartitioner), where a key range can be specified.

        Am I right that the issue with not paging within internal rows is that fetching/materialising PAGE_COUNT_SIZE wide rows could still blow up memory usage? If so, I could rework the patch to add paging within the rows (and fix the token()/range queries). Or, given your comment re: CASSANDRA-2478, do you think it'd be better to just abandon this patch?
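To make the memory concern concrete, here is a minimal sketch of counting within wide rows by fixed-size slices, so that no more than one slice is ever held at a time. It is an assumption-laden toy: `slice_row`, `count_cells`, and `slice_size` are hypothetical names, and rows are modelled as plain Python lists rather than anything from the patch.

```python
def slice_row(row, offset, slice_size):
    """Stand-in for a slice query: return at most slice_size columns (hypothetical)."""
    return row[offset:offset + slice_size]


def count_cells(rows, slice_size):
    """Count columns across rows, reading each row one slice at a time.

    Returns (total, peak), where peak is the largest slice ever held in memory.
    """
    total = 0
    peak = 0
    for row in rows:
        offset = 0
        while True:
            chunk = slice_row(row, offset, slice_size)
            peak = max(peak, len(chunk))
            total += len(chunk)
            if len(chunk) < slice_size:
                break  # short slice: this row is exhausted
            offset += len(chunk)
    return total, peak


# One wide row, one narrow row, one empty row.
rows = [list(range(1000)), list(range(5)), []]
total, peak = count_cells(rows, slice_size=100)
```

Here the peak stays at the slice size regardless of how wide a row is, whereas paging only across rows would materialise the full 1,000-column row at once.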

        Jonathan Ellis added a comment -

        Closing in favor of the more general CASSANDRA-4415.


          People

          • Assignee: Unassigned
          • Reporter: Nick Bailey
          • Reviewer: Sylvain Lebresne
          • Votes: 1
          • Watchers: 6
