Cassandra
  1. Cassandra
  2. CASSANDRA-4415

Add cursor API/auto paging to the native CQL protocol

    Details

    • Type: New Feature New Feature
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Fix Version/s: 2.0 beta 1
    • Component/s: None
    • Labels:

      Description

      The goal here would be to use a query paging mechanism to the CQL native protocol. Typically the client/server with that would look something like this:

      C sends query to S.
      S sends N first rows matching the query + flag saying the response is not complete
      C requests the next N rows
      S sends N next rows + flag saying whether there is more
      C requests the next N rows
      ...
      S sends last rows + flag saying there is no more result
      

      The clear goal is for user to not have to worry about limiting queries and doing manual paging.

        Issue Links

          Activity

          Hide
          Aleksey Yeschenko added a comment -

          +1

          nit: unused boolean doCopy in AQP.filterEmpty() - from an earlier version, I assume
          femtonit: QP.countPaged() arguments don't line up

          Show
          Aleksey Yeschenko added a comment - +1 nit: unused boolean doCopy in AQP.filterEmpty() - from an earlier version, I assume femtonit: QP.countPaged() arguments don't line up
          Hide
          Sylvain Lebresne added a comment -

          Thanks for the review.

          I've pushed a additional patch on the same (force-rebased) branch to fix those issues. I realized that the QueryPagers.pagedQuery() method wasn't really cutting it since it was "losing" the CF-level deletion information, so I've replaced it with a simpler countPaged() method. The code was also slightly under-counting the number of remaining results so that's fixed too.

          All tests seem to be working fine here now.

          Show
          Sylvain Lebresne added a comment - Thanks for the review. I've pushed a additional patch on the same (force-rebased) branch to fix those issues. I realized that the QueryPagers.pagedQuery() method wasn't really cutting it since it was "losing" the CF-level deletion information, so I've replaced it with a simpler countPaged() method. The code was also slightly under-counting the number of remaining results so that's fixed too. All tests seem to be working fine here now.
          Hide
          Aleksey Yeschenko added a comment -

          Mostly LGTM. So far a couple issues and some nits:

          • native proto spec: section 7 mentions BATCH supporting <result_page_size>
          • CassandraServer.get_count() calculates pageSize, but never actually passes it to QueryPagers.pageQuery() - always passes COUNT_PAGE_SIZE instead
          • QueryPagers.pageQuery() in computeNext() doesn't handle pager.fetchPage(pageSize) returning an empty collection and throws NPE in the next line

          I think some other iterator code has similar issues, and count is slightly broken (I had AQP.firstName() throw NPE, too)

          These are the two tests that are broken (test/system/test_thrift_server.py):
          1. TestMutations.test_count
          2. TestMutations.test_count_around_page_size

          And some nits:

          • native proto spec: 4.1.4 QUERY 'The body message must be:' summary-line does not include <result_page_size>
          • AbstractQueryPager.discardLast() creates an iterator that's never used (Iterator<Column> iter = cf.iterator()
          • there are redundantly specialized Collections.<Inferrable>emptyList() calls (in AbstractQueryPager.fetchPage(), others)
          Show
          Aleksey Yeschenko added a comment - Mostly LGTM. So far a couple issues and some nits: native proto spec: section 7 mentions BATCH supporting <result_page_size> CassandraServer.get_count() calculates pageSize, but never actually passes it to QueryPagers.pageQuery() - always passes COUNT_PAGE_SIZE instead QueryPagers.pageQuery() in computeNext() doesn't handle pager.fetchPage(pageSize) returning an empty collection and throws NPE in the next line I think some other iterator code has similar issues, and count is slightly broken (I had AQP.firstName() throw NPE, too) These are the two tests that are broken (test/system/test_thrift_server.py): 1. TestMutations.test_count 2. TestMutations.test_count_around_page_size And some nits: native proto spec: 4.1.4 QUERY 'The body message must be:' summary-line does not include <result_page_size> AbstractQueryPager.discardLast() creates an iterator that's never used (Iterator<Column> iter = cf.iterator() there are redundantly specialized Collections.<Inferrable>emptyList() calls (in AbstractQueryPager.fetchPage(), others)
          Hide
          Sylvain Lebresne added a comment -

          I've pushed a rebase of this against current trunk at https://github.com/pcmanus/cassandra/commits/4415-2.

          Show
          Sylvain Lebresne added a comment - I've pushed a rebase of this against current trunk at https://github.com/pcmanus/cassandra/commits/4415-2 .
          Hide
          Sylvain Lebresne added a comment -

          Pushed patches for this to https://github.com/pcmanus/cassandra/compare/4415.

          There is 5 commits:

          1. the first one adds a new (more general) internal version of get_paged_slice.
            The problem with get_paged_slice is that it's only made to page range of full rows: it always restart at the beginning of the row when it start a new row. So what we need is to be able to page from some (key, column) start pair (to continue paging where we stopped it), but still only ever return columns that match our column filter. That's what the new PagedRangeCommand does.
            I will note that the old get_paged_slice is made a special case of the new command, and strictly speaking this change it's behavior. But I'm pretty sure this is fixing a bug more than anything else. More precisely, if you do:
            get_paged_slice("cf", KeyRange("a", "", 1000), "c4", CL.ONE)
            

            and it happens that row "a" doesn't exist, then the command will still start from "c4" for whatever the first row returned is. That feels broken to me, so after this patch it only start from "c4" for row "a". If "a" doesn't exist, it starts from the beginning of whatever is the first row following "a".

          2. the 2nd commit adds query pagers to page any type of internal query.
          3. the 3rd commit modify the binary protocol to add the paging support. Basically, it adds a pageSize to query messages that defines how big the next page should be. And then a NEXT message allow to get the following pages one by one. And result sets have a flag that say "there is more page to get".
          4. the 4th commit replace 2 existing use of paging by the new pagers:
            • in CassandraServer.get_count(). I'll note that this does re-introduce CASSANDRA-5099 until CASSANDRA-5149. Bus as said on the latter ticket, I just think we should fix CASSANDRA-5149 once and for all since all the paging done by those patches is potentially buggy otherwise.
            • and we had an existing SliceQueryPager in org.apache.cassandra.db that was use to index incrementally a wide row. It's replaced by the new (equivalent) one.
              It also probably wouln't be too hard to add paging to multi_get_count() but there's a slight refactoring to do in CassandraServer and I got lazy. Not sure anyone use that method anyway since no-one complained about the lack of paging.
          5. the 5th and last commit use the new pagers to page internally 'SELECT count(1)' queries (so we stop OOMing on those).

          The patches add a few unit tests for the pagers (arguably not a lot) and I've tested the protocol bits manually (using the debug-cql toy client).

          Show
          Sylvain Lebresne added a comment - Pushed patches for this to https://github.com/pcmanus/cassandra/compare/4415 . There is 5 commits: the first one adds a new (more general) internal version of get_paged_slice. The problem with get_paged_slice is that it's only made to page range of full rows: it always restart at the beginning of the row when it start a new row. So what we need is to be able to page from some (key, column) start pair (to continue paging where we stopped it), but still only ever return columns that match our column filter. That's what the new PagedRangeCommand does. I will note that the old get_paged_slice is made a special case of the new command, and strictly speaking this change it's behavior. But I'm pretty sure this is fixing a bug more than anything else. More precisely, if you do: get_paged_slice("cf", KeyRange("a", "", 1000), "c4", CL.ONE) and it happens that row "a" doesn't exist, then the command will still start from "c4" for whatever the first row returned is. That feels broken to me, so after this patch it only start from "c4" for row "a". If "a" doesn't exist, it starts from the beginning of whatever is the first row following "a". the 2nd commit adds query pagers to page any type of internal query. the 3rd commit modify the binary protocol to add the paging support. Basically, it adds a pageSize to query messages that defines how big the next page should be. And then a NEXT message allow to get the following pages one by one. And result sets have a flag that say "there is more page to get". the 4th commit replace 2 existing use of paging by the new pagers: in CassandraServer.get_count(). I'll note that this does re-introduce CASSANDRA-5099 until CASSANDRA-5149 . Bus as said on the latter ticket, I just think we should fix CASSANDRA-5149 once and for all since all the paging done by those patches is potentially buggy otherwise. and we had an existing SliceQueryPager in org.apache.cassandra.db that was use to index incrementally a wide row. It's replaced by the new (equivalent) one. It also probably wouln't be too hard to add paging to multi_get_count() but there's a slight refactoring to do in CassandraServer and I got lazy. Not sure anyone use that method anyway since no-one complained about the lack of paging. the 5th and last commit use the new pagers to page internally 'SELECT count(1)' queries (so we stop OOMing on those). The patches add a few unit tests for the pagers (arguably not a lot) and I've tested the protocol bits manually (using the debug-cql toy client).
          Hide
          Aleksey Yeschenko added a comment -

          Sorry. Misclicked (:

          Show
          Aleksey Yeschenko added a comment - Sorry. Misclicked (:
          Hide
          Sylvain Lebresne added a comment -

          Yeah, I guess we haven't much choice, though I'll note that the burden of saying when a query might lack isolation will be on the client driver documentation. That is, we can explain that drivers will likely use query paging for large queries which will defeats isolation, but then we'll have to refer to said driver documentation to know exactly when paging is used and how to control it. Unless we fix the page size to a constant, but that's a bit lame imho.

          Show
          Sylvain Lebresne added a comment - Yeah, I guess we haven't much choice, though I'll note that the burden of saying when a query might lack isolation will be on the client driver documentation. That is, we can explain that drivers will likely use query paging for large queries which will defeats isolation, but then we'll have to refer to said driver documentation to know exactly when paging is used and how to control it. Unless we fix the page size to a constant, but that's a bit lame imho.
          Hide
          Tupshin Harper added a comment - - edited

          IMO, document thoroughly the lack of isolation when paging and punt. It's the best we can do, and is generally sufficient.

          Show
          Tupshin Harper added a comment - - edited IMO, document thoroughly the lack of isolation when paging and punt. It's the best we can do, and is generally sufficient.
          Hide
          Sylvain Lebresne added a comment -

          I've written a good chunk of code for this but there is one big issue with this I'm not sure how to solve best: this will "break" isolation from a user perspective.

          More precisely, the goal is that from the user point of view all of this is transparent: it does a query and will then fetch query one by one and the driver will use that feature underneath to having getting into OOM situation and to avoid timeout for query yielding lots of result. However, since underneath we'll do multiple query, it might look as if there was not isolation if your unlucky with the timing of the requests.

          So I don't know what to do about that. I unfortunately don't see a solution to fix it per-se, so the question is how to make that acceptable.

          Show
          Sylvain Lebresne added a comment - I've written a good chunk of code for this but there is one big issue with this I'm not sure how to solve best: this will "break" isolation from a user perspective. More precisely, the goal is that from the user point of view all of this is transparent: it does a query and will then fetch query one by one and the driver will use that feature underneath to having getting into OOM situation and to avoid timeout for query yielding lots of result. However, since underneath we'll do multiple query, it might look as if there was not isolation if your unlucky with the timing of the requests. So I don't know what to do about that. I unfortunately don't see a solution to fix it per-se, so the question is how to make that acceptable.

            People

            • Assignee:
              Sylvain Lebresne
              Reporter:
              Sylvain Lebresne
              Reviewer:
              Aleksey Yeschenko
            • Votes:
              11 Vote for this issue
              Watchers:
              31 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development