CASSANDRA-4877

Range queries return fewer results after a lot of deletes

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 1.2.0 beta 3
    • Component/s: None
    • Labels: None

      Description

      Hi, I'm testing the trunk version.
      I'm using : [cqlsh 2.3.0 | Cassandra 1.2.0-beta1-SNAPSHOT | CQL spec 3.0.0 | Thrift protocol 19.35.0]

      My use case is:
      I create a table:

      CREATE TABLE premier (
        id int PRIMARY KEY,
        value int
      ) WITH
        comment='' AND
        caching='KEYS_ONLY' AND
        read_repair_chance=0.100000 AND
        dclocal_read_repair_chance=0.000000 AND
        gc_grace_seconds=864000 AND
        replicate_on_write='true' AND
        compression={'sstable_compression': 'SnappyCompressor'};

      1) I insert 10 000 000 rows (they are like id = 1 and value = 1)
      2) I delete 2 000 000 rows (I use a random method to choose the key value)
      3) I do select * from premier; and my result is 7944 rows instead of 10 000.
      4) If I do select * from premier limit 20000; my result is 15839.

      So after a lot of deletes, range queries are not working correctly.
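The behaviour reported above can be modelled with a minimal Python sketch (an illustration only, not Cassandra code): deletes leave tombstones behind, and a scan that counts tombstones toward its LIMIT returns fewer live rows than requested. All names and the scaled-down sizes here are hypothetical.

```python
import random

# Toy model of the reported behaviour: deletes write tombstones, and a
# buggy scan counts tombstones toward its LIMIT.

N_ROWS = 10_000      # scaled-down stand-in for the 10 000 000 inserts
N_DELETES = 2_000    # stand-in for the 2 000 000 random deletes
LIMIT = 5_000        # stand-in for cqlsh's default LIMIT of 10 000

random.seed(42)
store = {i: ("live", i) for i in range(N_ROWS)}
for key in random.sample(range(N_ROWS), N_DELETES):
    store[key] = ("tombstone", None)   # a delete leaves a tombstone

def buggy_scan(limit):
    """Counts every row slot (live or tombstone) toward the limit."""
    results, scanned = [], 0
    for key in sorted(store):
        if scanned == limit:
            break
        scanned += 1
        state, value = store[key]
        if state == "live":
            results.append((key, value))
    return results

rows = buggy_scan(LIMIT)
# Fewer rows come back than the LIMIT asked for, mirroring the
# "7944 instead of 10 000" observation above.
print(len(rows), "rows returned for LIMIT", LIMIT)
```

Every tombstone encountered inside the scanned window wastes one slot of the limit, so the more rows you delete, the further the result count falls below the requested LIMIT.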

      1. 0001-4877.patch
        24 kB
        Sylvain Lebresne
      2. 0002-Rename-maxIsColumns-to-countCQL3Rows.patch
        25 kB
        Sylvain Lebresne

        Activity

        Sylvain Lebresne added a comment -

        Committed, thanks
        Jonathan Ellis added a comment -

        +1
        Sylvain Lebresne added a comment -

        Attaching patch to fix this. The problem is that our handling of LIMIT was still not correct, in particular when NamesQueryFilter was used, as deleted rows were wrongly counted.

        One problem with that patch is that we may still under-count in a mixed 1.1/1.2 cluster, because 1.1 nodes won't know how to count correctly. That's sad, but at the same time changing this in 1.1 would be hard and dangerous, and CQL3 is beta in 1.1 after all.

        Note that I'm attaching 2 patches. The first one is the bulk of the fix. The second one is mostly a renaming of the 'maxIsColumns' parameter that is used in a number of places to 'countCQL3Rows', because that describes more faithfully what this parameter actually does.
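The counting change described in the comment above can be sketched in Python (a hypothetical illustration of the idea, not the actual Cassandra patch): the buggy path consumes one LIMIT slot per cell scanned, deleted or not, while the fixed path only counts live CQL3 rows toward the limit.

```python
# Hypothetical sketch of the LIMIT-counting fix: tombstones must not
# consume slots of the limit.

cells = [("a", "live"), ("b", "deleted"), ("c", "live"),
         ("d", "deleted"), ("e", "live"), ("f", "live")]

def collect_buggy(cells, limit):
    """Counts every scanned cell, live or deleted, toward the limit."""
    out, counted = [], 0
    for name, state in cells:
        if counted == limit:
            break
        counted += 1            # bug: a tombstone consumes a slot too
        if state == "live":
            out.append(name)
    return out

def collect_fixed(cells, limit):
    """Only live rows count toward the limit (the countCQL3Rows idea)."""
    out = []
    for name, state in cells:
        if len(out) == limit:
            break
        if state == "live":
            out.append(name)
    return out

print(collect_buggy(cells, 3))   # ['a', 'c'] - one slot wasted on a tombstone
print(collect_fixed(cells, 3))   # ['a', 'c', 'e'] - a full limit of live rows
```

This also motivates the rename: the parameter does not cap raw columns scanned, it controls whether the limit counts CQL3 rows, so a name like countCQL3Rows says what it actually does.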

          People

          • Assignee: Sylvain Lebresne
          • Reporter: julien campan
          • Reviewer: Jonathan Ellis
          • Votes: 0
          • Watchers: 2

            Dates

            • Created:
            • Updated:
            • Resolved:
