Cassandra
  1. Cassandra
  2. CASSANDRA-6220

Unable to select multiple entries using In clause on clustering part of compound key

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Duplicate
    • Fix Version/s: None
    • Component/s: Core
    • Labels:
      None

      Description

      I have the following table:

      CREATE TABLE rating (
      id bigint,
      mid int,
      hid int,
      r double,
      PRIMARY KEY ((id, mid), hid));

      And I get really really strange result sets on the following queries:

      cqlsh:bm> SELECT hid, r FROM rating WHERE id = 755349113 and mid = 201310 and hid = 201329320;

      hid | r
      ----------+-------
      201329320 | 45.476

      (1 rows)

      cqlsh:bm> SELECT hid, r FROM rating WHERE id = 755349113 and mid = 201310 and hid = 201329220;

      hid | r
      ----------+------
      201329220 | 53.62

      (1 rows)

      cqlsh:bm> SELECT hid, r FROM rating WHERE id = 755349113 and mid = 201310 and hid in (201329320, 201329220);

      hid | r
      ----------+-------
      201329320 | 45.476

      (1 rows) <-- WRONG - should be two records

      As you can see although both records exist I'm not able the fetch all of them using in clause. By now I have to cycle my requests which are about 30 and I find it highly inefficient given that I query physically the same row.
      More of that - it doesn't happen all the time! For different id values sometimes I get the correct dataset.

      Ideally I'd like the following select to work:
      SELECT hid, r FROM rating WHERE id = 755349113 and mid in ? and hid in ?;
      Which doesn't work either.

      1. inserts.zip
        2.52 MB
        Ashot Golovenko

        Issue Links

          Activity

          Hide
          Sylvain Lebresne added a comment -

          As for CASSANDRA-6137, pretty sure this has been solved in CASSANDRA-6327.

          Show
          Sylvain Lebresne added a comment - As for CASSANDRA-6137 , pretty sure this has been solved in CASSANDRA-6327 .
          Hide
          Constance Eustace added a comment - - edited

          I was able to reproduce the original way of reproduction (drop schema, create schema, INSERT / UPDATE with no nodetool compact in there). Post-repair of the corruption seemed to require nodetool compact, invalidatekeycache, and/or possibly flush.

          Now that I've repaired. I'm going to run a 3.5 million insert + simulataneous update run to see if the nodetool compact repair makes the data more durable, as has been seen today before.

          Show
          Constance Eustace added a comment - - edited I was able to reproduce the original way of reproduction (drop schema, create schema, INSERT / UPDATE with no nodetool compact in there). Post-repair of the corruption seemed to require nodetool compact, invalidatekeycache, and/or possibly flush. Now that I've repaired. I'm going to run a 3.5 million insert + simulataneous update run to see if the nodetool compact repair makes the data more durable, as has been seen today before.
          Hide
          Constance Eustace added a comment - - edited

          If I do this sequence:

          DROP SCHEMA
          CREATE SCHEMA
          CREATE INITIAL DATA (i.e. no updates to existing data)
          NODETOOL COMPACT <-- magic sauce
          MASSIVE INSERT + SIMULTANEOUS UPDATES to INITIAL DATA

          does not reproduce. The nodetool compact after the schema creation seems to reset/stabilize the database. I used to replicate very reliably after about 300,000 inserts / 2000 updates. Now I do 1.75million inserts with 20,000 updates and no reproduction.

          Obviously you could probably run the nodetool compact after the SCHEMA creation, and then do initial data creation/update+insert run

          Show
          Constance Eustace added a comment - - edited If I do this sequence: DROP SCHEMA CREATE SCHEMA CREATE INITIAL DATA (i.e. no updates to existing data) NODETOOL COMPACT <-- magic sauce MASSIVE INSERT + SIMULTANEOUS UPDATES to INITIAL DATA does not reproduce. The nodetool compact after the schema creation seems to reset/stabilize the database. I used to replicate very reliably after about 300,000 inserts / 2000 updates. Now I do 1.75million inserts with 20,000 updates and no reproduction. Obviously you could probably run the nodetool compact after the SCHEMA creation, and then do initial data creation/update+insert run
          Hide
          Constance Eustace added a comment -

          My current thinking is that truncation / schema recreation disrupts synchronization of compaction with the searching datastructure/sstables, and until a full compaction completes, updates after truncation/schema recreation are iffy...

          Show
          Constance Eustace added a comment - My current thinking is that truncation / schema recreation disrupts synchronization of compaction with the searching datastructure/sstables, and until a full compaction completes, updates after truncation/schema recreation are iffy...
          Hide
          Constance Eustace added a comment - - edited

          Does nodetool compact <keyspace> <tablename> fix the corruption? It did for me, but I don't think it stops the ongoing corruption...

          EDIT: my reproduction seems to indicate "nodetool compact" MAY fix ongoing updates after the "nodetool compact" was executed... I was unable to generate bad queries after another 1.5 million row inserts and 30,000 updates to existing data.

          Show
          Constance Eustace added a comment - - edited Does nodetool compact <keyspace> <tablename> fix the corruption? It did for me, but I don't think it stops the ongoing corruption... EDIT: my reproduction seems to indicate "nodetool compact" MAY fix ongoing updates after the "nodetool compact" was executed... I was unable to generate bad queries after another 1.5 million row inserts and 30,000 updates to existing data.
          Hide
          Constance Eustace added a comment -

          Thanks, was going to write a java driver reproduction in case the cass-jdbc was somehow creating the problem, but if you've reproduced that way I don't have to...

          Show
          Constance Eustace added a comment - Thanks, was going to write a java driver reproduction in case the cass-jdbc was somehow creating the problem, but if you've reproduced that way I don't have to...
          Hide
          Ashot Golovenko added a comment -

          For inserts I was using a datastax java driver 1.0.3 with cassandra 2.0.1, single node on MacOsX 10.8.5 with SSD.
          Wrong result sets can be seen through java driver and cqlsh as well.

          Show
          Ashot Golovenko added a comment - For inserts I was using a datastax java driver 1.0.3 with cassandra 2.0.1, single node on MacOsX 10.8.5 with SSD. Wrong result sets can be seen through java driver and cqlsh as well.
          Hide
          Constance Eustace added a comment -

          What do you use? Cass-jdbc, binary protocol, or is this simply cqlsh scripts?

          Show
          Constance Eustace added a comment - What do you use? Cass-jdbc, binary protocol, or is this simply cqlsh scripts?
          Hide
          Constance Eustace added a comment - - edited

          one of the CASS-6137 comments has a github with a reproduction script if you need to reliably reproduce. Takes about 400,000 inserts + 6000 updates for me, single node

          https://github.com/cowarlydragon/CASS-6137

          Show
          Constance Eustace added a comment - - edited one of the CASS-6137 comments has a github with a reproduction script if you need to reliably reproduce. Takes about 400,000 inserts + 6000 updates for me, single node https://github.com/cowarlydragon/CASS-6137
          Hide
          Ashot Golovenko added a comment -

          I've generated some insert scripts... Well the bug disappeared meanwhile but It'll be back=)

          Show
          Ashot Golovenko added a comment - I've generated some insert scripts... Well the bug disappeared meanwhile but It'll be back=)
          Hide
          Jonathan Ellis added a comment -

          Can you give an example of what we need to INSERT to reproduce?

          Show
          Jonathan Ellis added a comment - Can you give an example of what we need to INSERT to reproduce?
          Hide
          Ashot Golovenko added a comment -

          looks like the same problem

          Show
          Ashot Golovenko added a comment - looks like the same problem

            People

            • Assignee:
              Unassigned
              Reporter:
              Ashot Golovenko
            • Votes:
              1 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development