  1. Cassandra
  2. CASSANDRA-11314

Inconsistent select count(*)


Details

    • Bug
    • Status: Resolved
    • Normal
    • Resolution: Not A Problem
    • None
    • None
    • Ubuntu 14.04 LTS

    • Normal
    • 3.3

    Description

      Hello,

      I currently have this setup:

      Cassandra 3.3 (Community edition downloaded from DataStax) is installed on 3 nodes, and I have created this table:

      CREATE TABLE billing.collected_data_day (
          collection_day int,
          timestamp timestamp,
          record_id uuid,
          dimensions map<text, text>,
          entity_id text,
          measurements map<text, text>,
          source_id text,
          PRIMARY KEY (collection_day, timestamp, record_id)
      ) WITH CLUSTERING ORDER BY (timestamp ASC, record_id ASC)
          AND bloom_filter_fp_chance = 0.01
          AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
          AND comment = ''
          AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
          AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
          AND crc_check_chance = 1.0
          AND dclocal_read_repair_chance = 0.1
          AND default_time_to_live = 0
          AND gc_grace_seconds = 864000
          AND max_index_interval = 2048
          AND memtable_flush_period_in_ms = 0
          AND min_index_interval = 128
          AND read_repair_chance = 0.0
          AND speculative_retry = '99PERCENTILE';

      As you can see, this table is partitioned by collection_day, because at the end of each day we need fast access to all the data generated that day. collection_day is the number of days since 1970-01-01.
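      For reference, here is a minimal sketch of how a collection_day value maps to a calendar date. It assumes the value counts whole days since 1970-01-01 in UTC, which the description implies but does not state explicitly. The helper names day_to_date and date_to_day are hypothetical and are not part of the application.

```python
from datetime import date, timedelta

# Assumption: collection_day counts whole days since the Unix epoch (UTC).
EPOCH = date(1970, 1, 1)


def day_to_date(collection_day: int) -> date:
    """Convert a day number since 1970-01-01 into a calendar date."""
    return EPOCH + timedelta(days=collection_day)


def date_to_day(d: date) -> int:
    """Convert a calendar date into a day number since 1970-01-01."""
    return (d - EPOCH).days


if __name__ == "__main__":
    # 16462 is the partition key used in the queries in this report.
    print(day_to_date(16462))              # 2015-01-27
    print(date_to_day(date(2015, 1, 27)))  # 16462
```

      Under that assumption, the partition queried in this report (collection_day=16462) corresponds to 2015-01-27.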

      We have inserted roughly 12 million rows into this table for testing purposes and ran a simple count. As you can see, the results vary between runs:

      cqlsh:billing> select count(*) from collected_data_day where collection_day=16462;

      count
      -------
      55341

      (1 rows)
      cqlsh:billing> select count(*) from collected_data_day where collection_day=16462;

      count
      -------
      55372

      (1 rows)
      cqlsh:billing> select count(*) from collected_data_day where collection_day=16462;

      count
      -------
      55300

      (1 rows)
      cqlsh:billing> select count(*) from collected_data_day where collection_day=16462;

      count
      -------
      55300

      (1 rows)
      cqlsh:billing> select count(*) from collected_data_day where collection_day=16462;

      count
      -------
      55300

      (1 rows)
      cqlsh:billing> select count(*) from collected_data_day where collection_day=16462;

      count
      -------
      55303

      (1 rows)
      cqlsh:billing> select count(*) from collected_data_day where collection_day=16462;

      count
      -------
      55374

      (1 rows)

      I am running the query from the seed node of the Cassandra cluster. As you can see, most of the results differ from one run to the next, and I don't know why. We are not writing anything to the cluster at this time; we are only querying it, and only through this cqlsh session.
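      To quantify the variation, the seven counts returned above can be summarized with a short script. This is only a sketch over the values copied from the cqlsh output in this report; it does not query the cluster, and the variable names are illustrative.

```python
from collections import Counter

# Counts copied from the seven cqlsh runs shown above, in order.
counts = [55341, 55372, 55300, 55300, 55300, 55303, 55374]

freq = Counter(counts)
spread = max(counts) - min(counts)              # gap between highest and lowest count
distinct = len(freq)                            # number of different results seen
most_common_value, occurrences = freq.most_common(1)[0]

if __name__ == "__main__":
    print(f"min={min(counts)} max={max(counts)} spread={spread}")
    print(f"distinct results: {distinct} out of {len(counts)} runs")
    print(f"most frequent: {most_common_value} ({occurrences} times)")
```

      The seven runs produced five distinct results, spanning 74 rows between the lowest (55300) and the highest (55374) count.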

      This is very similar to CASSANDRA-8940, but that issue targets 2.1.x.

      Could we be hitting the same issue in 3.3?

      Please let me know what additional information I can provide.

      Attachments

        1. testrun.log
          26 kB
          Mircea Lemnaru
        2. vnodes_and_hosts
          72 kB
          Mircea Lemnaru

        Activity

          People

            blerer Benjamin Lerer
            mlemnaru Mircea Lemnaru
            Benjamin Lerer