Apache Cassandra / CASSANDRA-8824

Cassandra Python driver returns None when querying a static column on a partition with more than 5000 entities


Details

    • Bug
    • Status: Resolved
    • Normal
    • Resolution: Duplicate
    • Normal

    Description

      When we query a partition that has a static column and more than 5000 entities through the Python driver, some of the returned rows have the static value unset (None). Running the same query through cqlsh returns the value correctly.

      Here is an example; expire is a static column and folder_id is the partition key.

      cqlsh> select id, parent_id, expire, mtime from share.entity where folder_id='68f2af3a2d1e4f95a231d5cb47e57cf2' and mtime < '2015-02-01 06:21:25+0000';
      
       id                               | parent_id | expire                   | mtime
      ----------------------------------+-----------+--------------------------+--------------------------
       68f2af3a2d1e4f95a231d5cb47e57cf2 |      null | 2015-02-22 10:51:27+0000 | 2015-02-01 06:21:24+0000
      
      cqlsh> select count(*) from share.entity where folder_id='68f2af3a2d1e4f95a231d5cb47e57cf2';
       count
      -------
        5547
      
      In [1]: from django.db import connection
      
      In [2]: ses = connection.connection.session
      
      In [3]: from cassandra.query import SimpleStatement
      
      In [13]: query = "select * from share.entity where folder_id='68f2af3a2d1e4f95a231d5cb47e57cf2'";
      
      In [14]: st = SimpleStatement(query)
      
      In [15]: c, d = 0, 0
      
      In [16]: for e in ses.execute(st):
          if e['expire'] is None:
              c += 1
          else:
              d += 1
      
      In [17]: c
      Out[17]: 547
      
      In [18]: d
      Out[18]: 5000
      
      

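      After further digging, it turned out that this is a problem with the fetch_size parameter. In the session above, the 5000 rows that do have a value match the Python driver's default fetch_size of 5000. The problem can be easily reproduced with a smaller fetch_size: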

      In [1]: from cassandra.query import SimpleStatement
      
      In [2]: from django.db import connection
      
      In [3]: ses = connection.connection.session
      
      In [4]: ses.execute(SimpleStatement("create table t (k text, s text static, i int, primary key(k, i));"))
      
      In [5]: for i in range(1, 500):
         ....:     ses.execute(SimpleStatement("insert into t (k, i) values ('k', %d);" % i))
      
      In [6]: c, d = 0, 0
      
      In [7]: for e in ses.execute(SimpleStatement("select * from t", fetch_size=100)):
         ....:     if e['s'] is None:
         ....:         c += 1
         ....:     else:
         ....:         d += 1
         ....:
      
      In [8]: c
      Out[8]: 400
      
      In [9]: d
      Out[9]: 100
      
      

            People

              Assignee: Unassigned
              Reporter: Mateusz Moneta (mmoneta)