Uploaded image for project: 'Cassandra'
  1. Cassandra
  2. CASSANDRA-11814

some queries are very slow on a small partition

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Low
    • Resolution: Unresolved
    • None
    • Legacy/CQL
    • os: ubuntu server 14.04
      cassandra: 2.2.5 (apt-get installed from datastax repo)
      server: ec2 m4.2xlarge

    • Low

    Description

      we have a cassandra cluster, which has 16 nodes(m4.2xlarge) on aws.
      The total data size is 2.5TB now. There is a very large table named inbox_musical, the schema is like this:

      CREATE TABLE feeds.inbox_musical (
          owner_id bigint,
          activity_id bigint,
          insert_time timestamp,
          PRIMARY KEY (owner_id, activity_id)
      ) WITH CLUSTERING ORDER BY (activity_id DESC)
          AND bloom_filter_fp_chance = 0.1
          AND caching = '{"keys":"ALL", "rows_per_partition":"20"}'
          AND comment = ''
          AND compaction = {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
          AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
          AND dclocal_read_repair_chance = 0.1
          AND default_time_to_live = 15552000
          AND gc_grace_seconds = 864000
          AND max_index_interval = 2048
          AND memtable_flush_period_in_ms = 0
          AND min_index_interval = 128
          AND read_repair_chance = 0.0
          AND speculative_retry = '99.0PERCENTILE';
      

      owner_id is the partition key, and activity_id is clustering key.

      95% query can return within 10ms. but a few queries read timeout ( > 5s ).

      the slow queries is like this:

      SELECT activity_id FROM inbox_musical WHERE owner_id=5582322 ORDER BY activity_id DESC LIMIT 100;
      

      there are only 230 rows in this partition. but the query takes seconds to complete.

      the query tracing:

       activity                                                                                                                        | timestamp                  | source       | source_elapsed
      ---------------------------------------------------------------------------------------------------------------------------------+----------------------------+--------------+----------------
                                                                                                                    Execute CQL3 query | 2016-05-17 11:21:26.986000 | 172.31.8.188 |              0
                                                    READ message received from /172.31.8.188 [MessagingService-Incoming-/172.31.8.188] | 2016-05-17 11:21:26.985000 | 172.31.15.14 |             13
                                                               Executing single-partition query on inbox_musical [SharedPool-Worker-4] | 2016-05-17 11:21:26.985000 | 172.31.15.14 |            115
                                                    READ message received from /172.31.8.188 [MessagingService-Incoming-/172.31.8.188] | 2016-05-17 11:21:26.986000 | 172.31.4.135 |             13
                                                                                    Acquiring sstable references [SharedPool-Worker-4] | 2016-05-17 11:21:26.986000 | 172.31.15.14 |            137
                                                               Executing single-partition query on inbox_musical [SharedPool-Worker-1] | 2016-05-17 11:21:26.986000 | 172.31.4.135 |            113
                                                                                     Merging memtable tombstones [SharedPool-Worker-4] | 2016-05-17 11:21:26.986000 | 172.31.15.14 |            161
       Parsing SELECT activity_id FROM inbox_musical WHERE owner_id=5582322 ORDER BY activity_id DESC LIMIT 100; [SharedPool-Worker-3] | 2016-05-17 11:21:26.986000 | 172.31.8.188 |             66
                                                                                    Acquiring sstable references [SharedPool-Worker-1] | 2016-05-17 11:21:26.986000 | 172.31.4.135 |            136
                                                                                 Key cache hit for sstable 18948 [SharedPool-Worker-4] | 2016-05-17 11:21:26.986000 | 172.31.15.14 |            199
                                                                                             Preparing statement [SharedPool-Worker-3] | 2016-05-17 11:21:26.986000 | 172.31.8.188 |             99
                                                                                     Merging memtable tombstones [SharedPool-Worker-1] | 2016-05-17 11:21:26.986000 | 172.31.4.135 |            155
                                                                     Seeking to partition beginning in data file [SharedPool-Worker-4] | 2016-05-17 11:21:26.986000 | 172.31.15.14 |            205
                                                                                            Read-repair DC_LOCAL [SharedPool-Worker-3] | 2016-05-17 11:21:26.986000 | 172.31.8.188 |            142
                                                                      Bloom filter allows skipping sstable 20159 [SharedPool-Worker-1] | 2016-05-17 11:21:26.986000 | 172.31.4.135 |            179
                                                                                 Key cache hit for sstable 18924 [SharedPool-Worker-4] | 2016-05-17 11:21:26.986000 | 172.31.15.14 |            459
                                                                                 reading data from /172.31.4.135 [SharedPool-Worker-3] | 2016-05-17 11:21:26.986000 | 172.31.8.188 |            160
                                                                                 Key cache hit for sstable 20134 [SharedPool-Worker-1] | 2016-05-17 11:21:26.986000 | 172.31.4.135 |            195
                                                                     Seeking to partition beginning in data file [SharedPool-Worker-4] | 2016-05-17 11:21:26.986000 | 172.31.15.14 |            468
                                                                                 reading data from /172.31.15.14 [SharedPool-Worker-3] | 2016-05-17 11:21:26.986000 | 172.31.8.188 |            212
                                                                     Seeking to partition beginning in data file [SharedPool-Worker-1] | 2016-05-17 11:21:26.986001 | 172.31.4.135 |            202
                                                                                 Key cache hit for sstable 18547 [SharedPool-Worker-4] | 2016-05-17 11:21:26.986001 | 172.31.15.14 |            782
                                                       Sending READ message to /172.31.4.135 [MessagingService-Outgoing-/172.31.4.135] | 2016-05-17 11:21:26.986001 | 172.31.8.188 |            249
                                                                                 Key cache hit for sstable 19782 [SharedPool-Worker-1] | 2016-05-17 11:21:26.986001 | 172.31.4.135 |            376
                                                                     Seeking to partition beginning in data file [SharedPool-Worker-4] | 2016-05-17 11:21:26.986001 | 172.31.15.14 |            792
                                                       Sending READ message to /172.31.15.14 [MessagingService-Outgoing-/172.31.15.14] | 2016-05-17 11:21:26.986001 | 172.31.8.188 |            260
                                                                     Seeking to partition beginning in data file [SharedPool-Worker-1] | 2016-05-17 11:21:26.986001 | 172.31.4.135 |            388
                                       Skipped 0/3 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-4] | 2016-05-17 11:21:26.986001 | 172.31.15.14 |            963
                                                                                 Key cache hit for sstable 16500 [SharedPool-Worker-1] | 2016-05-17 11:21:26.986001 | 172.31.4.135 |            477
                                                                      Merging data from memtables and 3 sstables [SharedPool-Worker-4] | 2016-05-17 11:21:26.986001 | 172.31.15.14 |            973
                                                                     Seeking to partition beginning in data file [SharedPool-Worker-1] | 2016-05-17 11:21:26.986001 | 172.31.4.135 |            484
                                       Skipped 0/4 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-1] | 2016-05-17 11:21:26.986001 | 172.31.4.135 |            573
                                                                      Merging data from memtables and 3 sstables [SharedPool-Worker-1] | 2016-05-17 11:21:26.986001 | 172.31.4.135 |            582
                                                                             Read 101 live and 0 tombstone cells [SharedPool-Worker-1] | 2016-05-17 11:21:31.805000 | 172.31.4.135 |         819660
                                                                             Enqueuing response to /172.31.8.188 [SharedPool-Worker-1] | 2016-05-17 11:21:31.831000 | 172.31.4.135 |         844972
                                           Sending REQUEST_RESPONSE message to /172.31.8.188 [MessagingService-Outgoing-/172.31.8.188] | 2016-05-17 11:21:31.868000 | 172.31.4.135 |         882380
                                        REQUEST_RESPONSE message received from /172.31.4.135 [MessagingService-Incoming-/172.31.4.135] | 2016-05-17 11:21:31.962000 | 172.31.8.188 |         976119
                                                                          Processing response from /172.31.4.135 [SharedPool-Worker-2] | 2016-05-17 11:21:31.962000 | 172.31.8.188 |         976166
                                                                                                                      Request c
                                                                                                                  omplete | 2016-05-17 11:21:32.014093 | 172.31.8.188 |          28093
      

      I used jstack to get some stacks when c* is execute query like above. and found it seems c* take a long time do the following methods:

      "SharedPool-Worker-5" #221 daemon prio=5 os_prio=0 tid=0x00007f177517a000 nid=0x8b6 runnable [0x00007f1471e5b000]
         java.lang.Thread.State: RUNNABLE
          at org.apache.cassandra.db.RangeTombstoneList.addInternal(RangeTombstoneList.java:689)
          at org.apache.cassandra.db.RangeTombstoneList.insertFrom(RangeTombstoneList.java:649)
          at org.apache.cassandra.db.RangeTombstoneList.add(RangeTombstoneList.java:170)
          at org.apache.cassandra.db.RangeTombstoneList.add(RangeTombstoneList.java:142)
          at org.apache.cassandra.db.DeletionInfo.add(DeletionInfo.java:239)
          at org.apache.cassandra.db.ArrayBackedSortedColumns.delete(ArrayBackedSortedColumns.java:490)
          at org.apache.cassandra.db.ColumnFamily.addAtom(ColumnFamily.java:153)
          at org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:185)
          at org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:157)
          at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:146)
          at org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:125)
          at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:99)
          at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
          at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
          at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:264)
          at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:109)
          at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:82)
          at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:69)
          at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:317)
          at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:61)
          at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:2011)
          at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1815)
          at org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:360)
          at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:85)
      

      I executed these query when server was on very low load.

      from the tracing it seems that something is very slow between 'Merging data from memtables and 3 sstables' and 'Read 101 live and 0 tombstone cells'

      Attachments

        Activity

          People

            Unassigned Unassigned
            zjumty Terry Ma
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated: