CASSANDRA-9798

Cassandra seems to have deadlocks during flush operations

Details

    • Bug
    • Status: Resolved
    • Normal
    • Resolution: Fixed
    • 2.1.x
    • None
    • None
    • Normal

    Description

      Hi,
      We noticed a problem with dropped mutations. Usually, on one random node, the thread pools end up in this state:
      MutationStage: "active" is full, "pending" is increasing, "completed" is stalled
      MemtableFlushWriter: "active" is 6, "pending" is 25, "completed" is stalled
      MemtablePostFlush: "active" is 1, "pending" is 29, "completed" is stalled

      After some time (30 s to 10 min), the pending mutations are dropped and everything works again.
      When it happened:
      1. CPU idle is ~95%
      2. No long GC pauses or increased activity
      3. Memory usage is 3.5 GB of 8 GB
      4. Only writes are being processed by Cassandra
      5. The problems appeared once LOAD exceeded 400 GB/node
      6. Cassandra 2.1.6
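
      The stall pattern described above (threads occupied, backlog not shrinking, completed count not advancing) can be checked mechanically by comparing two successive readings of a pool's counters, for example from two runs of `nodetool tpstats`. The following is only a minimal sketch: `PoolSample` and `is_stalled` are hypothetical names, parsing of the tool output is not shown, and the earlier "pending" value and both "completed" values in the example are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class PoolSample:
    """One reading of a thread pool's counters (hypothetical structure)."""
    active: int
    pending: int
    completed: int


def is_stalled(before: PoolSample, after: PoolSample) -> bool:
    """Return True when the pool looks stuck between two samples.

    The pool is treated as stalled if threads are busy, work is queued,
    the backlog did not shrink, and no task completed in between.
    """
    return (
        after.active > 0
        and after.pending > 0
        and after.pending >= before.pending
        and after.completed == before.completed
    )


# active=6 and pending=25 come from the report; the other numbers are invented.
flush_t0 = PoolSample(active=6, pending=20, completed=1500)
flush_t1 = PoolSample(active=6, pending=25, completed=1500)
print(is_stalled(flush_t0, flush_t1))  # True
```

      A check like this only flags the symptom. It does not distinguish a real deadlock from a pool that is merely slow, so the sampling interval has to be longer than a normal flush.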

      There is a gap in the logs:

      INFO  08:47:01 Timed out replaying hints to /192.168.100.83; aborting (0 delivered)
      INFO  08:47:01 Enqueuing flush of hints: 7870567 (0%) on-heap, 0 (0%) off-heap
      INFO  08:47:30 Enqueuing flush of table1: 95301807 (4%) on-heap, 0 (0%) off-heap
      INFO  08:47:31 Enqueuing flush of table1: 60462632 (3%) on-heap, 0 (0%) off-heap
      INFO  08:47:31 Enqueuing flush of table2: 76973746 (4%) on-heap, 0 (0%) off-heap
      INFO  08:47:31 Enqueuing flush of table1: 84290135 (4%) on-heap, 0 (0%) off-heap
      INFO  08:47:32 Enqueuing flush of table3: 56926652 (3%) on-heap, 0 (0%) off-heap
      INFO  08:47:32 Enqueuing flush of table1: 85124218 (4%) on-heap, 0 (0%) off-heap
      INFO  08:47:33 Enqueuing flush of table2: 95663415 (4%) on-heap, 0 (0%) off-heap
      INFO  08:47:58 CompactionManager                 2        39
      INFO  08:47:58 Writing Memtable-table2@1767938721(13843064 serialized bytes, 162359 ops, 4%/0% of on/off-heap limit)
      INFO  08:47:58 Writing Memtable-hints@1433125911(478703 serialized bytes, 424 ops, 0%/0% of on/off-heap limit)
      INFO  08:47:58 Writing Memtable-table2@1318583275(11783615 serialized bytes, 137378 ops, 4%/0% of on/off-heap limit)
      INFO  08:47:58 Enqueuing flush of compactions_in_progress: 969 (0%) on-heap, 0 (0%) off-heap
      INFO  08:47:58 Writing Memtable-table1@541175113(17221327 serialized bytes, 180792 ops, 4%/0% of on/off-heap limit)
      INFO  08:47:58 Writing Memtable-table1@1361154669(27138519 serialized bytes, 273472 ops, 6%/0% of on/off-heap limit)
      
      INFO  08:48:03 2176 MUTATION messages dropped in last 5000ms
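
      To locate gaps like the ones in the excerpt above without reading the log by eye, the timestamps can be compared line by line. This is a rough sketch that assumes the "LEVEL  HH:MM:SS message" layout shown here; `find_gaps` and the 20-second threshold are illustrative choices, and the time-only format means a gap spanning midnight is not handled.

```python
import re
from datetime import datetime

# Matches the "LEVEL  HH:MM:SS message" layout used in the excerpt.
LINE_RE = re.compile(r"^\s*(?:INFO|WARN|ERROR|DEBUG)\s+(\d{2}:\d{2}:\d{2})\s")


def find_gaps(lines, threshold_seconds=20):
    """Return (start, end, seconds) for consecutive timestamped lines
    that are more than threshold_seconds apart."""
    gaps = []
    previous = None
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # wrapped continuation lines and blanks have no timestamp
        current = datetime.strptime(match.group(1), "%H:%M:%S")
        if previous is not None:
            delta = (current - previous).total_seconds()
            if delta > threshold_seconds:
                gaps.append((previous.strftime("%H:%M:%S"),
                             current.strftime("%H:%M:%S"),
                             int(delta)))
        previous = current
    return gaps


# A few lines taken from the excerpt.
sample = [
    "INFO  08:47:01 Enqueuing flush of hints: 7870567 (0%) on-heap, 0 (0%) off-heap",
    "INFO  08:47:30 Enqueuing flush of table1: 95301807 (4%) on-heap, 0 (0%) off-heap",
    "INFO  08:47:33 Enqueuing flush of table2: 95663415 (4%) on-heap, 0 (0%) off-heap",
    "INFO  08:47:58 CompactionManager                 2        39",
]
for start, end, seconds in find_gaps(sample):
    print(f"{start} -> {end}: {seconds}s without log output")
```

      On these four lines the sketch reports two silent periods, 08:47:01 to 08:47:30 (29 s) and 08:47:33 to 08:47:58 (25 s).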
      

      Use case:
      100% writes at 100Mb/s, a couple of CFs with ~10 columns each, max cell size 100 B.
      CMS and G1GC were both tested, with no difference.

      Attachments

        1. cassandra.yaml
          37 kB
          Łukasz Mrożkiewicz
        2. gc.log.0.current
          219 kB
          Łukasz Mrożkiewicz
        3. cassandra.log
          110 kB
          Łukasz Mrożkiewicz
        4. cassandra.2.1.8.log
          9 kB
          Łukasz Mrożkiewicz
        5. cassandra.yaml
          37 kB
          Łukasz Mrożkiewicz
        6. stack.txt
          1.07 MB
          Łukasz Mrożkiewicz
        7. topHbn1.txt
          221 kB
          Łukasz Mrożkiewicz

        People

            Assignee: Unassigned
            Reporter: Łukasz Mrożkiewicz (mrozek)