  Cassandra / CASSANDRA-9798

Cassandra seems to have deadlocks during flush operations


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Normal
    • Resolution: Fixed
    • 2.1.x
    • None
    • None
    • Normal

    Description

      Hi,
      We noticed a problem with dropped MUTATION messages. Usually, on one random node, the following happens:
      MutationStage: "active" is full, "pending" is increasing, "completed" is stalled.
      MemtableFlushWriter: active 6, pending 25, completed stalled.
      MemtablePostFlush: active 1, pending 29, completed stalled.

      After some time (30 s to 10 min), the pending mutations are dropped and everything works normally again.
      Conditions when this happens:
      1. CPU idle is ~95%.
      2. There are no long GC pauses and no increased GC activity.
      3. Memory usage is 3.5 GB of 8 GB.
      4. Cassandra is processing writes only.
      5. The problems appeared once the load exceeded 400 GB per node.
      6. Cassandra version: 2.1.6.
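The stall pattern described above (pending rising while completed stays flat in MutationStage, MemtableFlushWriter, and MemtablePostFlush) can be checked mechanically. The following is a minimal sketch, assuming the column layout that `nodetool tpstats` prints in 2.1 (Pool Name, Active, Pending, Completed, Blocked, All time blocked) and pool names without spaces. It compares two snapshots taken a few seconds apart. The function names `parse_tpstats` and `stalled_pools` are hypothetical helpers, not part of Cassandra.

```python
import re
from typing import Dict, List, Tuple

# Assumed row layout (2.1 `nodetool tpstats`): name, active, pending, completed, ...
# Header lines and the single-number "Message type / Dropped" rows do not match.
ROW = re.compile(r"^(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\b")


def parse_tpstats(text: str) -> Dict[str, Tuple[int, int, int]]:
    """Map pool name -> (active, pending, completed) from one tpstats snapshot."""
    pools: Dict[str, Tuple[int, int, int]] = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            name, active, pending, completed = m.groups()
            pools[name] = (int(active), int(pending), int(completed))
    return pools


def stalled_pools(before: str, after: str) -> List[str]:
    """Return pools whose pending count grew while completed did not move."""
    first, second = parse_tpstats(before), parse_tpstats(after)
    stalled = []
    for name, (_, pending0, completed0) in first.items():
        if name not in second:
            continue  # pool missing from the later snapshot; nothing to compare
        _, pending1, completed1 = second[name]
        if pending1 > pending0 and completed1 == completed0:
            stalled.append(name)
    return sorted(stalled)
```

On a node in the state described in this report, the two snapshots would be expected to flag MutationStage, MemtableFlushWriter, and MemtablePostFlush, while healthy pools (whose completed counters keep advancing) are left out.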

      There is a gap in the logs:

      INFO  08:47:01 Timed out replaying hints to /192.168.100.83; aborting (0 delivered)
      INFO  08:47:01 Enqueuing flush of hints: 7870567 (0%) on-heap, 0 (0%) off-heap
      INFO  08:47:30 Enqueuing flush of table1: 95301807 (4%) on-heap, 0 (0%) off-heap
      INFO  08:47:31 Enqueuing flush of table1: 60462632 (3%) on-heap, 0 (0%) off-heap
      INFO  08:47:31 Enqueuing flush of table2: 76973746 (4%) on-heap, 0 (0%) off-heap
      INFO  08:47:31 Enqueuing flush of table1: 84290135 (4%) on-heap, 0 (0%) off-heap
      INFO  08:47:32 Enqueuing flush of table3: 56926652 (3%) on-heap, 0 (0%) off-heap
      INFO  08:47:32 Enqueuing flush of table1: 85124218 (4%) on-heap, 0 (0%) off-heap
      INFO  08:47:33 Enqueuing flush of table2: 95663415 (4%) on-heap, 0 (0%) off-heap
      INFO  08:47:58 CompactionManager                 2        39
      INFO  08:47:58 Writing Memtable-table2@1767938721(13843064 serialized bytes, 162359 ops, 4%/0% of on/off-heap limit)
      INFO  08:47:58 Writing Memtable-hints@1433125911(478703 serialized bytes, 424 ops, 0%/0% of on/off-heap limit)
      INFO  08:47:58 Writing Memtable-table2@1318583275(11783615 serialized bytes, 137378 ops, 4%/0% of on/off-heap limit)
      INFO  08:47:58 Enqueuing flush of compactions_in_progress: 969 (0%) on-heap, 0 (0%) off-heap
      INFO  08:47:58 Writing Memtable-table1@541175113(17221327 serialized bytes, 180792 ops, 4%/0% of on/off-heap limit)
      INFO  08:47:58 Writing Memtable-table1@1361154669(27138519 serialized bytes, 273472 ops, 6%/0% of on/off-heap limit)
      
      INFO  08:48:03 2176 MUTATION messages dropped in last 5000ms
      
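Silences like the one in this excerpt, and the amount of memtable data queued for flushing just before them, can be pulled out of a longer log automatically. The following is a minimal sketch, assuming the console format shown here (`LEVEL  HH:MM:SS message`, with long lines possibly wrapped onto untimestamped continuation lines). The helper names and the default 10 s threshold are hypothetical choices, not anything Cassandra provides.

```python
import re
from datetime import datetime, timedelta
from typing import List, Tuple

# Assumed format: "INFO  08:47:01 message". Wrapped continuation lines have no timestamp.
LINE = re.compile(r"^\s*(?:INFO|WARN|ERROR|DEBUG)\s+(\d{2}:\d{2}:\d{2})\s")
# Matches e.g. "Enqueuing flush of table1: 95301807 (4%) on-heap".
FLUSH = re.compile(r"Enqueuing flush of (\S+): (\d+) \(\d+%\) on-heap")


def find_gaps(lines: List[str], threshold_s: int = 10) -> List[Tuple[str, str]]:
    """Return (start, end) timestamps of silences longer than threshold_s seconds.

    Only time-of-day is logged, so a gap spanning midnight is not detected.
    """
    gaps: List[Tuple[str, str]] = []
    prev = None
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue  # continuation line of a wrapped log entry
        ts = datetime.strptime(m.group(1), "%H:%M:%S")
        if prev is not None and ts - prev > timedelta(seconds=threshold_s):
            gaps.append((prev.strftime("%H:%M:%S"), ts.strftime("%H:%M:%S")))
        prev = ts
    return gaps


def enqueued_onheap_bytes(lines: List[str]) -> int:
    """Sum the on-heap sizes reported by 'Enqueuing flush of ...' lines."""
    total = 0
    for line in lines:
        m = FLUSH.search(line)
        if m:
            total += int(m.group(2))
    return total
```

Applied to the excerpt above with the default 10 s threshold, `find_gaps` flags two silences: 08:47:01 to 08:47:30 and 08:47:33 to 08:47:58.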

      Use case:
      100% writes at 100 Mb/s, a couple of column families with ~10 columns each, max cell size 100 B.
      CMS and G1 GC were both tested, with no difference.

      Attachments

        1. cassandra.2.1.8.log (9 kB, Łukasz Mrożkiewicz)
        2. cassandra.log (110 kB, Łukasz Mrożkiewicz)
        3. cassandra.yaml (37 kB, Łukasz Mrożkiewicz)
        4. cassandra.yaml (37 kB, Łukasz Mrożkiewicz)
        5. gc.log.0.current (219 kB, Łukasz Mrożkiewicz)
        6. stack.txt (1.07 MB, Łukasz Mrożkiewicz)
        7. topHbn1.txt (221 kB, Łukasz Mrożkiewicz)


          People

            Assignee: Unassigned
            Reporter: Łukasz Mrożkiewicz (mrozek)
            Votes: 0
            Watchers: 7
