  1. Apache Cassandra
  2. CASSANDRA-6534

Slow inserts with collections into a single partition (Pathological GC behavior)

Details

    • Bug
    • Status: Resolved
    • Normal
    • Resolution: Duplicate
    • 2.0.11
    • None
    • None
    • dsc12-1.2.12-1.noarch.rpm
      cassandra12-1.2.12-1.noarch.rpm
      centos 6.4

    • Normal

    Description

      We noticed extremely slow insertion rates to a single partition key when using a composite column with a collection value. We were not able to reproduce the issue using the same schema with a non-collection value, even with much larger values. During the collection insertion tests we see tons of these messages in system.log:
      "GCInspector.java (line 119) GC for ConcurrentMarkSweep: 1287 ms for 2 collections, 1233256368 used; max is 8375238656"
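
      For triage, the figures in that message can be pulled apart with a few lines of Python. This is only a sketch: the regular expression is written against the single line quoted above, and the function and field names are hypothetical, not part of Cassandra.

```python
import re

# Pattern written against the GCInspector message quoted above (an assumption:
# other Cassandra versions may format this line differently).
GC_LINE = re.compile(
    r"GC for (?P<collector>\w+): (?P<ms>\d+) ms for (?P<count>\d+) collections, "
    r"(?P<used>\d+) used; max is (?P<max>\d+)"
)

def summarize_gc_line(line):
    """Return pause and heap-occupancy figures for one GCInspector line, or None."""
    m = GC_LINE.search(line)
    if m is None:
        return None
    ms, count = int(m["ms"]), int(m["count"])
    used, max_heap = int(m["used"]), int(m["max"])
    return {
        "collector": m["collector"],
        "avg_pause_ms": ms / count,        # total GC time divided by collections
        "used_bytes": used,
        "heap_fraction": used / max_heap,  # share of the reported max heap in use
    }

sample = (
    "GCInspector.java (line 119) GC for ConcurrentMarkSweep: 1287 ms for "
    "2 collections, 1233256368 used; max is 8375238656"
)
summary = summarize_gc_line(sample)
print(summary)
```

      For the quoted line this gives an average of 643.5 ms per collection, with roughly 1.2 GB of heap still in use.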

      We are inserting tiny amounts of data (32-64 bytes per insert) and seeing the issue after only a couple of 10k inserts. The amount of memory being used by C*/JVM is nowhere near proportional to the amount of data being inserted. Why is C* consuming so much memory?

      Attached is a picture of the GC under one of the pathological tests. Keep in mind we are only inserting 128KB - 256KB of data and we are almost hitting the limit of the heap.

      GC flags:
      -XX:+UseThreadPriorities
      -XX:ThreadPriorityPolicy=42
      -Xms8192M
      -Xmx8192M
      -Xmn2048M
      -XX:+HeapDumpOnOutOfMemoryError
      -Xss180k
      -XX:+UseParNewGC
      -XX:+UseConcMarkSweepGC
      -XX:+CMSParallelRemarkEnabled
      -XX:SurvivorRatio=8
      -XX:MaxTenuringThreshold=1
      -XX:CMSInitiatingOccupancyFraction=75
      -XX:+UseCMSInitiatingOccupancyOnly
      -XX:+UseTLAB
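
      As a rough guide to what these flags imply, the generation sizes can be worked out by hand. The sketch below assumes the usual HotSpot meaning of SurvivorRatio (eden to one survivor space) and of CMSInitiatingOccupancyFraction (percentage of the old generation); the JVM aligns the real sizes, so treat the numbers as approximate.

```python
# Values copied from the flags above, in MiB.
heap_mb = 8192        # -Xms8192M / -Xmx8192M
young_mb = 2048       # -Xmn2048M
survivor_ratio = 8    # -XX:SurvivorRatio=8 (eden : one survivor space)
cms_fraction = 75     # -XX:CMSInitiatingOccupancyFraction=75

# The young generation is eden plus two survivor spaces, so with a ratio of 8
# it is split into 8 + 1 + 1 = 10 equal parts.
survivor_mb = young_mb / (survivor_ratio + 2)
eden_mb = young_mb - 2 * survivor_mb

# Everything outside the young generation is the old (CMS) generation.
old_mb = heap_mb - young_mb

# CMS starts a concurrent cycle once the old generation passes this occupancy.
cms_trigger_mb = old_mb * cms_fraction / 100

print(f"eden ~{eden_mb:.1f} MiB, survivor ~{survivor_mb:.1f} MiB each")
print(f"old gen {old_mb} MiB, CMS cycle starts near {cms_trigger_mb:.0f} MiB")
```

      The "max is 8375238656" in the log message is 7987.25 MiB, which appears consistent with the 8192 MiB heap minus one survivor space.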

      Example schemas:

      Note: The type of collection or primitive type in the collection doesn't seem to matter.

      CREATE TABLE test.test (
      row_key text, 
      column_key uuid,
      column_value list<int>,
      PRIMARY KEY(row_key, column_key));
      
      CREATE TABLE test.test (
      row_key text, 
      column_key uuid, 
      column_value map<text, text>, 
      PRIMARY KEY(row_key, column_key));
      

      Example inserts:

      Note: This issue can be reproduced with extremely small inserts (as well as larger ~1KB ones).

      INSERT INTO test.test 
      (row_key, column_key, column_value)
      VALUES 
      ('0000000001', e0138677-7246-11e3-ac78-016ae7083d37, [0, 1, 2, 3]);
      
      INSERT INTO test.test 
      (row_key, column_key, column_value) 
      VALUES
      ('0000000022', 1ac5770a-7247-11e3-80e4-016ae7083d37, { 'a': '0123456701234567012345670',  'b': '0123456701234567012345670' });
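
      To repeat the test, statements like the first one above can be generated in bulk against a single partition. The helper below is a hypothetical sketch, not the script used for the original tests; the statement count is an arbitrary assumption, and it only builds CQL text that would still need to be sent through cqlsh or a driver.

```python
import uuid

def make_list_inserts(row_key, count, table="test.test"):
    """Yield CQL INSERTs that all target one partition (row_key),
    each with a fresh time-based UUID as the clustering column."""
    for _ in range(count):
        column_key = uuid.uuid1()  # version 1 UUID, like the examples above
        yield (
            f"INSERT INTO {table} (row_key, column_key, column_value) "
            f"VALUES ('{row_key}', {column_key}, [0, 1, 2, 3]);"
        )

# 20000 is an assumed count chosen for illustration only.
statements = list(make_list_inserts("0000000001", 20000))
print(statements[0])
print(len(statements), "statements generated")
```

      Writing the statements to a file and replaying them keeps every insert on the same row_key, which is the condition under which the slowdown was observed.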
      

      As a comparison, I was able to run the same tests with the following schema with no issue:

      Note: This test ran at a much faster insertion speed, for much longer, and with much larger column sizes (1KB), without any GC issues.

      CREATE TABLE test.test (
      row_key text, 
      column_key uuid, 
      column_value text, 
      PRIMARY KEY(row_key, column_key));
      

      Attachments

        1. GC_behavior.png
          115 kB
          Michael Penick

            People

              Assignee: Unassigned
              Reporter: Michael Penick (mpenick)
              Votes: 3
              Watchers: 5
