CASSANDRA-10117: FD Leak with DTCS


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Normal
    • Resolution: Not A Problem
    • Severity: Normal

    Description

      Using 2.1-HEAD, specifically commit 972ae147247a, I am experiencing file descriptor (FD) issues in a one-node test with DTCS. These seem separate from CASSANDRA-9882.

      Using an EC2 i2.2xlarge node with all default settings and the following schema:

      CREATE KEYSPACE ks WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

      CREATE TABLE ks.tab (
          key uuid,
          year int,
          month int,
          day int,
          c0 blob,
          c1 blob,
          c2 blob,
          c3 blob,
          c4 blob,
          c5 blob,
          c6 blob,
          c7 blob,
          c8 blob,
          PRIMARY KEY ((year, month, day), key)
      ) WITH compaction = {'class': 'org.apache.cassandra.db.compaction.DateTieredCompactionStrategy'};

      I loaded 4500M rows via stress, which totaled ~1.2TB. I then ran a few mixed workloads via stress, each 50% inserts and 50% the following read: SELECT * FROM tab WHERE year = ? AND month = ? AND day = ? LIMIT 1000.
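
      (Not part of the original report.) For reference, a 50/50 mix like this can be driven with cassandra-stress in user mode; the profile file name, the read1 operation label, and the duration/thread settings below are illustrative assumptions rather than the exact invocation used here:

          # Minimal cassandra-stress user profile; assumes ks.tab already exists with
          # the schema above (generated blob sizes fall back to stress defaults).
          cat > dtcs-stress.yaml <<'EOF'
          keyspace: ks
          table: tab
          queries:
            read1:
              cql: select * from tab where year = ? and month = ? and day = ? limit 1000
              fields: samerow
          EOF

          # Equal-weight insert/read mix, similar in spirit to the workload described above.
          cassandra-stress user profile=dtcs-stress.yaml "ops(insert=1,read1=1)" \
            duration=30m -rate threads=100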

      This was done to reproduce a separate issue for a user. That user reported seeing open FD counts per sstable in the thousands. With absolutely no load on my cluster, any sstable with open FDs had between 243 and 245 of them. Once I started a stress process running the same read/write workload as before, I immediately saw FD counts as high as 6615 for a single sstable.

      I was determining FD counts per sstable with the following example call:

          lsof | grep '16119-Data.db' | wc -l
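
      (Not part of the original report.) A sketch of how the same measurement could be taken across all data files at once, and compared against the table's live sstable count; the pgrep pattern and the 2.1-era cfstats command name are assumptions, and the commands may need to run as root or the Cassandra user:

          # Open lsof entries per Data.db file for the Cassandra JVM, worst offenders first.
          pid=$(pgrep -f CassandraDaemon)
          lsof -p "$pid" | grep -o '[0-9]*-Data\.db' | sort | uniq -c | sort -rn | head -20

          # Live sstable count for the table, for comparison with the per-file numbers.
          nodetool cfstats ks.tab | grep 'SSTable count'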

      I still have this cluster running for you to examine. System.log is attached.

      Attachments

        Activity


      People

        Assignee: Benedict Elliott Smith (benedict)
        Reporter: Philip Thompson (philipthompson)
        Benedict Elliott Smith
        Votes: 0
        Watchers: 7

