Details
Type: Bug
Status: Resolved
Resolution: Fixed
Priority: Normal
Description
"We noticed that after a `nodetool repair` was ran, several of our nodes reported high disk usage; – even one node hit 100% disk usage. After a restart of that node, disk usage drop instantly by 80 gigabytes – well that was confusing, but we quickly formed the theory that Cassandra must of been holding open references to deleted file descriptors.
"Later, i found this node as an example, it is using about 8-10 gigabytes more than it should be – 118 gigabytes reported by df, yet du reports only 106 gigabytes in the cassandra directory (nothing else on the mahcine). As you can see from the lsof listing, it is holding open FDs to files that no longer exist on the filesystem, and there are no open streams or as far as I can tell other reasons for the deleted sstable to be open.
"This seems to be related to running a repair, as we haven't seen it in any other situations before."
A quick check of FileStreamTask shows that the obvious base is covered:
finally
{
    try { raf.close(); }
    catch (IOException e) { throw new AssertionError(e); }
}
So it seems that either the transfer loop never finishes and therefore never reaches that finally block (in which case, why isn't it showing up in outbound streams?), or something else is the problem.
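For context, the suspected shape of the code is sketched below. This is a simplified illustration of the hypothesis above, not the actual FileStreamTask source; the method and parameter names are placeholders. The point is that the descriptor on a (possibly already deleted) sstable is released only when control reaches the finally block, so a transfer loop that blocks indefinitely keeps the FD alive without obviously showing up as a progressing stream:

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;

// Simplified sketch of the pattern under suspicion, not actual Cassandra code.
public class StreamSketch
{
    public void stream(String path, SocketChannel socket, long[][] sections) throws IOException
    {
        RandomAccessFile raf = new RandomAccessFile(path, "r");
        try
        {
            FileChannel fc = raf.getChannel();
            for (long[] section : sections)
            {
                long pos = section[0];
                long remaining = section[1] - section[0];
                while (remaining > 0)
                {
                    // A socket that never drains (or a hung peer) can park the
                    // task here indefinitely, keeping raf open the whole time.
                    long written = fc.transferTo(pos, remaining, socket);
                    pos += written;
                    remaining -= written;
                }
            }
        }
        finally
        {
            // Reached only once the loop above completes or throws.
            try { raf.close(); }
            catch (IOException e) { throw new AssertionError(e); }
        }
    }
}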