[KUDU-2907] Add directories to a directory group when there is no space left in a given directory group, but there are directories available - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.11.0
Component/s: fs
Labels: None

Description

We've seen a tablet server crash because of disk space problems. The server as a whole still had free space, but several of its disks were full.

W0726 10:50:58.608566 41367 tablet_replica_mm_ops.cc:144] T d29679efebf94ccb9ed8de7daa44f3ef P 649f3f936e204410a62156f322ac6f90: failed to flush MRS: IO error: Failed to open DiskRowSet for flush: Unable to open output file for column cluster_id[string NOT NULL]: No directories available to add to d29679efebf94ccb9ed8de7daa44f3ef's directory group (11 dirs total, 4 full, 0 failed). (error 28)
F0726 10:50:58.608582 41367 tablet_replica_mm_ops.cc:145] Check failed: tablet->HasBeenStopped() FlushMRS failure is only allowed if the tablet is stopped first

Note that the error message is a red herring: the failure really came from selecting a directory in which to place a container, not from selecting a directory to add to the directory group.

Of the 11 data directories, 4 were full; presumably the tablet had a default directory group size of 3, and all three of its directories were among the full ones.

It would be nice for directory groups to be resized dynamically as needed. If getting a directory for block placement fails with ENOSPC, we should consider adding a directory to the directory group, chosen either by available space or by the number of replicas in the remaining directories.
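
To make the proposal concrete, here is a minimal Python sketch of the fallback, not Kudu's actual implementation (Kudu is written in C++, and none of these names come from its code base). `DataDir`, `NoSpaceError`, and `pick_dir_for_block` are hypothetical, and so is the selection policy: prefer the directory hosting the fewest replicas and break ties by free space, which combines the two criteria suggested above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataDir:
    """Hypothetical stand-in for one data directory (one disk)."""
    name: str
    free_bytes: int
    num_replicas: int = 0  # replicas whose directory groups include this dir

    def is_full(self) -> bool:
        return self.free_bytes <= 0


class NoSpaceError(Exception):
    """No directory on the whole server can take a new block (ENOSPC)."""


def pick_dir_for_block(group: List[DataDir], all_dirs: List[DataDir]) -> DataDir:
    """Pick a directory for a new block, growing the group if it is full.

    `group` is the tablet's directory group and is modified in place when a
    directory has to be added. `all_dirs` is every data directory on the server.
    """
    # Normal path: some directory in the group still has space.
    usable = [d for d in group if not d.is_full()]
    if usable:
        return max(usable, key=lambda d: d.free_bytes)

    # Every directory in the group is full. Instead of failing, look for a
    # non-full directory outside the group.
    in_group = {d.name for d in group}
    candidates = [d for d in all_dirs
                  if d.name not in in_group and not d.is_full()]
    if not candidates:
        raise NoSpaceError("all data directories are full")

    # Assumed policy: fewest replicas first, then most free space.
    new_dir = min(candidates, key=lambda d: (d.num_replicas, -d.free_bytes))
    group.append(new_dir)
    new_dir.num_replicas += 1
    return new_dir
```

With the numbers from the log above (11 directories, 4 of them full, and a group of 3 full directories), this sketch adds a fourth directory to the group and returns it, where the reported behavior was a failed flush. It raises only when every directory on the server is full.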

People

Assignee: Unassigned
Reporter: Andrew Wong
Votes: 0
Watchers: 2

Dates

Created: 26/Jul/19 19:01
Updated: 27/Aug/19 22:46
Resolved: 27/Aug/19 22:46