[AURORA-722] snapshot performance issues - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.6.0
Component/s: Scheduler
Labels: None
Sprint: Aurora Q3 Sprint 2, Aurora Q3 Sprint 3, Aurora Q4 Sprint 1

Description

In one of our larger production clusters, snapshots are slow enough that the scheduler fails over before a snapshot completes.

For background, the scheduler writes a binary-encoded Snapshot (defined in api.thrift) to the Mesos replicated log every hour, or at the interval set by -dlog_snapshot_interval. The snapshot is compressed when -deflate_snapshots is enabled. It accounts for most of the scheduler's heap usage, including the configuration for all tasks running in the cluster.

Add appropriate instrumentation to the snapshot routine and patch any obvious performance bottlenecks.
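As a rough illustration of the kind of instrumentation meant here, the sketch below times each stage of a snapshot write separately so a slow stage stands out. It is not the scheduler's code: `timed`, `serialize`, `write_snapshot`, and the `timings` dictionary are hypothetical names, the byte-joining in `serialize` is only a stand-in for Thrift binary encoding, and `zlib` stands in for whatever deflate implementation the scheduler uses.

```python
import time
import zlib
from contextlib import contextmanager

# Hypothetical stats sink: stage name -> accumulated elapsed seconds.
timings = {}


@contextmanager
def timed(stage):
    """Record how long the wrapped block takes under the given stage name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[stage] = timings.get(stage, 0.0) + elapsed


def serialize(tasks):
    """Stand-in for binary encoding; NOT the real Thrift serializer."""
    return b"".join(
        name.encode("utf-8") + b"\0" + config
        for name, config in sorted(tasks.items())
    )


def write_snapshot(tasks, deflate=True):
    """Encode the task store and optionally compress it, timing each stage."""
    with timed("serialize"):
        payload = serialize(tasks)
    if deflate:
        with timed("deflate"):
            payload = zlib.compress(payload)
    return payload


# Assumed workload: 1000 tasks with 1 KiB of configuration each.
tasks = {f"job-{i}": b"x" * 1024 for i in range(1000)}
blob = write_snapshot(tasks)

for stage, seconds in sorted(timings.items()):
    print(f"{stage}: {seconds * 1000:.2f} ms")
```

Timing the stages separately, rather than the whole routine, is what makes it possible to tell whether encoding, compression, or the log write dominates before patching anything.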

Issue Links

relates to: AURORA-74 Write snapshots and backups in a way that requires less memory (Open)

People

Assignee: Kevin Sweeney

Reporter: Kevin Sweeney

Votes: 0

Watchers: 3

Dates

Created: 17/Sep/14 18:03

Updated: 17/Apr/15 22:27

Resolved: 16/Oct/14 00:07