[AURORA-178] Log/observe snapshot operations - ASF JIRA

Details

Type: Task
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.5.0
Component/s: Scheduler
Labels:
- newbie

Description

Currently, snapshot operations that run for an excessive duration are not obvious in, e.g., the scheduler logs or dashboards. Since a snapshot is a potentially critical/dangerous operation (in some cases leading to ZooKeeper timeouts and scheduler suicide), it would be prudent to expose relevant information more readily (e.g. when the operations commence and complete, how long they take, etc.).

From Zameer:

The doSnapshot method of LogStorage is timed with the key "scheduler_log_snapshot". These are the stats it produces:

scheduler_log_snapshot_events 19
scheduler_log_snapshot_events_per_sec 0.0
scheduler_log_snapshot_nanos_per_event 0.0
scheduler_log_snapshot_nanos_total 373115257383
scheduler_log_snapshot_nanos_total_per_sec 0.0
scheduler_log_snapshot_persist_events 19
scheduler_log_snapshot_persist_events_per_sec 0.0
scheduler_log_snapshot_persist_nanos_per_event 0.0
scheduler_log_snapshot_persist_nanos_total 339151517713
scheduler_log_snapshot_persist_nanos_total_per_sec 0.0
scheduler_log_snapshots 19

Which metric should be tracked in our dashboard?
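One way to read the counters above is to derive a mean duration per snapshot from them. The sketch below assumes that each `_nanos_total` value is cumulative time across all of the matching `_events`; that reading is an assumption, not something the stats confirm. The class and method names are hypothetical and are not part of Aurora.

```java
import java.util.concurrent.TimeUnit;

public class SnapshotStats {
  // Mean duration in seconds per event.
  // Assumption: nanosTotal is the cumulative time spent across all events.
  public static double meanSeconds(long nanosTotal, long events) {
    if (events == 0) {
      return 0.0; // No events recorded yet, so there is nothing to average.
    }
    return (double) nanosTotal / events / TimeUnit.SECONDS.toNanos(1);
  }

  public static void main(String[] args) {
    // Values copied from the stats listed above.
    double snapshot = meanSeconds(373115257383L, 19);
    double persist = meanSeconds(339151517713L, 19);
    System.out.println("mean snapshot seconds: " + snapshot);
    System.out.println("mean persist seconds:  " + persist);
  }
}
```

Under that assumption, the figures above work out to roughly 19.6 seconds per snapshot on average, and about 17.9 seconds for the `_persist` timer. The `_per_sec` and `_per_event` values are all 0.0 in this sample, so they carry no information here.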

From Bill F:

A very long snapshot might never be reflected in those stats if a suicide happens midway through. The minimal fix would be to just LOG when a snapshot is about to commence.
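The minimal fix described above could look something like the following sketch. It is not the change that was committed for this ticket. `LoggedSnapshot` and `runWithLogging` are hypothetical names, the snapshot itself is stood in for by a plain `Runnable`, and `java.util.logging` is used only to keep the example self-contained.

```java
import java.util.concurrent.TimeUnit;
import java.util.logging.Logger;

public class LoggedSnapshot {
  private static final Logger LOG = Logger.getLogger(LoggedSnapshot.class.getName());

  // Runs the given snapshot action, logging before it starts and after it
  // completes or fails. Returns the elapsed time in nanoseconds.
  public static long runWithLogging(Runnable snapshotAction) {
    // Logged up front so the start is visible even if the process dies mid-snapshot.
    LOG.info("Snapshot starting");
    long start = System.nanoTime();
    boolean completed = false;
    try {
      snapshotAction.run();
      completed = true;
    } finally {
      long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
      LOG.info((completed ? "Snapshot complete" : "Snapshot failed")
          + " after " + elapsedMs + " ms");
    }
    return System.nanoTime() - start;
  }

  public static void main(String[] args) {
    // Placeholder action standing in for the real snapshot work.
    runWithLogging(() -> System.out.println("writing snapshot..."));
  }
}
```

With a start line in the log, a snapshot that never finishes shows up as a "Snapshot starting" entry with no matching completion entry, which the timing stats alone cannot show.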


People

Assignee: Unassigned

Reporter: Jonathan Boulle

Votes: 0

Watchers: 3

Dates

Created: 04/Feb/14 00:26

Updated: 16/May/14 03:43

Resolved: 13/May/14 16:26