Uploaded image for project: 'Aurora'
  1. Aurora
  2. AURORA-722

snapshot performance issues

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 0.6.0
    • Scheduler
    • None
    • Aurora Q3 Sprint 2, Aurora Q3 Sprint 3, Aurora Q4 Sprint 1

    Description

      In one of our larger production clusters we're seeing issues with snapshot performance that cause the scheduler to failover before completing a snapshot.

      For background, the scheduler writes a compressed (when -deflate_snapshots is enabled), binary-encoded Snapshot (from api.thrift) to the mesos replicated log every hour (or -dlog_snapshot_interval). This snapshot represents most of the scheduler's heap usage, including the configuration for all tasks running in the cluster.

      Add appropriate instrumentation to the snapshot routine and patch any obvious performance bottlenecks.

      Attachments

        Issue Links

          Activity

            People

              kevints Kevin Sweeney
              kevints Kevin Sweeney
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: