Details
Type: Bug
Status: Resolved
Priority: Minor
Resolution: Not A Problem
1.7.0
None
None
Mesos Version 1.7.0
JDK 8.0
Description
We manage an Apache Mesos cluster running version 1.7.0. We have written a framework in Java that submits tasks to the Mesos master at a rate of 300 TPS. Everything works fine for almost 24 hours, but then outstanding offers accumulate and saturate within 15 minutes. The Mesos master does not reclaim the outstanding offers. We observe "RescindOffer" messages in the verbose (GLOG v=3) framework logs, but the number of outstanding offers does not decrease. Once outstanding offers saturate, no new resources are offered to the framework. We have to restart the scheduler to reset outstanding offers to zero.
Any suggestions to debug this issue are welcome.
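One possible starting point for debugging, assuming the framework might be holding offers that it neither accepts nor declines, is to keep the scheduler's own ledger of outstanding offers and compare it with what the master reports. The sketch below is only an illustration: `OfferTracker` and its methods are hypothetical names, not part of the Mesos API, and offer IDs are represented as plain strings. In a real scheduler the methods would be called from the `Scheduler.resourceOffers` and `Scheduler.offerRescinded` callbacks and wherever offers are accepted or declined.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper (not a Mesos class): a ledger of offers the
// framework has received but not yet accepted, declined, or seen rescinded.
class OfferTracker {
    // Offer ID -> time in milliseconds at which the offer was received.
    private final Map<String, Long> outstanding = new ConcurrentHashMap<>();

    // Record an offer; intended to be called once per offer received.
    void offerReceived(String offerId, long nowMillis) {
        outstanding.put(offerId, nowMillis);
    }

    // Remove an offer that was accepted, declined, or rescinded.
    // Returns false if the ID was not being tracked, which may itself
    // point at a bookkeeping mismatch worth logging.
    boolean offerClosed(String offerId) {
        return outstanding.remove(offerId) != null;
    }

    // Number of offers currently held by the framework.
    int outstandingCount() {
        return outstanding.size();
    }

    // IDs of offers held for longer than maxAgeMillis.
    List<String> staleOffers(long nowMillis, long maxAgeMillis) {
        List<String> stale = new ArrayList<>();
        for (Map.Entry<String, Long> entry : outstanding.entrySet()) {
            if (nowMillis - entry.getValue() > maxAgeMillis) {
                stale.add(entry.getKey());
            }
        }
        return stale;
    }
}
```

Logging `outstandingCount()` periodically and declining anything returned by `staleOffers(...)` would show whether the accumulation originates in the framework's offer handling or elsewhere.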