Details
- Bug
- Status: Resolved
- Major
- Resolution: Not A Problem
- 1.6.0, 1.8.0
- None
- Mesosphere Sprint 78
- 3
Description
When running in the agent, the resource provider manager persists its state into the agent's state. The agent's state is backed by LevelDB, which uses a lock to protect against concurrent access. Given the way we modelled LevelDB access, a fetch attempted while the lock is held results in a failed Future. When the resource provider manager encounters a failed recovery, it emits a fatal error, e.g.,
11:48:26 F0425 11:48:26.650568 26819 manager.cpp:254] Failed to recover resource provider manager registry: Failed: IO error: lock /tmp/ParentChildContainerTypeAndContentType_AgentContainerAPITest_RecoverNestedContainer_10_HXbQCK/meta/slaves/6645885c-050a-4518-b896-a20b3e72a070-S0/resource_provider_registry/LOCK: already held by process
11:48:26 *** Check failure stack trace: ***
We should not fail hard for such recoverable failure scenarios.
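As a rough illustration of what "not failing hard" could look like, here is a minimal, self-contained C++ sketch. It is not the Mesos or libprocess code: `RecoverResult`, `isLockContention`, and `recoverWithRetry` are hypothetical names invented for this example, and matching on the error string is only an assumption about how a held lock might be detected. The idea is to treat lock contention as transient, retry a bounded number of times, and hand the final result back to the caller instead of aborting.

```cpp
#include <functional>
#include <string>

// Hypothetical stand-in for the outcome of one registry recovery attempt.
// This is NOT the libprocess `Future` type.
struct RecoverResult {
  bool ok;
  std::string error;  // Only meaningful when `ok` is false.
};

// Assumption for this sketch: a held LevelDB lock is recognized by the
// message text seen in the log above.
bool isLockContention(const std::string& error) {
  return error.find("LOCK: already held by process") != std::string::npos;
}

// Calls `recover` up to `maxAttempts` times. Only lock contention is
// retried; any other failure is returned immediately. The last result is
// returned to the caller rather than triggering a fatal error.
RecoverResult recoverWithRetry(
    const std::function<RecoverResult()>& recover,
    int maxAttempts) {
  RecoverResult result{false, "no attempts made"};

  for (int attempt = 0; attempt < maxAttempts; ++attempt) {
    result = recover();

    if (result.ok || !isLockContention(result.error)) {
      break;
    }

    // A real implementation would likely wait (e.g., with a backoff)
    // before the next attempt; omitted to keep the sketch short.
  }

  return result;
}
```

A caller would then decide what to do with a result that is still failed after the last attempt, for example logging a non-fatal error and surfacing the failure, rather than terminating the process.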
Attachments
Issue Links
- is broken by: MESOS-8735 Implement recovery for resource provider manager registrar (Resolved)