Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
1.6.0, 1.7.0
-
Storage R9 Sprint 36, Storage R9 Sprint 37
-
1
Description
This test is flaky and can fail with the following error:
../../src/tests/storage_local_resource_provider_tests.cpp:3167 Failed to wait 15secs for pluginRestarted
The actual error is the following:
E0802 22:13:37.265038 8216 provider.cpp:1496] Failed to reconcile resource provider b9379982-d990-4f63-8a5b-10edd4f5a1bb: Collect failed: OS Error
The root cause is that the SLRP calls ListVolumes and GetCapacity when starting up, and if the plugin container is killed when these calls are ongoing, gRPC will return an OS Error which will lead the SLRP to fail.
This flakiness will be fixed once we finish https://issues.apache.org/jira/browse/MESOS-8400.
Attachments
Attachments
Issue Links
- is blocked by
-
MESOS-8400 Handle plugin crashes gracefully in SLRP recovery.
- Reviewable