Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
2.0.0
-
None
Description
We identified an issue that causes Traffic Router to keep routing traffic to an OFFLINE cache indefinitely after a snapshot of the CRConfig. The inverse also occurs: a cache that was previously set to OFFLINE never has traffic routed to it after it is set back to ONLINE or REPORTED (referred to only as ONLINE from here on).
The bug is caused by ConfigHandler.processConfig() clearing the cache locations from the NetworkNode before swapping out the instance of CacheRegister. While the cache locations have been cleared but the prior CacheRegister is still in place, a race condition can occur in which the CacheLocation for a given cache group from the prior config is set on the recently cleared NetworkNode. When this happens, the List<Cache> contains the prior config's list for that cache group, so any host state change between ONLINE and OFFLINE is not reflected. This is because a Cache is dropped from the CRConfig when it transitions to OFFLINE and reappears only when it is set back to ONLINE. By contrast, on a transition from ONLINE to ADMIN_DOWN the Cache remains in the CRConfig, so we simply use its status to determine whether the cache is available, and the software works as designed.
The race is possible because of the way we use lazy loading to associate network ranges within the coverage zone file (CZF) with CacheLocations; each NetworkNode represents a section of the CZF and holds the CacheLocation for it. In TrafficRouter, during cache selection, if we have a hit in the CZF but the CacheLocation is uninitialized, we obtain the CacheLocation from CacheRegister and set it on that specific NetworkNode. If the NetworkNode has been cleared but the CacheRegister has not yet been swapped, we set the NetworkNode to the old CacheLocation, which, as mentioned, holds a reference to the prior List<Cache>. Because the NetworkNode is then no longer uninitialized, nothing ever populates it with the new CacheLocation and new List<Cache>.
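The interleaving described above can be reproduced deterministically in a small sketch. The classes below are hypothetical, heavily simplified stand-ins that only borrow the names NetworkNode, CacheRegister and CacheLocation; they are not the actual Traffic Router code, and the cache group and host names are invented for illustration. The "swap then clear" branch is an assumption about one ordering that avoids the stale reference, not necessarily the fix that was committed.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: a cache group and the caches present in one CRConfig snapshot.
final class CacheLocation {
    final String id;
    final List<String> caches;
    CacheLocation(String id, List<String> caches) { this.id = id; this.caches = caches; }
}

// Hypothetical stand-in: one instance is built per CRConfig snapshot.
final class CacheRegister {
    private final Map<String, CacheLocation> locations = new HashMap<>();
    CacheRegister add(CacheLocation loc) { locations.put(loc.id, loc); return this; }
    CacheLocation getCacheLocation(String id) { return locations.get(id); }
}

// Hypothetical stand-in: a section of the CZF with a lazily loaded CacheLocation.
final class NetworkNode {
    final String locationId;
    private CacheLocation cacheLocation; // null means "uninitialized"
    NetworkNode(String locationId) { this.locationId = locationId; }
    CacheLocation getCacheLocation() { return cacheLocation; }
    void setCacheLocation(CacheLocation loc) { cacheLocation = loc; }
    void clearCacheLocation() { cacheLocation = null; }
}

final class Router {
    volatile CacheRegister cacheRegister;
    Router(CacheRegister initial) { cacheRegister = initial; }

    // Lazy loading: only consult the CacheRegister when the node is uninitialized.
    CacheLocation select(NetworkNode node) {
        if (node.getCacheLocation() == null) {
            node.setCacheLocation(cacheRegister.getCacheLocation(node.locationId));
        }
        return node.getCacheLocation();
    }
}

final class RaceSketch {
    // Simulates a config reload with one request arriving between the two steps.
    // Returns the cache list that requests see once the reload has finished.
    static List<String> reload(boolean clearBeforeSwap) {
        CacheRegister oldRegister = new CacheRegister()
                .add(new CacheLocation("cg-east", List.of("cache-1", "cache-2")));
        // In the new snapshot cache-2 was set OFFLINE, so it is absent.
        CacheRegister newRegister = new CacheRegister()
                .add(new CacheLocation("cg-east", List.of("cache-1")));

        Router router = new Router(oldRegister);
        NetworkNode node = new NetworkNode("cg-east");
        router.select(node); // warm the node under the old config

        if (clearBeforeSwap) {
            node.clearCacheLocation();
            router.select(node);                 // request re-caches the OLD location
            router.cacheRegister = newRegister;  // too late: node is already populated
        } else {
            router.cacheRegister = newRegister;
            router.select(node);                 // request still sees the old location
            node.clearCacheLocation();           // next request lazily loads the new one
        }
        return router.select(node).caches;
    }

    public static void main(String[] args) {
        System.out.println("clear then swap: " + reload(true));
        System.out.println("swap then clear: " + reload(false));
    }
}
```

In this sketch, clearing before the swap leaves the node pinned to the prior list, which still contains the OFFLINE cache-2, while swapping first lets the next lookup pick up the new list. A real implementation would also need to account for concurrent access to the node, which this single-threaded simulation deliberately leaves out.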