In large deployments where the number of hosts multiplied by the number of components is large (10,000, for example), the ConfigHelper.isStale() method can make tens of thousands of database queries every minute.
Consider a 3-minute trace:
ClusterConfigMappingEntity:null is requested over 10,000 times. Whether or not this number exceeds the capacity of the stale-config cache, it causes a massive performance delay in the Jetty threads, since the database is being hammered and other PropertyProviders must wait until those queries finish.
- Setting the server.cache.isStale.expiration value to 28800 improves the behavior of the system
- Ambari goes from totally unusable to usable
- Startup is still an issue, as the code still has to make tens of thousands of calls, but those flatten out once the cache is populated. Until then, Ambari is unresponsive.
- After startup, you can use Ambari to send commands and browse around without delay
- If you change a config, however, the problem returns: the cache is emptied and we make another 10,000 calls, leaving Ambari unresponsive until the cache is repopulated
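
The behavior in the list above can be reproduced with a small, self-contained sketch. This is not Ambari's ConfigHelper code: the class StaleCacheSketch, its methods, and the loader function standing in for the database query are all hypothetical, and treating 28800 as a number of seconds is an assumption. It only illustrates why a long expiration helps in steady state and why emptying the cache on a config change brings back the full 10,000 lookups.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

/** Hypothetical sketch of an expiring "is stale" cache; not Ambari's implementation. */
public class StaleCacheSketch {
    /** A cached answer plus the time at which it stops being valid. */
    private static final class Entry {
        final boolean stale;
        final long expiresAtMillis;
        Entry(boolean stale, long expiresAtMillis) {
            this.stale = stale;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final AtomicLong loads = new AtomicLong(); // counts simulated database queries
    private final long ttlMillis;
    private final Function<String, Boolean> loader;   // stand-in for the expensive query

    public StaleCacheSketch(long ttlMillis, Function<String, Boolean> loader) {
        this.ttlMillis = ttlMillis;
        this.loader = loader;
    }

    /** Returns the cached answer, running the loader only on a miss or after expiry. */
    public boolean isStale(String key, long nowMillis) {
        Entry e = cache.get(key);
        if (e == null || nowMillis >= e.expiresAtMillis) {
            loads.incrementAndGet();
            e = new Entry(loader.apply(key), nowMillis + ttlMillis);
            cache.put(key, e);
        }
        return e.stale;
    }

    /** Models a config change: every cached answer is dropped. */
    public void invalidateAll() {
        cache.clear();
    }

    public long loadCount() {
        return loads.get();
    }

    public static void main(String[] args) {
        long ttlMillis = 28_800L * 1000L; // 28800, assumed to be seconds
        StaleCacheSketch cache = new StaleCacheSketch(ttlMillis, key -> false);
        int pairs = 10_000; // hosts * components

        // "Startup": every host/component pair misses the empty cache.
        for (int i = 0; i < pairs; i++) cache.isStale("pair-" + i, 0L);
        System.out.println(cache.loadCount()); // 10000

        // One minute later: all answers come from the cache.
        for (int i = 0; i < pairs; i++) cache.isStale("pair-" + i, 60_000L);
        System.out.println(cache.loadCount()); // 10000

        // Config change empties the cache, so the next pass reloads everything.
        cache.invalidateAll();
        for (int i = 0; i < pairs; i++) cache.isStale("pair-" + i, 120_000L);
        System.out.println(cache.loadCount()); // 20000
    }
}
```

In this sketch a longer expiration only reduces the steady-state load; the cold-cache burst after startup or after invalidateAll() is unchanged, which matches the symptoms described above.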
There are a ton of threads stuck at:
They're all blocked by qtp-ambari-client-247: