[AMBARI-16830] Desired Configuration Cache Expiration Caused 10,000's of Database Hits In Large Deployments - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Affects Version/s: 2.2.2
Fix Version/s: 2.4.0
Component/s: ambari-server
Labels: None

Description

In large deployments where the product of the number of hosts and the number of components is large (10,000, for example), the ConfigHelper.isStale() method can issue tens of thousands of database queries every minute.

Consider a 3-minute trace:

server.persistence.properties.eclipselink.profiler=PerformanceMonitor

Time = 3 minutes

Counter:ReadAllQuery:org.apache.ambari.server.orm.entities.ClusterConfigMappingEntity:null    11,716

Timer:ReadAllQuery:org.apache.ambari.server.orm.entities.ClusterConfigMappingEntity:null    80,520,541,000
Timer:ReadAllQuery:org.apache.ambari.server.orm.entities.ClusterConfigMappingEntity:null:ObjectBuilding    19,741,257,000
Timer:ReadAllQuery:org.apache.ambari.server.orm.entities.ClusterConfigMappingEntity:null:QueryPreparation    414,000
Timer:ReadAllQuery:org.apache.ambari.server.orm.entities.ClusterConfigMappingEntity:null:RowFetch    6,032,673,000
Timer:ReadAllQuery:org.apache.ambari.server.orm.entities.ClusterConfigMappingEntity:null:SqlGeneration    79,000
Timer:ReadAllQuery:org.apache.ambari.server.orm.entities.ClusterConfigMappingEntity:null:SqlPrepare    232,532,000
Timer:ReadAllQuery:org.apache.ambari.server.orm.entities.ClusterConfigMappingEntity:null:StatementExecute    33,624,662,000

The ClusterConfigMappingEntity:null query is executed over 10,000 times (11,716 in this trace). Whether or not that number exceeds the size of the stale-config cache, the result is a massive delay in the Jetty threads: the database is being hammered, and other PropertyProviders must wait until the queries finish.
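To put the counters above in perspective, a rough calculation helps. The sketch below is only an illustration: it assumes the PerformanceMonitor timers are reported in nanoseconds (the trace does not state the unit) and that the window is exactly 180 seconds. The class and method names are hypothetical and not part of Ambari.

```java
public class TraceMath {
    // Values copied from the 3-minute trace in this issue.
    static final long QUERY_COUNT = 11_716L;
    static final long TOTAL_TIMER = 80_520_541_000L;
    static final long STATEMENT_EXECUTE_TIMER = 33_624_662_000L;
    static final long WINDOW_SECONDS = 180L;

    /** Average number of ClusterConfigMappingEntity queries per second. */
    static double queriesPerSecond() {
        return (double) QUERY_COUNT / WINDOW_SECONDS;
    }

    /** Average time per query in milliseconds. ASSUMPTION: timers are in nanoseconds. */
    static double avgMillisPerQuery() {
        return TOTAL_TIMER / (double) QUERY_COUNT / 1_000_000.0;
    }

    /** Fraction of the total query time spent in StatementExecute. */
    static double statementExecuteShare() {
        return STATEMENT_EXECUTE_TIMER / (double) TOTAL_TIMER;
    }

    public static void main(String[] args) {
        System.out.printf("queries per second: %.2f%n", queriesPerSecond());
        System.out.printf("avg ms per query:   %.2f%n", avgMillisPerQuery());
        System.out.printf("StatementExecute:   %.1f%%%n", statementExecuteShare() * 100);
    }
}
```

Under the nanosecond assumption, this works out to roughly 65 queries per second at about 6.9 ms each, or around 80 seconds of cumulative query time within a 180-second window.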

Setting the server.cache.isStale.expiration value to 28800 improves the behavior of the system:
- Ambari goes from totally unusable to usable.
- Startup is still an issue because the code still has to make tens of thousands of calls, but these flatten out once the cache is populated. Ambari is therefore unresponsive during startup.
- After startup, you can use Ambari to send commands and browse around without delay.
- If you change a config, however, the problem returns: the cache is emptied and 10,000 more calls are made, leaving Ambari unresponsive until the cache is repopulated.
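The behavior described in the list above can be reproduced with a small model of an expiring cache. This is a minimal sketch, not Ambari's implementation: the class, the injected clock, and the loader function are all hypothetical, and it assumes the expiration value is expressed in seconds. The loader stands in for the database query behind ConfigHelper.isStale().

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;
import java.util.function.LongSupplier;

public class StaleCacheSketch {
    /** A cached answer plus the time (in seconds) at which it was loaded. */
    private static final class Entry {
        final boolean value;
        final long loadedAtSeconds;
        Entry(boolean value, long loadedAtSeconds) {
            this.value = value;
            this.loadedAtSeconds = loadedAtSeconds;
        }
    }

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final long expirationSeconds;
    private final LongSupplier clockSeconds;          // injected so the demo is deterministic
    private final Function<String, Boolean> loader;   // hypothetical stand-in for the DB query
    final AtomicLong loads = new AtomicLong();        // counts simulated database hits

    StaleCacheSketch(long expirationSeconds, LongSupplier clockSeconds,
                     Function<String, Boolean> loader) {
        this.expirationSeconds = expirationSeconds;
        this.clockSeconds = clockSeconds;
        this.loader = loader;
    }

    boolean isStale(String key) {
        long now = clockSeconds.getAsLong();
        Entry e = cache.get(key);
        if (e == null || now - e.loadedAtSeconds >= expirationSeconds) {
            loads.incrementAndGet();                  // cache miss: one "database" call
            e = new Entry(loader.apply(key), now);
            cache.put(key, e);
        }
        return e.value;
    }

    /** Models a config change emptying the cache. */
    void invalidateAll() {
        cache.clear();
    }

    /** Returns the cumulative load count after each of four passes over 10,000 keys. */
    static long[] demo() {
        long[] now = {0L};
        StaleCacheSketch c = new StaleCacheSketch(28_800L, () -> now[0], key -> false);
        long[] result = new long[4];
        for (int pass = 0; pass < 4; pass++) {
            if (pass == 2) c.invalidateAll();         // config change before third pass
            if (pass == 3) now[0] += 28_800L;         // expiration elapses before fourth pass
            for (int i = 0; i < 10_000; i++) c.isStale("host-component-" + i);
            result[pass] = c.loads.get();
        }
        return result;
    }
}
```

In this model the first pass pays for every key, the second pass is served entirely from the cache, and both a cache flush and an expiration bring the full cost back. A longer expiration only postpones the burst of queries; it does not make it smaller.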

Many threads are stuck at:

"qtp-ambari-client-275" prio=10 tid=0x00007f9de801b800 nid=0x6735 waiting for monitor entry [0x00007f9dd66e3000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.apache.ambari.server.controller.internal.AbstractProviderModule.checkInit(AbstractProviderModule.java:805)
	- waiting to lock <0x00007fa0744cc3b0> (a org.apache.ambari.server.controller.internal.DefaultProviderModule)
	at org.apache.ambari.server.controller.internal.AbstractProviderModule.getMetricsServiceType(AbstractProviderModule.java:275)

They're all blocked by qtp-ambari-client-247:

"qtp-ambari-client-247" prio=10 tid=0x00007f9dd8001000 nid=0x5915 runnable [0x00007f9ddd0c2000]
   java.lang.Thread.State: RUNNABLE
	at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:2961)
	at com.mysql.jdbc.MysqlIO.nextRowFast(MysqlIO.java:2159)
	at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1964)
	at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:3316)
	at com.mysql.jdbc.MysqlIO.getResultSet(MysqlIO.java:463)
	at com.mysql.jdbc.MysqlIO.readResultsForQueryOrUpdate(MysqlIO.java:3040)
	at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:2288)
	at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2681)
	at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2551)
	- locked <0x00007fa075265510> (a com.mysql.jdbc.JDBC4Connection)
	at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1861)
	- locked <0x00007fa075265510> (a com.mysql.jdbc.JDBC4Connection)
	at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:1962)
	- locked <0x00007fa075265510> (a com.mysql.jdbc.JDBC4Connection)
	at com.mchange.v2.c3p0.impl.NewProxyPreparedStatement.executeQuery(NewProxyPreparedStatement.java:353)
	at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.executeSelect(DatabaseAccessor.java:1009)
	at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.basicExecuteCall(DatabaseAccessor.java:644)
	at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.executeCall(DatabaseAccessor.java:560)
	at org.eclipse.persistence.internal.sessions.AbstractSession.basicExecuteCall(AbstractSession.java:2055)
	at org.eclipse.persistence.sessions.server.ServerSession.executeCall(ServerSession.java:570)
	at org.eclipse.persistence.internal.queries.DatasourceCallQueryMechanism.executeCall(DatasourceCallQueryMechanism.java:242)
	at org.eclipse.persistence.internal.queries.DatasourceCallQueryMechanism.executeCall(DatasourceCallQueryMechanism.java:228)
	at org.eclipse.persistence.internal.queries.DatasourceCallQueryMechanism.executeSelectCall(DatasourceCallQueryMechanism.java:299)
	at org.eclipse.persistence.internal.queries.DatasourceCallQueryMechanism.selectAllRows(DatasourceCallQueryMechanism.java:694)
	at org.eclipse.persistence.internal.queries.ExpressionQueryMechanism.selectAllRowsFromTable(ExpressionQueryMechanism.java:2740)
	at org.eclipse.persistence.internal.queries.ExpressionQueryMechanism.selectAllRows(ExpressionQueryMechanism.java:2693)
	at org.eclipse.persistence.queries.ReadAllQuery.executeObjectLevelReadQuery(ReadAllQuery.java:559)
	at org.eclipse.persistence.queries.ObjectLevelReadQuery.executeDatabaseQuery(ObjectLevelReadQuery.java:1175)
	at org.eclipse.persistence.queries.DatabaseQuery.execute(DatabaseQuery.java:904)
	at org.eclipse.persistence.queries.ObjectLevelReadQuery.execute(ObjectLevelReadQuery.java:1134)
	at org.eclipse.persistence.queries.ReadAllQuery.execute(ReadAllQuery.java:460)
	at org.eclipse.persistence.queries.ObjectLevelReadQuery.executeInUnitOfWork(ObjectLevelReadQuery.java:1222)
	at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.internalExecuteQuery(UnitOfWorkImpl.java:2896)
	at org.eclipse.persistence.internal.sessions.AbstractSession.executeQuery(AbstractSession.java:1857)
	at org.eclipse.persistence.internal.sessions.AbstractSession.executeQuery(AbstractSession.java:1839)
	at org.eclipse.persistence.internal.sessions.AbstractSession.executeQuery(AbstractSession.java:1804)
	at org.eclipse.persistence.internal.jpa.QueryImpl.executeReadQuery(QueryImpl.java:258)
	at org.eclipse.persistence.internal.jpa.QueryImpl.getResultList(QueryImpl.java:473)
	at org.apache.ambari.server.orm.dao.DaoUtils.selectList(DaoUtils.java:62)
	at org.apache.ambari.server.orm.dao.ClusterDAO.getClusterConfigMappingEntitiesByCluster(ClusterDAO.java:240)
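The pattern in the two thread dumps, one thread holding a monitor while it waits on a slow read and every other caller parked in BLOCKED, can be illustrated in isolation. The following is a hedged sketch only: ProviderModule, slowInit, and checkInit here are hypothetical stand-ins and do not reproduce Ambari's actual AbstractProviderModule code.

```java
import java.util.concurrent.CountDownLatch;

public class MonitorContentionSketch {
    /** Hypothetical stand-in for a module whose methods share one monitor. */
    static final class ProviderModule {
        private final CountDownLatch insideInit = new CountDownLatch(1);
        private final CountDownLatch releaseQuery = new CountDownLatch(1);

        /** Holds the monitor while a simulated long-running database read is in flight. */
        synchronized void slowInit() throws InterruptedException {
            insideInit.countDown();
            releaseQuery.await();
        }

        /** Cheap call that still needs the same monitor. */
        synchronized void checkInit() {
        }
    }

    /** Returns the state of a second caller while the first still holds the monitor. */
    static Thread.State observeWaiterState() {
        ProviderModule module = new ProviderModule();
        Thread holder = new Thread(() -> {
            try {
                module.slowInit();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "holder");
        Thread waiter = new Thread(module::checkInit, "waiter");
        try {
            holder.start();
            module.insideInit.await();                // holder now owns the monitor
            waiter.start();
            Thread.State state = waiter.getState();
            while (state != Thread.State.BLOCKED && state != Thread.State.TERMINATED) {
                Thread.sleep(1);                      // poll until the waiter hits the monitor
                state = waiter.getState();
            }
            module.releaseQuery.countDown();          // let the simulated query finish
            holder.join();
            waiter.join();
            return state;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("waiter state while query runs: " + observeWaiterState());
    }
}
```

The waiter does no work of its own, yet it cannot proceed until the holder's read completes. With thousands of slow queries issued under the same lock, every request thread that needs the monitor queues up behind them.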

Attachments


AMBARI-16830.patch
20/Oct/16 17:07
80 kB
Jonathan Hurley

Issue Links

links to

Reviewboard
