Description
CQSI maintains a client-side metadata cache (schemas, tables, and functions) that evicts the least recently used table entries when the cache grows beyond its configured size.
Each time a Phoenix connection is created, the client-side metadata cache maintained by the CQSI object creating this connection is cloned for the connection. Thus, there are two levels of caches: one at the Phoenix connection level and the other at the CQSI level.
When a Phoenix client needs to update the client-side cache, it updates both caches (on the connection object and on the CQSI object). On a read, the Phoenix client attempts to retrieve a table from the connection-level cache. If the table is not there, the Phoenix client does not check the CQSI-level cache; instead, it retrieves the object from the server and finally updates both the connection-level and CQSI-level caches, as sketched below.
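A minimal sketch of this lookup path follows. All names here (MetadataClient, Table, fetchFromServer) are illustrative stand-ins for the actual Phoenix classes, not its real API:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the two-level lookup described above.
class MetadataClient {
    static class Table {
        final String name;
        Table(String name) { this.name = name; }
    }

    private final Map<String, Table> connectionCache = new HashMap<>(); // per-connection clone
    private final Map<String, Table> cqsiCache = new HashMap<>();       // CQSI-level cache

    Table getTable(String key) {
        // 1. Only the connection-level cache is consulted on a read.
        Table table = connectionCache.get(key);
        if (table != null) {
            return table;
        }
        // 2. On a miss, the CQSI-level cache is NOT checked; the table
        //    is fetched from the server instead.
        table = fetchFromServer(key);
        // 3. Both cache levels are then updated with the fetched table.
        connectionCache.put(key, table);
        cqsiCache.put(key, table);
        return table;
    }

    private Table fetchFromServer(String key) {
        return new Table(key); // placeholder for the RPC to the server
    }
}
```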
PMetaDataCache provides caching for tables, schemas, and functions, but it maintains separate caches internally, one for each type of metadata. The cache for tables is actually a cache of PTableRef objects. A PTableRef holds a reference to the table object as well as the estimated size of the table object, the create time, the last access time, and the resolved time. The create time is set to the last access time value provided when the PTableRef object is inserted into the cache; the resolved time is also provided at insertion. Both the create time and the resolved time are final fields (i.e., they are never updated). PTableRef provides a setter method to update the last access time, and PMetaDataCache updates the last access time whenever the table is retrieved from the cache. The LRU eviction policy is implemented using the last access time; no eviction policy is implemented for schemas and functions. The frequency of updating the cache is controlled by the configuration parameter phoenix.default.update.cache.frequency, which can be defined at the cluster or table level. Setting it to zero means the cache is not used.
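The essential shape of PTableRef, as described above, can be sketched as follows. This is a simplification; only the fields and mutability described in this paragraph are assumed:

```java
// Simplified sketch of PTableRef; the real Phoenix class has more detail.
class PTableRef {
    private final Object table;           // the cached table (PTableImpl) object
    private final int estimatedSize;      // estimated byte size of the table
    private final long createTime;        // final: set to lastAccessTime at insert
    private final long resolvedTime;      // final: provided at insert
    private volatile long lastAccessTime; // the only mutable field

    PTableRef(Object table, int estimatedSize, long lastAccessTime, long resolvedTime) {
        this.table = table;
        this.estimatedSize = estimatedSize;
        this.createTime = lastAccessTime;  // create time == access time at insertion
        this.resolvedTime = resolvedTime;
        this.lastAccessTime = lastAccessTime;
    }

    Object getTable() { return table; }
    int getEstimatedSize() { return estimatedSize; }
    long getLastAccessTime() { return lastAccessTime; }

    // PMetaDataCache calls this on every cache hit; LRU eviction orders by it.
    void setLastAccessTime(long t) { this.lastAccessTime = t; }
}
```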
Obviously, the purpose of eviction is to limit the memory consumed by the cache. The expected behavior is that when a table is removed from the cache, the table (PTableImpl) object is also garbage collected. However, this does not actually happen, because multiple caches hold references to the same object while each cache maintains its own table refs, and thus its own access times. This means that the access time for the same table may differ from one cache to another; when one cache evicts the object, another cache may still hold on to it.
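Reusing the PTableRef sketch above, the following hypothetical fragment illustrates the problem: both cache levels wrap the same table object, yet each tracks its own access time:

```java
// Both cache levels wrap the SAME underlying table object,
// each in its own PTableRef with its own access-time clock.
static void sharedReferenceExample() {
    Object sharedTable = new Object(); // stands in for a single PTableImpl
    PTableRef connRef = new PTableRef(sharedTable, 100, 1000L, 1000L); // connection-level
    PTableRef cqsiRef = new PTableRef(sharedTable, 100, 1000L, 1000L); // CQSI-level

    connRef.setLastAccessTime(5000L);  // a hit via the connection-level cache
    // cqsiRef.getLastAccessTime() is still 1000L. If the CQSI-level cache now
    // evicts its ref as "least recently used", connRef still references
    // sharedTable, so the underlying table object cannot be garbage collected.
}
```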
Although the individual caches implement the LRU eviction policy, the overall memory eviction policy for the actual table objects behaves more like an age-based cache. If a table is frequently accessed through the connection-level caches, the last access time maintained by the corresponding table ref objects for this table will be updated. However, these updates are not visible to the CQSI-level cache, whose table refs retain the same create time and access time.
Since every object inserted into the local cache of a connection object is also inserted into the cache on the CQSI object, the CQSI-level cache grows faster than the caches on the connection objects. When the cache reaches its maximum size, each newly inserted table results in evicting one of the existing tables. Since the access times of these tables are not updated in the CQSI-level cache, the table that has stayed in the cache the longest is likely to be evicted, regardless of whether it is frequently accessed via the connection-level caches. This obviously defeats the purpose of an LRU cache.
Another problem with the current cache is the choice of its internal data structures and its eviction implementation. The table refs in the cache are maintained in a hash map which maps a table key (a pair of tenant ID and table name) to a table ref. When the size of a cache (the total byte size of the table objects it refers to) reaches its configured limit, the overage that adding a new table would cause is computed. Then all the table refs in the cache are cloned into a priority queue as well as into a new cache. The queue uses the access time to determine the order of its elements (i.e., table refs). The table refs that should not be evicted are removed from the queue, leaving in the queue the table refs to be evicted. Finally, the table refs left in the queue are removed from the new cache, and the new cache replaces the old one. This is clearly an expensive operation in terms of memory allocations and CPU time. Worse, once the cache reaches its limit, almost every insertion causes an eviction, and this expensive operation is repeated for each such insertion.
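The following is a rough, functionally equivalent sketch of that eviction path, reusing the PTableRef sketch from earlier. It is a simplification of PMetaDataCache, not the actual code, and requires java.util.{HashMap, Map, PriorityQueue}:

```java
// Sketch of the eviction path described above: clone everything, order by
// access time, keep the newest entries, evict the rest.
static Map<String, PTableRef> pruneToLimit(Map<String, PTableRef> cache, long maxBytes) {
    // Every entry is cloned into a new cache and into a priority queue
    // ordered by last access time, newest first.
    Map<String, PTableRef> newCache = new HashMap<>(cache);
    PriorityQueue<Map.Entry<String, PTableRef>> queue = new PriorityQueue<>(
        (a, b) -> Long.compare(b.getValue().getLastAccessTime(),
                               a.getValue().getLastAccessTime()));
    queue.addAll(cache.entrySet());

    // The most recently used entries (the survivors) are removed from the
    // queue until the byte budget is spent; what remains in the queue is
    // the set of victims, which are then removed from the new cache.
    long kept = 0;
    while (!queue.isEmpty()
           && kept + queue.peek().getValue().getEstimatedSize() <= maxBytes) {
        kept += queue.poll().getValue().getEstimatedSize();
    }
    for (Map.Entry<String, PTableRef> victim : queue) {
        newCache.remove(victim.getKey());
    }
    // The new cache replaces the old one wholesale; all of this work is
    // repeated on every insertion once the cache is at its limit.
    return newCache;
}
```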
Since Phoenix connections are supposed to be short-lived, maintaining a separate cache for each connection object, and especially cloning the entire cache content (and then pruning the entries belonging to other tenants when the connection is a tenant-specific connection), is not justified. The cost of the clone operation by itself would offset the gain of not accessing the CQSI-level cache, since the number of such accesses per connection should be small given short-lived Phoenix connections.
In addition, the impact of Phoenix connection leaks (connections that are not closed by applications) and of simply long-lived connections will be exacerbated, since these connections hold references to a large set of table objects.