In cloud environments, the same database instance may serve as the long-running metadata persistence layer while multiple HMS instances access it. These HMS instances could be running at the same time or, in the case of transient workloads, come up on an on-demand basis. HMS is used by multiple projects in the Hadoop ecosystem as the de facto metadata keeper for various SQL engines on the cluster. Currently, there is no way to uniquely identify the database instance backing an HMS. For example, if two HMS instances run on top of the same metastore DB, there is no way to determine that the data received from both metastore clients comes from the same database. Similarly, when multiple HMS services come and go for transient workloads, an external application fetching data from an HMS has no way to tell that these instances are in fact returning the same data.
We could potentially use the combination of the javax.jdo.option.ConnectionURL and javax.jdo.option.ConnectionDriverName configuration of each HMS instance, but this approach may not be very robust: if the database is migrated to another server for some reason, the ConnectionURL can change even though the data behind it does not. Having a UUID stored in the metastore DB that can be queried through a Thrift API would solve this problem. Any application talking to multiple HMS instances could then recognize whether the data is coming from the same backing database.
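To illustrate why a stored UUID is a more robust identity key than the ConnectionURL, here is a minimal client-side sketch. It assumes each HMS instance exposes the database UUID through some Thrift call; the record `HmsEndpoint`, its `dbUuid` field, and the `groupBy` helper are hypothetical names for illustration only, not the actual Hive API. The scenario models one database that was migrated to a new server between two HMS starts, so the URLs differ while the UUID is the same.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.function.Function;

public class MetastoreIdentity {

    // Hypothetical client-side view of one HMS instance. dbUuid stands in for
    // the value an assumed Thrift call would read from the backing metastore DB.
    record HmsEndpoint(String name, String connectionUrl, String dbUuid) {}

    // Groups HMS instance names by an identity key. Instances that share a key
    // are treated as serving metadata from the same database.
    static Map<String, List<String>> groupBy(List<HmsEndpoint> endpoints,
                                             Function<HmsEndpoint, String> key) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (HmsEndpoint e : endpoints) {
            groups.computeIfAbsent(key.apply(e), k -> new ArrayList<>()).add(e.name());
        }
        return groups;
    }

    public static void main(String[] args) {
        // One database, migrated between the two HMS starts: the ConnectionURL
        // changes, but the UUID stored inside the database does not.
        String dbUuid = UUID.randomUUID().toString();
        List<HmsEndpoint> endpoints = List.of(
            new HmsEndpoint("hms-1", "jdbc:mysql://db-old:3306/hive", dbUuid),
            new HmsEndpoint("hms-2", "jdbc:mysql://db-new:3306/hive", dbUuid));

        // Keyed by ConnectionURL: two groups, wrongly suggesting two databases.
        System.out.println(groupBy(endpoints, HmsEndpoint::connectionUrl).size()); // prints 2
        // Keyed by DB UUID: one group, correctly identifying the shared database.
        System.out.println(groupBy(endpoints, HmsEndpoint::dbUuid).size()); // prints 1
    }
}
```

Because the UUID lives in the database itself rather than in each HMS's configuration, it survives server migrations and is identical for every HMS instance, long-running or transient, that points at that database.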
Is related to: HIVE-16711 Remove property_id column from metastore_db_properties table