Thanks for your comments, Flavio.
> For bookkeeper, we need to access ledger metadata both from clients and bookies, right?
Yes. The implementation of the metastore-based ledger manager should be split into two tasks: one for the client side and one for the server side. The server part depends on the client part, because the client part handles how ledger metadata is stored, while the server part handles how ledgers are garbage collected.
> Is this correct? If so, the plugable interface will allow the use of different repositories for the metadata part, but we will still rely upon zookeeper to monitor node availability.
Yes. We would still use ZooKeeper for node availability, while moving metadata operations to a different storage backend.
> In the definition of the compare-and-swap operation, the comparison is performed using the key and value itself. This might be expensive, so I was wondering if it is a better approach to use versions instead. The drawback is relying upon a backend that provides versioned data. It seems fine for me, though.
In the proposal, the comparison is applied to just a single cell (located by (key, family, qualifier)), while the set operation can be applied to multiple cells.
For example, suppose we have two columns: a data column, which stores the actual data, and a version column, which stores an incremented number. The initial value is (oldData, 0). When we want to update the data column, we execute CAS(key, 0, key, (newData, 1)). The comparison is applied only to the version column, not to the data column, so it is not expensive.
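To make the idea concrete, here is a minimal in-memory sketch of that version-column CAS. The class name, cell layout, and method signatures are assumptions for illustration only, not the actual MetaStore API: the point is that the comparison touches only the version cell, while the set updates both cells.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of the version-column CAS described above; names and
// signatures are illustrative, not the actual BookKeeper MetaStore API.
public class VersionedCasSketch {
    // A row holds its cells (e.g. "data" and "version") keyed by qualifier.
    static class Row {
        final Map<String, Object> cells = new HashMap<>();
    }

    private final Map<String, Row> table = new HashMap<>();

    void put(String key, String data, long version) {
        Row row = table.computeIfAbsent(key, k -> new Row());
        row.cells.put("data", data);
        row.cells.put("version", version);
    }

    // Compare only the version cell; if it matches, set both cells atomically.
    synchronized boolean compareAndSet(String key, long expectedVersion,
                                       String newData, long newVersion) {
        Row row = table.get(key);
        if (row == null
                || !Objects.equals(row.cells.get("version"), expectedVersion)) {
            return false; // cheap comparison on the version cell only
        }
        row.cells.put("data", newData);
        row.cells.put("version", newVersion);
        return true;
    }

    Object get(String key, String qualifier) {
        Row row = table.get(key);
        return row == null ? null : row.cells.get(qualifier);
    }
}
```

A stale writer that still holds version 0 after another writer has bumped it to 1 would simply have its CAS rejected and would need to re-read before retrying.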
To my knowledge, zk#setData provides a conditional set over the version: the set operation succeeds only when the given version matches the version of the znode, which is a kind of CAS. A generic CAS would make it easier to support more K/V stores.
> Related to the previous comment, it might be a better idea to state somewhere what properties we require from the backend store.
I think I have listed them in section 3, the operations required by a MetaStore.
> I'm not entirely sure I understand the implementation of leader election in 5.1. What happens if a hub is incorrectly suspected of crashing and it loses ownership over a topic? Does it find out via session expiration? Also, I suppose that if the hub has crashed but the list of hubs hasn't changed, then multiple iterations of 1 may have to happen.
>> I suppose that if the hub has crashed but the list of hubs hasn't changed, then multiple iterations of 1 may have to happen.
Doesn't this case also exist with ZooKeeper? There is still a gap between the hub crashing and its znode being deleted (session expiry). In the metastore-based topic manager, this gap becomes the interval between the hub crashing and the other hub servers being notified about the crash.
>> What happens if a hub is incorrectly suspected of crashing and it loses ownership over a topic?
If a hub server has not crashed, the other hub servers will not receive a notification from ZooKeeper saying that it crashed (can ZooKeeper guarantee this?). So ownership will not change, since the other hub servers still see the same zxid for that hub server.
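The zxid check above can be sketched as follows. This is an illustrative in-memory simulation, not the actual Hedwig implementation: the assumption is that a topic's ownership record carries the zxid of the owner hub's registration znode, and a claimant only takes over when that zxid no longer matches the live registration.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the zxid-based ownership check described above;
// class and method names are assumptions, not the Hedwig API.
public class OwnershipSketch {
    static class Owner {
        final String hub;
        final long zxid; // zxid of the hub's registration znode at assignment
        Owner(String hub, long zxid) { this.hub = hub; this.zxid = zxid; }
    }

    private final Map<String, Owner> topicOwners = new HashMap<>();
    private final Map<String, Long> liveHubZxids = new HashMap<>();

    // A hub (re-)registers; a restart would produce a new zxid.
    void registerHub(String hub, long zxid) { liveHubZxids.put(hub, zxid); }

    void assignOwner(String topic, String hub) {
        topicOwners.put(topic, new Owner(hub, liveHubZxids.get(hub)));
    }

    // A claimant may only take over when the recorded owner is gone or has
    // re-registered under a different zxid; if the zxid is unchanged, the
    // owner is still alive and ownership does not change.
    boolean tryTakeOver(String topic, String claimant) {
        Owner owner = topicOwners.get(topic);
        Long live = (owner == null) ? null : liveHubZxids.get(owner.hub);
        if (owner != null && live != null && live == owner.zxid) {
            return false; // same zxid: owner not crashed, keep ownership
        }
        topicOwners.put(topic, new Owner(claimant, liveHubZxids.get(claimant)));
        return true;
    }

    String ownerOf(String topic) {
        Owner o = topicOwners.get(topic);
        return o == null ? null : o.hub;
    }
}
```

Under this sketch, a hub that is merely suspected of crashing (but still registered with the same zxid) keeps its topics; only a genuine crash-and-restart, which changes the zxid, lets another hub claim them.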