1) WRT the concern about depending on another channel: I went down this path since it looked like there was some consensus when I started. What alternative design do you have in mind?
2) WRT a change in the memory/file channel breaking the Spillable Channel: Could you expand a bit? I am not familiar with the replay order issue or how it can have an impact. I don't think any intrinsic assumption is being made about any specific channel's behavior. Just to be doubly sure, I made sure not to rely on a single type of overflow channel in all the tests. The only material dependency (as far as I can tell) that the Spillable Channel has on the overflow is the interface-level guarantee that is expected from all channels: that order is maintained in the case of a single source/sink.
Do you see any other assumptions/dependencies hiding there?
I am sorry, I was not part of the initial discussions - so I was not aware of the consensus aspect. What I am saying is that being dependent on another channel creates an undesired strong coupling between this channel and the other channels. And if there are unit tests in this channel which can break when the behavior of one of the other channels is changed, then that is not acceptable. If you look at all our other components, none of them depend on each other (except the RPCClients - that is because the sinks are just glorified RPCClients).
The reason I would not agree with even the single source/sink replay order is that our interfaces do not really enforce this. It is not really stated anywhere in the documentation either. The FileChannel did not even conform to that single source/sink replay order until FLUME-1432. In fact, conforming to that order even in FLUME-1432 was a side-effect of fixing a race condition, and not specifically because it was meant to be handled. At some point, it may be decided that this can change again to some other order (maybe a thread-based ordering, or an order in which the events in a transaction all get written out together on commit, rather than getting written out on put and fsynced on commit). If this channel's tests break at that point, the onus will be on the contributor who submitted the file channel change to fix them - which I do not agree with.
In summary, I am ok with depending on other channels. What I am not ok with is depending on behavior of those channels that is not explicitly guaranteed through interfaces (or even documentation).
3) WRT reserving capacity on both channels: If you mean that each txn should not reserve capacity on both channels, I agree. And the current implementation does not do that. Or were you by any chance referring to the issue of upfront reservation (at put() time) versus commit() time?
I am talking about put vs. commit time. Transaction capacity is often configured to be much higher than the maximum that is actually expected in most cases. I would suggest doing a full implementation where there is a transaction outside, and a backing store inside. Once the transaction is about to get committed, then decide where the events go. (It is going to be tricky to do this and avoid doing all the writes at once - the File Channel fsyncs on commit, but writes to OS buffers on every write - so it is possible some data is flushed to disk before explicit fsyncs.) This is not a blocker anyway, we can work on it later as well.
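To make the put vs. commit idea a bit more concrete, here is a rough sketch of a transaction that only buffers on put() and picks a destination at commit(). None of these classes are Flume classes: `BoundedStore` and `CommitTimeTransaction` are hypothetical names, and the sketch deliberately leaves out the take side, ordering across the two stores, and the problem of spreading the writes out over time.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical stand-in for a backing store with a fixed capacity.
class BoundedStore {
    private final Queue<String> events = new ArrayDeque<>();
    private final int capacity;

    BoundedStore(int capacity) {
        this.capacity = capacity;
    }

    int free() {
        return capacity - events.size();
    }

    int size() {
        return events.size();
    }

    void addAll(List<String> batch) {
        if (batch.size() > free()) {
            throw new IllegalStateException("store is full");
        }
        events.addAll(batch);
    }
}

// Hypothetical outer transaction: buffers events and chooses a store on commit.
class CommitTimeTransaction {
    private final BoundedStore primary;
    private final BoundedStore overflow;
    private final List<String> pending = new ArrayList<>();

    CommitTimeTransaction(BoundedStore primary, BoundedStore overflow) {
        this.primary = primary;
        this.overflow = overflow;
    }

    // put() only buffers; no capacity is reserved on either store yet.
    void put(String event) {
        pending.add(event);
    }

    // The destination is chosen once, from the actual size of the batch.
    void commit() {
        BoundedStore target = pending.size() <= primary.free() ? primary : overflow;
        target.addAll(pending);
        pending.clear();
    }

    void rollback() {
        pending.clear();
    }
}
```

The point of the sketch is only that capacity is consumed according to what the transaction actually holds at commit, not according to the configured transaction capacity.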
4) WRT testing with fsyncs removed: I have not pursued it since I felt that would be compromising the durability guarantees. Do you think it's useful to do that?
I was wondering whether simply adding a config param that makes the fsyncs optional (or fsyncs all files in parallel before a checkpoint, or something like that) will give performance comparable to what is being proposed in this jira. I have a feeling it might, since fsyncs are the most expensive part of the file channel; without them, writes just go to the in-memory OS buffers and flushing to disk is taken care of in the background.
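Roughly what I have in mind, as a sketch only: this is not the Flume FileChannel code, and `OptionalSyncLog` and its `syncOnCommit` flag are invented names for illustration. It uses plain `java.nio` calls: every put() writes to the OS buffers, and commit() calls `force()` only when the flag is set.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical append-only log with an optional fsync on commit.
class OptionalSyncLog implements AutoCloseable {
    private final FileChannel file;
    private final boolean syncOnCommit; // assumed config parameter, not a real Flume setting

    OptionalSyncLog(Path path, boolean syncOnCommit) throws IOException {
        this.file = FileChannel.open(path,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND);
        this.syncOnCommit = syncOnCommit;
    }

    // Each put is written immediately, but only as far as the OS buffers.
    void put(String event) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap((event + "\n").getBytes(StandardCharsets.UTF_8));
        while (buf.hasRemaining()) {
            file.write(buf);
        }
    }

    // With syncOnCommit=false the commit returns without forcing data to disk.
    void commit() throws IOException {
        if (syncOnCommit) {
            file.force(false); // flush file content, not necessarily metadata
        }
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

With the flag off, data committed just before a machine crash can be lost, so a comparison like this would be about measuring how much of the cost is the fsync, not about changing the default.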
5) WRT "we should make the configuration change": Can you elaborate? I am not certain which change specifically you are referring to. Or are you referring to the whole config approach?
6) WRT lifecycle management and dependencies: After configuration, any channel that is found not to be connected to a source/sink is automatically discarded from the list of components managed by the lifecycle system. Consequently, the Spillable Channel becomes the sole lifecycle manager of the overflow channel. Otherwise, yes, there would be havoc.
I just think we should not allow one component to pull a reference to another component in the system. This explicitly breaks the "interact via interfaces" idea. We could make sure the spillable channel owns both the channels (and manages their lifecycle), so that no component ends up being able to access other components owned by the lifecycle manager.
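As a sketch of the ownership idea (all names here are hypothetical, none of them are Flume types): the outer channel is handed factories rather than references to already-configured components, creates its own private inner channels, and is the only thing that ever starts or stops them.

```java
import java.util.function.Supplier;

// Hypothetical minimal lifecycle contract for this sketch.
interface Managed {
    void start();
    void stop();
    boolean isRunning();
}

// Hypothetical inner channel; a real one would also store events.
class InnerChannel implements Managed {
    private boolean running;

    @Override
    public void start() {
        running = true;
    }

    @Override
    public void stop() {
        running = false;
    }

    @Override
    public boolean isRunning() {
        return running;
    }
}

// The outer channel creates and owns both inner channels.
class OwningSpillableChannel implements Managed {
    private final Managed primary;
    private final Managed overflow;

    OwningSpillableChannel(Supplier<? extends Managed> primaryFactory,
                           Supplier<? extends Managed> overflowFactory) {
        // Instances are created here, so nothing else holds a reference to them.
        this.primary = primaryFactory.get();
        this.overflow = overflowFactory.get();
    }

    @Override
    public void start() {
        overflow.start(); // overflow first, so a spill target exists before puts arrive
        primary.start();
    }

    @Override
    public void stop() {
        primary.stop();
        overflow.stop();
    }

    @Override
    public boolean isRunning() {
        return primary.isRunning() && overflow.isRunning();
    }
}
```

The outer lifecycle manager would then only ever see the spillable channel itself, and the inner channels are reached purely through the interface.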
Hope I made myself clearer this time!