Does CompressionContext class have to be public? Can it stay pkg private? You'll have to move your new class into wal package but that seems fine to me.
My sense though is that this is a generally useful WAL reader? One that can do compressed or non-compressed WAL? One that can be used by replication but also by fellas who want to index hbase, etc.
Can it be in the wal package? Then don't have to open up so much of HLog?
You're right that it can be a generally useful WAL reader, for users that need to be able to seek directly into newly open files without having to scan everything that comes before if compression is on. Right now its API is tailored to replication's need, we could make it more general but, unless we have another use case for it right now, I don't see the point.
So I'll move it to wal and rename the class/methods a bit.
Does the base Reader interface have to know about a compression context? Can this not be internal to the implementation?
Until HDFS lets us tail a file under construction we need to pass the dict back when opening the file.
Yeah, why can't an implementation of HLog.Reader manage the compression context internally? Why it have to be out here in this ReplicationHLogReader class? Afterall, isn't the dictionary reconstructed on read? You don't save it around?
It would be fine if we didn't have to:
- seek into a file we never read before (after a region server died and we pick up the queue)
- reopen files in order to tail them (when normally replicating)
We could augment HLog.Reader to support reopening of files, basically push down the stuff is doing ReplicationHLogReader even more. That way we could hide all the dirty details? I haven't thought about modifying that before.
Missing a license
Its unfortunate that you can't tell its a compressed wal from reading say some magic or metadata at the head of the file. It seems a bit broke consulting configuration.
Good point, it would simplify a lot of things. HLog compression was implemented at the HLog.Entry level though so technically it's not even the WAL itself that's compressed. My next comment shows what that means.
Is this right:
Yes, if you keep the compression context in there it'll replicate the HLog.Entry compressed with a dictionary that the slave has no knowledge of. I had this comment in my first patch and I think I forgot to move it over:
// Setting it to null prevents from sending compressed edits that the sink wouldn't parse