Copying over from the internal email thread (sorry for that).
I like option (2). One thing to consider is the submodule structure in this approach. If we have KeyValueStorageEngineFactory in samza-kv, but the implementations live in separate samza-kv-leveldb and samza-kv-rocksdb components, then samza-kv must depend on both (i.e. pull in dependencies from both projects, so even if you only use RocksDB, you'd still have LevelDB junk in your classpath). This is a little ugly from a hygiene perspective, but I don't think it should cause any problems. Alternatively, we could have just one samza-kv submodule and put all implementations in there, but that seems even nastier than separate submodules. Alternatively^2, we could have samza-kv do a Class.forName().newInstance() to create the actual StorageEngine, but this seems likely to introduce even more runtime errors due to improper dependencies.
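For reference, here is a rough sketch of what the Class.forName() route could look like, and why a missing dependency only shows up at runtime. The StorageEngine interface, InMemoryEngine, and ReflectiveEngineLoader below are made-up stand-ins for illustration, not the actual Samza types.

```java
public class ReflectiveEngineLoader {
    /** Hypothetical stand-in for the real storage engine type. */
    public interface StorageEngine {
        String name();
    }

    /** Hypothetical engine that happens to be on the classpath. */
    public static class InMemoryEngine implements StorageEngine {
        @Override
        public String name() {
            return "in-memory";
        }
    }

    /**
     * Instantiates an engine by class name. Nothing here is checked at
     * compile time: a missing jar or a typo in the name fails only when
     * the job starts.
     */
    public static StorageEngine load(String className) {
        try {
            Class<?> clazz = Class.forName(className);
            // Class.newInstance() is deprecated since Java 9; go through the constructor.
            return (StorageEngine) clazz.getDeclaredConstructor().newInstance();
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(
                "Storage engine class not on classpath: " + className, e);
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalStateException(
                "Could not instantiate storage engine: " + className, e);
        }
    }
}
```

With this, samza-kv would carry no compile-time dependency on either implementation, at the cost of turning a packaging mistake into a startup failure.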
Other than that, I don't see an immediate problem with approach (2). It seems preferable to approach (1) in your list below. That said, let's move over to JIRA. I'm sure other folks will have feedback as well.
Currently, it seems the Samza job config has something like this:
Today this defaults to a LevelDbKeyValueStore. To make this more pluggable, I think we can take one of two approaches:
i) Separate Factory:
Have something like:
ii) Additional factory config:
In this case, we can keep the same factory: org.apache.samza.storage.kv.KeyValueStorageEngineFactory,
but have additional parameters to determine the type. Example:
stores.*.factory.persistent=true / false
To add to that, I think option (ii) might be better, since we can abstract all key-value stores (RocksDB / LevelDB / in-memory / blah) behind one factory and use the additional config parameter to determine the type.
This way, the different storage engines can be categorized by their types in a hierarchy (KeyValue / BitMap / Document structured / blah ...).
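A minimal sketch of how option (ii) could pick the engine from config, assuming persistent=true maps to LevelDB and persistent=false maps to an in-memory store. That mapping, the EngineKind enum, and KeyValueEngineSelector are all assumptions for illustration; only the stores.*.factory.persistent key comes from the example above.

```java
import java.util.Map;

public class KeyValueEngineSelector {
    /** Hypothetical engine kinds; a real factory would build actual stores. */
    public enum EngineKind { LEVELDB, IN_MEMORY }

    /**
     * Picks an engine kind for the given store from the job config.
     * Assumption: persistent=true means LevelDB, persistent=false means in-memory.
     */
    public static EngineKind select(String storeName, Map<String, String> config) {
        String key = "stores." + storeName + ".factory.persistent";
        // A missing key falls back to LevelDB, so existing job configs keep working.
        boolean persistent = Boolean.parseBoolean(config.getOrDefault(key, "true"));
        return persistent ? EngineKind.LEVELDB : EngineKind.IN_MEMORY;
    }
}
```

A single boolean would not be enough to tell RocksDB from LevelDB, so a real version would probably need a richer parameter than this.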
Personally, I'm biased towards option (ii), since existing jobs don't need to change their configs (and we default to LevelDB). What do you guys think?