Currently, the constructor of RocksDBKeyedStateBackend has the following shortcomings:
- It does creation and cleanup of some directories and files. This makes it harder to unit-test because dependencies are created in the constructor and not passed in from outside.
- It leaves many important fields uninitialized, so further methods (e.g. restore) have to be called before the backend object is fully constructed. This is error-prone and hard to unit-test. I think this problem originated with the introduction of incremental snapshots, because in that case we can only open a RocksDB instance AFTER the restore code has run and restored the working directory.
As a solution, I would suggest a dedicated builder class that takes the current constructor parameters and (optionally) the state handles to restore. This class then constructs and initializes all required objects, and those objects are simply passed to the new RocksDBKeyedStateBackend constructor, which does no work besides assigning dependencies to fields.
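To make the intended split concrete, here is a minimal sketch of the builder idea. All names (SketchBackend, SketchBackendBuilder, withRestoreStateHandles) are hypothetical placeholders, and plain strings stand in for the real constructor parameters and state handles. It is not the actual RocksDBKeyedStateBackend API.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for the backend: the constructor only assigns fields. */
final class SketchBackend {
    final String workingDirectory;
    final List<String> registeredStates;

    SketchBackend(String workingDirectory, List<String> registeredStates) {
        this.workingDirectory = workingDirectory;
        this.registeredStates = registeredStates;
    }
}

/** Hypothetical builder: takes the constructor parameters and optional restore handles. */
final class SketchBackendBuilder {
    private final String workingDirectory;
    private Collection<String> restoreStateHandles = Collections.emptyList();

    SketchBackendBuilder(String workingDirectory) {
        this.workingDirectory = workingDirectory;
    }

    /** Optional: without this call the backend starts empty. */
    SketchBackendBuilder withRestoreStateHandles(Collection<String> handles) {
        this.restoreStateHandles = handles;
        return this;
    }

    SketchBackend build() {
        // All construction and restore work would happen here, before the
        // backend object exists. Copying the handles stands in for that work.
        List<String> states = new ArrayList<>(restoreStateHandles);
        return new SketchBackend(workingDirectory, Collections.unmodifiableList(states));
    }
}
```

With this shape, a unit test can construct the backend directly from hand-made collaborators and never has to touch the file system.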
With this change, I would also extract the different restore strategies for incremental and full snapshots out of the backend's main class, into their own classes. They will then be used by the builder introduced in the previous step. This builder would receive all objects that currently go into the constructor and the restore method. It should create all directories and, if applicable, download state, create and restore a RocksDB instance object, and create and register states. Everything concerning the construction of collaborators for the backend should go into the builder, and the backend main class should simply receive all collaborators and assign them to final fields.
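The extraction of the restore strategies could look roughly like the sketch below. The interface and class names are hypothetical, and each strategy only returns a list of the steps it would perform instead of touching a real database. The step order for the full-snapshot case (open first, then insert data) is an assumption for illustration. The incremental order follows the constraint described above: the working directory has to be restored before the instance is opened.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical strategy interface; not an existing Flink type. */
interface RestoreStrategy {
    /** Returns the steps performed, standing in for a restored database. */
    List<String> restore(List<String> stateHandles);
}

/** Assumed order for full snapshots: open an empty instance, then insert data. */
final class FullSnapshotRestore implements RestoreStrategy {
    @Override
    public List<String> restore(List<String> stateHandles) {
        List<String> steps = new ArrayList<>();
        steps.add("open empty db");
        for (String handle : stateHandles) {
            steps.add("insert key-value pairs from " + handle);
        }
        return steps;
    }
}

/** Incremental snapshots: restore the working directory first, open the instance last. */
final class IncrementalSnapshotRestore implements RestoreStrategy {
    @Override
    public List<String> restore(List<String> stateHandles) {
        List<String> steps = new ArrayList<>();
        for (String handle : stateHandles) {
            steps.add("download files of " + handle + " into working directory");
        }
        steps.add("open db on restored working directory");
        return steps;
    }
}
```

The builder would pick one of these based on the type of the state handles it was given, so the backend main class never needs to know which kind of restore took place.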
One detail to consider for the builder is that all resources for collaborators should be created and initialized in a resource-acquisition-is-initialization (RAII) style, in particular because some of them are backed by native (JNI) objects: if we fail to create a resource during the process, all previously created resources should be properly released and de-allocated. Releasing/de-allocation should happen in the exact inverse order of creation, to avoid any transitive double-frees in the native code. Only when all resources have been created will the builder create the main backend object, so that, again, the constructor does not have to deal with any fault handling or cleanup logic.
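One possible way to get the inverse-order cleanup is a small stack of closeables that the builder fills as it goes. This is only a sketch: AcquiredResources and CleanupDemo are hypothetical names, and lambdas that append to a list stand in for the native-backed resources.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical helper: closes registered resources in reverse order of registration. */
final class AcquiredResources implements AutoCloseable {
    private final Deque<AutoCloseable> stack = new ArrayDeque<>();

    void register(AutoCloseable resource) {
        stack.push(resource);
    }

    /** Called on success: ownership moves to the backend, so nothing is closed here. */
    void release() {
        stack.clear();
    }

    @Override
    public void close() {
        while (!stack.isEmpty()) {
            try {
                stack.pop().close();
            } catch (Exception e) {
                // Keep closing the remaining resources; a real builder would
                // log this or attach it as a suppressed exception.
            }
        }
    }
}

final class CleanupDemo {
    /** Returns the names of the resources that were closed, in closing order. */
    static List<String> build(boolean failWhileOpeningDb) {
        List<String> closed = new ArrayList<>();
        try (AcquiredResources acquired = new AcquiredResources()) {
            acquired.register(() -> closed.add("working directory"));
            acquired.register(() -> closed.add("db options"));
            if (failWhileOpeningDb) {
                throw new IllegalStateException("simulated failure while opening the db");
            }
            acquired.register(() -> closed.add("db instance"));
            acquired.release(); // success: the backend now owns everything
        } catch (IllegalStateException e) {
            // A real builder would rethrow; swallowed here to inspect the result.
        }
        return closed;
    }
}
```

On the failure path, the options are closed before the working directory, i.e. in the inverse order of creation. On the success path nothing is closed, and the backend constructor receives fully initialized collaborators.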