Type: New Feature
Affects Version/s: None
Fix Version/s: None
Currently, NiFi Registry does not offer High Availability (HA) out of the box. One has to configure an environment around one or more NiFi Registry instances to achieve the required level of recoverability and availability.
This is not a requirement in many deployment scenarios, as NiFi Registry is not on the critical path of most system architectures. That is, it is a place to save and retrieve versions of flows and extensions, but if NiFi Registry is temporarily offline, NiFi data flows deployed to NiFi and MiNiFi instances continue to function just fine.
However, a bigger concern is data availability and backup; that is, the guarantee that data persisted to NiFi Registry is not lost due to an instance failure. Eventually, it will be nice to offer a NiFi Registry HA solution that allows for replicated data or external persistence providers (that themselves can be HA).
In the meantime, folks are looking for the best way to build their own data backup and recovery solutions for NiFi Registry. Many of the possible solutions and recommendations for backup and recovery or cold-spare failover require copying the data in the NiFi Registry home directory from host storage to another location, where it could be used to create another NiFi Registry with the same data on demand, e.g., in a cloud migration or disaster recovery scenario.
If the NiFi Registry service is running when this copy operation is performed, one risks copying partially written data, records, or files that could be corrupt when later read from disk. One solution for this today is to stop NiFi Registry, but this leaves it unavailable for users and scripts, which is not ideal. For example, continuous deployment scripts for NiFi data flows that read flows from NiFi Registry would not be able to access a required service.
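For reference, the copy operation discussed above can be sketched in plain Java. This is only an illustration: the class and method names (HomeDirBackup, copyTree) are hypothetical, and the sketch assumes nothing is writing to the source directory while it runs, which is exactly the condition a running NiFi Registry does not guarantee.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

// Hypothetical helper, not part of NiFi Registry.
final class HomeDirBackup {

    // Recursively copies 'source' into 'target', overwriting existing files.
    // Assumes no other process is writing under 'source' during the copy.
    static void copyTree(Path source, Path target) throws IOException {
        // Files.walk visits a directory before its contents, so each
        // destination directory exists before files are copied into it.
        try (Stream<Path> paths = Files.walk(source)) {
            paths.forEach(p -> {
                Path dest = target.resolve(source.relativize(p));
                try {
                    if (Files.isDirectory(p)) {
                        Files.createDirectories(dest);
                    } else {
                        Files.copy(p, dest, StandardCopyOption.REPLACE_EXISTING);
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }
}
```

A backup script would call something like HomeDirBackup.copyTree(registryHome, backupLocation), where both paths are supplied by the admin.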
In the long term, it would be nice to offer a proper HA NiFi Registry solution out of the box. However, in the short term, to avoid having to shut down NiFi Registry in order to initiate a backup, it would be nice for admins to be able to put a NiFi Registry instance into "read only maintenance mode", during which the contents of the NiFi Registry home directory could be more safely copied to a backup location or cold spare. (I say "more safely" because some files in the home directory, such as the default location for logs, would continue to be written to, but the most important files, such as those used by the file-based database and persistence providers, would stabilize after existing write operations are flushed to disk.)
- endpoints for turning maintenance mode on/off would fit in nicely as custom endpoints under Actuator (NIFIREG-134), and therefore could be access controlled by Actuator authorization rules
- when maintenance mode is enabled, a custom Spring filter could intercept any requests that modify persisted state (e.g., by resource path and HTTP method pattern matching) and return a "503 Service Unavailable" status code indicating that the resource is temporarily unavailable. A Spring filter that checks HTTP methods against resources is an approach already used to authorize access to certain resources, so there might be an opportunity for code reuse there (the maintenance mode filter would need to be enabled/disabled dynamically and programmatically, and instead of returning a 403, we would return a 503)
- when maintenance mode is enabled, the /actuator/health endpoint could also indicate this, giving clients a way to check if a server is in maintenance mode or not.
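To make the ideas above concrete, here is a minimal sketch of the decision logic such a filter might delegate to. It is framework-free on purpose so it stands alone; wiring it into a Spring/servlet filter and into Actuator endpoints is left out. Everything named here (MaintenanceModeGate, the set of write methods, the "/actuator/" exemption, the health detail string) is an assumption made for illustration, not existing NiFi Registry code.

```java
import java.util.Locale;
import java.util.Set;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: holds the maintenance flag and decides per request.
final class MaintenanceModeGate {

    // Assumption: these HTTP methods are the ones that modify persisted state.
    private static final Set<String> WRITE_METHODS =
            Set.of("POST", "PUT", "PATCH", "DELETE");

    // Assumption: toggle endpoints live under this prefix and must stay
    // reachable, otherwise maintenance mode could never be turned off.
    private static final String ACTUATOR_PREFIX = "/actuator/";

    // Thread-safe flag so the mode can be flipped at runtime.
    private final AtomicBoolean enabled = new AtomicBoolean(false);

    void setEnabled(boolean on) {
        enabled.set(on);
    }

    boolean isEnabled() {
        return enabled.get();
    }

    // Returns 503 if the request should be rejected, or 0 if it may proceed.
    int check(String httpMethod, String path) {
        if (!enabled.get() || path.startsWith(ACTUATOR_PREFIX)) {
            return 0;
        }
        boolean isWrite = WRITE_METHODS.contains(httpMethod.toUpperCase(Locale.ROOT));
        return isWrite ? 503 : 0;
    }

    // Detail string a health endpoint could expose to clients.
    String healthDetail() {
        return "maintenanceMode=" + enabled.get();
    }
}
```

With the gate enabled, check("GET", "/buckets") would return 0 and check("POST", "/buckets") would return 503, where "/buckets" is only an example path. A real filter would translate the 503 into a response on the servlet response object instead of returning an int.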