Details
Sub-task
Status: Resolved
Major
Resolution: Fixed
0.9.0
None
None
Description
Restoring state every time a SamzaContainer is restarted (due to failure, or re-deploy) can be expensive. Samza currently always restores state when a SamzaContainer starts. This could be avoided if the container is started on the same machine that it was running on before shutdown. It could re-use state that exists locally when it's restarted. There are two modes of state re-use:
1. A clean shutdown of the container has occurred.
2. An unclean shutdown of the container has occurred.
Re-using clean state (1) could be achieved by having the SamzaContainer write an OFFSET file for every local state directory when the SamzaContainer is shut down (after the state stores have been stopped). A clean directory would look as follows:
$PWD/state/my-kv-store/Partition-0/OFFSET
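The shutdown-side write could be sketched as follows. This is only a language-neutral illustration in Python, not Samza code: the helper names `offset_file_path` and `write_offset_file` are hypothetical, and writing through a temporary file followed by a rename is an added assumption so that a crash mid-write never leaves a half-written OFFSET file behind.

```python
import os


def offset_file_path(state_root, store_name, partition_id):
    # Mirrors the layout above: <state_root>/<store>/Partition-<n>/OFFSET
    return os.path.join(state_root, store_name, f"Partition-{partition_id}", "OFFSET")


def write_offset_file(state_root, store_name, partition_id, last_changelog_offset):
    """Hypothetical helper: call only after the state store has been stopped."""
    path = offset_file_path(state_root, store_name, partition_id)
    os.makedirs(os.path.dirname(path), exist_ok=True)

    # Assumption (not from the ticket): write to a temp file, then rename,
    # so the OFFSET file is either absent or complete.
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        f.write(str(last_changelog_offset))
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)
    return path
```

For example, `write_offset_file("state", "my-kv-store", 0, "1234")` would create `state/my-kv-store/Partition-0/OFFSET` relative to the working directory.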
The OFFSET file must contain the offset of the last message in the changelog feed. This information is retrievable via the SystemAdmin.getSystemStreamMetadata method. When a SamzaContainer starts up, it can check if the OFFSET file exists for each store. If the OFFSET file does exist, the SamzaContainer can:
1. Instruct the state store to open the on-disk DB, rather than creating a new state store from scratch.
2. Read the OFFSET file.
3. Delete the OFFSET file.
4. Restore the state store from the OFFSET value, rather than from the oldest offset in the changelog (what Samza currently does).
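The startup-side decision could look roughly like the sketch below. It is a Python illustration under stated assumptions, not the actual SamzaContainer logic: `plan_restore` is a hypothetical name, offsets are treated as opaque strings, and the caller is assumed to open or recreate the store based on the returned flag.

```python
import os


def plan_restore(store_partition_dir, oldest_changelog_offset):
    """Hypothetical helper: decide how to restore one store partition.

    Returns (reuse_local_state, restore_from_offset).
    """
    offset_path = os.path.join(store_partition_dir, "OFFSET")

    if not os.path.exists(offset_path):
        # No OFFSET file: treat as an unclean shutdown and restore from scratch.
        return False, oldest_changelog_offset

    # Read the OFFSET file.
    with open(offset_path) as f:
        saved_offset = f.read().strip()

    # Delete it before any restoration work touches the store.
    os.remove(offset_path)

    # Reuse the on-disk DB and resume the changelog from the saved offset.
    return True, saved_offset
```

A second call on the same directory returns `(False, oldest_changelog_offset)`, because the OFFSET file was consumed by the first call.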
The OFFSET file must be removed (3) before any restoration is executed on the store. If this is not done, then a partial restoration might occur, followed by a failure, and the next restart would find the stale OFFSET file and replay changelog messages that have already been applied. In such a case, non-idempotent writes to a store could result in inaccurate data being persisted to disk. The trade-off with deleting the OFFSET file optimistically is that a failure during restoration will result in the whole state having to be restored, since the OFFSET file is gone. This is tolerable, since a failure during restoration is equivalent to an unclean shutdown, in which case you wouldn't expect the (possibly corrupted) local state to be used anyway.
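A toy simulation may make the failure mode concrete. Everything here is hypothetical: the changelog is modelled as a list of counter increments (a deliberately non-idempotent write), the store is a plain dict, and the offset is simplified to "index of the first changelog entry not yet applied to the local state".

```python
def replay(store, changelog, from_offset, fail_at=None):
    # Apply changelog entries in order; optionally crash before entry `fail_at`.
    for offset in range(from_offset, len(changelog)):
        if fail_at is not None and offset == fail_at:
            raise RuntimeError("simulated crash during restoration")
        key, delta = changelog[offset]
        store[key] = store.get(key, 0) + delta  # non-idempotent write


def restart_keeping_offset_file(changelog, local_store, saved_offset, fail_at):
    store = dict(local_store)
    try:
        replay(store, changelog, saved_offset, fail_at)
    except RuntimeError:
        pass  # store is now partly advanced, OFFSET file still on disk
    # Next restart trusts the stale OFFSET and replays applied entries again.
    replay(store, changelog, saved_offset)
    return store


def restart_deleting_offset_file(changelog, local_store, saved_offset, fail_at):
    store = dict(local_store)
    try:
        replay(store, changelog, saved_offset, fail_at)
    except RuntimeError:
        # OFFSET is gone, so the next restart discards local state
        # and restores from the oldest offset.
        store = {}
        replay(store, changelog, 0)
    return store


CHANGELOG = [("a", 1)] * 5   # five increments; the correct final value is 5
LOCAL_STATE = {"a": 2}       # local state after a clean shutdown at offset 2

stale = restart_keeping_offset_file(CHANGELOG, LOCAL_STATE, 2, fail_at=4)
fresh = restart_deleting_offset_file(CHANGELOG, LOCAL_STATE, 2, fail_at=4)
```

Here `stale` ends up as `{"a": 7}` because offsets 2 and 3 are applied twice, while `fresh` is the correct `{"a": 5}` at the cost of a full restore.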