Uploaded image for project: 'Samza'
  1. Samza
  2. SAMZA-516 Support standalone Samza jobs
  3. SAMZA-557

Reuse local state in SamzaContainer on clean shutdown

    XMLWordPrintableJSON

Details

    • Sub-task
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 0.9.0
    • None
    • container
    • None

    Description

      Restoring state every time a SamzaContainer is restarted (due to failure, or re-deploy) can be expensive. Samza currently always restores state when a SamzaContainer starts. This could be avoided if the container is started on the same machine that it was running on before shutdown. It could re-use state that exists locally when it's restarted. There are two modes of state re-use:

      1. A clean shutdown of the container has occurred.
      2. An unclean shutdown of the container occurred.

      Re-using clean state (1) could be achieved by having the SamzaContainer write an OFFSET file for every local state directory when the SamzaContainer is shutdown (after the state stores have been stopped). A clean directory would look as follows:

      $PWD/state/my-kv-store/Partition-0/OFFSET

      The offset file must contain the offset of the last method in the changelog feed. This information is retrievable via the SystemAdmin.getSystemStreamMetadata method. When a SamzaContainer starts up, it can check if the OFFSET file exists for each store. If the OFFSET file does exist, the SamzaContainer can:

      1. Instruct the state store to open the on-disk DB, rather than creating a new state store from scratch.
      2. Read the OFFSET file.
      3. Delete the OFFSET file.
      4. Restore the state store from the OFFSET value, rather than from the oldest offset in the changelog (what Samza currently does).

      The OFFSET file must be removed (3) before any restoration is executed on the store. If this is not done, then a partial restoration might occur, followed by a failure. In such a case, non-idempotent writes to a store could result in inaccurate data being persisted to disk. The trade-off with deleting the OFFSET optimistically is that a failure during restoration will result in the whole state having to be restored, since the OFFSET file is gone. This is tolerable, since a failure during restoration is equivalent to an unclean shutdown, in which case you wouldn't expect the (possibly corrupted) local state to be used anyway.

      Attachments

        1. SAMZA-557-2.patch
          25 kB
          Navina Ramesh
        2. SAMZA-557-1.patch
          24 kB
          Navina Ramesh
        3. SAMZA-557-0.patch
          11 kB
          Navina Ramesh

        Issue Links

          Activity

            People

              navina Navina Ramesh
              criccomini Chris Riccomini
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: