SAMZA-656

Script to periodically clean-up store directories that are persisted outside YARN

Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed

    Description

      To enable local state re-use when a Samza job is executing in YARN or standalone mode, the data store is persisted outside YARN's working directory.
      If the data store used by an application is persisted on disk beyond the application's lifetime, there has to be a periodic check to delete unused/orphaned stores. This ensures that the Node Managers (NM) don't run out of disk space.

      There are a couple of ways to solve this:

      1. YARN supports adding auxiliary services that run on the NMs. The generic auxiliary service is event-based and receives notifications on application/container start and stop. More than one auxiliary service can be defined on an NM. We can use the callbacks on application and/or container start and/or stop to perform a check-and-purge function on all data stores. We can identify unused data stores by checking for the existence of file handles that are not held by pid=1.
      >Pros:
      > - Becomes part of existing samza-yarn module
      >
      >Cons:
      > - It is not a generic solution. If the developer uses other resource managers like Mesos, then there is no good support for purging the data.

      2. Run a daemon on the NM that periodically cleans up all store directories.
      >Pros:
      > - Independent of the underlying resource manager
      >
      >Cons:
      > - This becomes a pre-requisite for anyone who runs a YARN grid and wants to run a Samza job.
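
The open-handle check mentioned in option 1 might look roughly like the following. This is a minimal Python sketch, not the implementation attached to this ticket. It assumes a Linux host where `/proc/<pid>/fd` is readable by the caller; the function name `store_in_use` and the `proc_root` and `ignore_pids` parameters are hypothetical and are not part of Samza or YARN.

```python
import os


def store_in_use(store_dir, proc_root="/proc", ignore_pids=frozenset()):
    """Return True if some process has a file open under store_dir.

    Assumption: a Linux-style proc filesystem, where each entry in
    <proc_root>/<pid>/fd is a symlink to the file the process has open.
    """
    target = os.path.realpath(store_dir)
    for pid in os.listdir(proc_root):
        # Only numeric entries are processes; skip pids the caller excludes.
        if not pid.isdigit() or int(pid) in ignore_pids:
            continue
        fd_dir = os.path.join(proc_root, pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            # The process exited, or we lack permission to inspect it.
            continue
        for fd in fds:
            try:
                path = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue
            if path == target or path.startswith(target + os.sep):
                return True
    return False
```

Processes that cannot be inspected are skipped, so a check like this would need enough privileges to see every process that may hold a store open; otherwise a store that is still in use could be reported as unused.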

      There is a proposal for YARN to support post-application clean-up in YARN-2261. This is a WIP and it seems to apply only to the Capacity Scheduler.

      We can use the YARN NM's auxiliary service to trigger a clean-up task, as suggested in [1]. If needed in the future, we can move the clean-up logic to a script and make it accessible to other developers who want to run it as a daemon on NM nodes.
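
If the clean-up logic were moved to a standalone script, it could be sketched as below. This is an illustration under stated assumptions, not the script delivered for this ticket: it assumes one directory per store directly under a common root, takes the "is this store still in use?" decision as a caller-supplied predicate, and adds a grace period based on directory modification time so a store that was just created is not purged. All names (`purge_unused_stores`, `run_forever`, `min_age_seconds`, `interval_seconds`) are hypothetical.

```python
import os
import shutil
import time


def purge_unused_stores(store_root, is_in_use, min_age_seconds=3600, now=None):
    """Delete old, unused store directories under store_root.

    is_in_use is a callable taking a directory path and returning True
    if the store must be kept. Returns the list of deleted paths.
    """
    now = time.time() if now is None else now
    deleted = []
    for name in sorted(os.listdir(store_root)):
        path = os.path.join(store_root, name)
        # Only real directories are treated as stores (assumed layout).
        if os.path.islink(path) or not os.path.isdir(path):
            continue
        age = now - os.path.getmtime(path)
        if age < min_age_seconds or is_in_use(path):
            continue
        shutil.rmtree(path, ignore_errors=True)
        deleted.append(path)
    return deleted


def run_forever(store_root, is_in_use, interval_seconds=600):
    """Daemon-style loop: purge, sleep, repeat."""
    while True:
        purge_unused_stores(store_root, is_in_use)
        time.sleep(interval_seconds)
```

Keeping the in-use check as a parameter means the same purge function could be called from an auxiliary-service callback or from a daemon loop, which matches the idea of sharing the logic between the two approaches.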

      Attachments

        1. DESIGN-SAMZA-656.pdf
          65 kB
          Shanthoosh Venkataraman
        2. GCstalelocalstate.pdf
          60 kB
          Shanthoosh Venkataraman

            People

              spvenkat Shanthoosh Venkataraman
              navina Navina Ramesh