1. A half-baked thought: have you considered an alternative strategy where, on a clean shutdown, the SamzaContainer could move its state from the appcache directory to some other directory (e.g. /mnt/u001/stores)?
Yeah. That is an alternative strategy, which can be achieved using the NM Aux Services or including the . I don't see any obvious advantage in general, but it could help if we make store retention optional per application: we would move only the retained stores out to /mnt/u001/stores and leave the others in YARN to be garbage collected.
2. Can you elaborate a bit on the difficulty you see with standalone?
I think my understanding of the standalone design and how the stores are partitioned on the disk was incorrect. I will re-word this. It shouldn't be very different with the standalone use-case.
3. "Number of containers and/or container-partition assignment changes across successive application runs."
Well, this is what I had in mind. In the example:
first attempt of the application -> container 1 (task1, task2) on host1 and container 2 (task3) on host2
second attempt -> container 1 (task1) on host1, container 2 (task2) on host2 and container 3 (task3) on host3
Isn't this case possible? If so, not all partitions will be available on the respective hosts.
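The scenario above can be checked mechanically. Below is a minimal sketch in Python, assuming each attempt is described by a hypothetical mapping from container name to (host, tasks); this structure is for illustration only and is not part of the design. It lists the tasks whose host changed between attempts and which therefore would not find their previous store on local disk.

```python
def task_hosts(assignment):
    """Flatten {container: (host, [tasks])} into {task: host}."""
    return {task: host for host, tasks in assignment.values() for task in tasks}


def relocated_tasks(previous, current):
    """Return the tasks whose host differs between two attempts, sorted by name."""
    before, after = task_hosts(previous), task_hosts(current)
    return sorted(t for t in after if before.get(t) != after[t])


# The two attempts from the example above (hypothetical representation).
first_attempt = {
    "container1": ("host1", ["task1", "task2"]),
    "container2": ("host2", ["task3"]),
}
second_attempt = {
    "container1": ("host1", ["task1"]),
    "container2": ("host2", ["task2"]),
    "container3": ("host3", ["task3"]),
}

moved = relocated_tasks(first_attempt, second_attempt)
```

Here task2 and task3 land on hosts that do not hold their state from the first attempt, so only task1 could reuse its local store.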
Wonder if $STATE_ROOT_DIR should be a config rather than an environment variable. If it's an environment variable, how does it get set?
It is a config, not an environment variable. I mention this on page 3 of the design, under the "Relocating store directory" section.
6. Are the FairScheduler configs set in yarn-site.xml, or in some type of scheduler-site.xml file?
Well, I think if it is in yarn-site.xml, it will work. There is clearly not enough documentation on the YARN website about this, so I will have to experiment with it.
7. Rather than including the container-host mapping in the Config object, I think it should be persisted as part of the ContainerModel. This is still served by the JobCoordinator, but is outside of the Config object.
Ok. That makes sense.
8. Nit: Relocating directory store link is just SAMZA-, with no number.
Yeah. I am waiting to open a JIRA for that task and will then fill it in. It does make the document look incomplete, though.
9. Using /tmp is a bit dangerous because some systems have limits on the size of a `/tmp` directory. If not, maybe we could use java.io.tmpdir as the default.
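One way to picture the suggested default is the following minimal Python sketch. It assumes a hypothetical config key `state.root.dir` (not a name taken from the design) and uses Python's `tempfile.gettempdir()` as a stand-in for Java's `java.io.tmpdir` system property.

```python
import tempfile

# Hypothetical config key, used here for illustration only.
STATE_ROOT_KEY = "state.root.dir"


def resolve_state_root(config):
    """Return the configured state root, or the platform temp dir if unset."""
    value = config.get(STATE_ROOT_KEY)
    if value:
        return value
    # Fallback: analogous to reading java.io.tmpdir on the JVM.
    return tempfile.gettempdir()
```

With this shape, an operator who sets the key explicitly (for example to /mnt/u001/stores) never touches the temp directory, and the size-limit concern only applies to the unconfigured default.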
10. I think we shouldn't worry too much about per-application retention policies.