Along the lines of s4PigWrapper, write an S4 application master to host S4 piper inside Hadoop YARN. This would be useful not only for reading data stored in Hadoop (to build or train a model); we could also use the resource manager to deploy S4 instances on remote machines and monitor them. In short, we could take advantage of the resource management, scheduling, and other facilities in YARN.
- YARN is useful for deploying and launching S4 instances.
- It still requires deploying node managers on each box, which means it will
be useful when one is running more than one S4 process on a node.