As part of the E2Data research project, we at the Institute of Communication and Computer Systems (ICCS) of the National Technical University of Athens, Greece, have been working on a modified version of Hadoop YARN in which the GPU devices available in the underlying cluster are discovered through JOCL, a Java wrapper of the OpenCL framework API, instead of vendor-specific binaries.
In other words, we have shifted towards a more uniform, high-level handling of GPUs as "OpenCL-enabled" devices. This decouples GPU discovery and management from vendor-specific technicalities: every GPU, no matter the vendor, looks the same to E2Data YARN (more specifically, to the NodeManager component), provided that the OpenCL runtime and drivers for the GPU(s) of interest are installed on the respective node(s) of the cluster.
As a result, once our initial changes were in place, we were able to use GPUs from vendors other than NVIDIA (NVIDIA GPUs being the only ones officially supported, through the nvidia-smi binary) with minimal additional effort.
Ultimately, our goal is to unify every processing unit that YARN can possibly utilize (CPU cores, GPUs, FPGAs) behind a common, simple, high-level interface: that of the OpenCL-enabled device.
The only drawback of our approach is that vendor-specific information about the GPUs (e.g. temperature) is lost. We believe, however, that this information is not necessary for YARN; everything that Hadoop needs in order to discover and handle GPU devices is provided by OpenCL.
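To make the idea of a common "OpenCL-enabled device" interface more concrete, here is a minimal sketch in plain Java. It is not taken from our modified YARN code: the names DeviceType, OpenClDevice, DeviceDiscoverer and OpenClDeviceSketch are hypothetical, the field set is only an assumption about what a NodeManager might want to keep, and the discoverer is a stub with placeholder values where a real implementation would query OpenCL through JOCL.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical device classes, loosely mirroring OpenCL's device type categories
// (FPGAs are assumed here to be reported as accelerators).
enum DeviceType { CPU, GPU, ACCELERATOR }

// Hypothetical vendor-neutral descriptor. It carries only generic properties;
// vendor-specific readings such as temperature have no place in it.
final class OpenClDevice {
    final int index;
    final DeviceType type;
    final String vendor;
    final String name;
    final long globalMemBytes;

    OpenClDevice(int index, DeviceType type, String vendor, String name, long globalMemBytes) {
        this.index = index;
        this.type = type;
        this.vendor = vendor;
        this.name = name;
        this.globalMemBytes = globalMemBytes;
    }
}

// Hypothetical discovery abstraction; a real implementation would wrap JOCL calls.
interface DeviceDiscoverer {
    List<OpenClDevice> discover();
}

final class OpenClDeviceSketch {
    // Selects devices by type only; the vendor never enters the decision.
    static List<OpenClDevice> ofType(List<OpenClDevice> all, DeviceType wanted) {
        List<OpenClDevice> selected = new ArrayList<>();
        for (OpenClDevice device : all) {
            if (device.type == wanted) {
                selected.add(device);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Stub discoverer with placeholder values, standing in for an OpenCL query.
        DeviceDiscoverer stub = () -> Arrays.asList(
                new OpenClDevice(0, DeviceType.GPU, "VendorA", "gpu-a", 8L << 30),
                new OpenClDevice(1, DeviceType.CPU, "VendorB", "cpu-b", 64L << 30),
                new OpenClDevice(2, DeviceType.GPU, "VendorC", "gpu-c", 4L << 30));

        for (OpenClDevice gpu : ofType(stub.discover(), DeviceType.GPU)) {
            System.out.println(gpu.index + " " + gpu.vendor + " " + gpu.name);
        }
    }
}
```

In this sketch, GPUs from two different placeholder vendors pass through the same selection path, which is the property we are after; supporting a new vendor would then only require its OpenCL runtime and drivers on the node, not a new code path.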
For the time being, this is only a proposal and a prompt for discussion; the modified version is a work in progress. We consider community feedback on the core concept (and on the fact that it may constitute a paradigm shift for YARN) crucial before attaching any patch file and diving into further technical details.