Product Enhancement Request
In Spark, Hive, and similar engines, I often need to run work for which an entire core is overkill, such as managing a JDBC connection or doing a simple map/transform. On large datasets, however, 1 core x 500 partitions/mappers adds up to a large cluster-level footprint even though most of those processor cycles sit idle.
I propose that we enable YARN to let a user submit jobs that "allocate < 1 core". Under the covers, the JVM would still receive one core, but YARN/ZK could keep track of the fraction of each core in use and place two jobs on the same core, provided that both were submitted with <= 0.5 cores. YARN could then make more effective use of multi-threading and reduce CPU idle time for power users.
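To make the bookkeeping concrete, here is a minimal sketch of the fractional accounting described above. It is not YARN code: the `Core` class, the `place` function, the first-fit policy, and the job names are all hypothetical, and shares are tracked in thousandths of a core only to avoid floating-point rounding.

```python
from dataclasses import dataclass, field

MILLIS_PER_CORE = 1000  # assumption: shares are tracked in thousandths of a core


@dataclass
class Core:
    """One physical core and the fractional shares currently placed on it."""
    used_millis: int = 0
    jobs: list = field(default_factory=list)

    def fits(self, request_millis: int) -> bool:
        # A request fits only if the total share stays at or below one core.
        return self.used_millis + request_millis <= MILLIS_PER_CORE


def place(cores, job_id, request_millis):
    """First-fit placement (hypothetical policy).

    Returns the index of the chosen core, or None if no core has room.
    """
    if not 0 < request_millis <= MILLIS_PER_CORE:
        raise ValueError("request must be greater than 0 and at most 1 core")
    for index, core in enumerate(cores):
        if core.fits(request_millis):
            core.used_millis += request_millis
            core.jobs.append(job_id)
            return index
    return None


cores = [Core(), Core()]
first = place(cores, "jdbc-reader-1", 500)   # half a core
second = place(cores, "jdbc-reader-2", 500)  # shares the same core as the first job
third = place(cores, "full-core-job", 1000)  # needs a whole core of its own
print(first, second, third)  # prints: 0 0 1
```

In this sketch, two 0.5-core jobs occupy one core instead of two, so 500 such mappers would be accounted as 250 cores rather than 500.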
Obviously this can result in very bad outcomes, but if we also add security controls, customers can configure the cluster so that only admins/gates can submit with < 1 full core, ultimately resulting in a cluster that can do more.
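The security control could be as simple as an admission check at submit time. The following is a sketch only: the `may_submit` function and the `yarn-admins` group name are hypothetical and do not correspond to any existing YARN setting.

```python
MILLIS_PER_CORE = 1000  # assumption: requests are expressed in thousandths of a core

# Hypothetical group allowed to request less than one full core.
FRACTIONAL_GROUPS = frozenset({"yarn-admins"})


def may_submit(user_groups, request_millis, allowed=FRACTIONAL_GROUPS):
    """Return True if a user in `user_groups` may submit this request.

    Requests of one core or more are open to everyone; sub-core requests
    require membership in at least one of the `allowed` groups.
    """
    if request_millis <= 0:
        raise ValueError("request must be positive")
    if request_millis >= MILLIS_PER_CORE:
        return True
    return not allowed.isdisjoint(user_groups)


print(may_submit({"analysts"}, 500))     # False: not in an allowed group
print(may_submit({"yarn-admins"}, 500))  # True: admin may request half a core
print(may_submit({"analysts"}, 1000))    # True: full-core requests are unrestricted
```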