Define Hadoop YARN term "vcore"



      This is a request to define the Hadoop YARN term "vCore". It's clearly different from a vCPU, i.e. one of the virtual CPUs (or CPU cores) a system reports in /proc/cpuinfo. What is a YARN vcore, please?
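      For reference, the "vCPU" count meant here is what Linux reports in /proc/cpuinfo, where each logical CPU gets its own "processor : N" stanza (true on x86 and ARM; other architectures may format the file differently). A minimal sketch for counting them, with os.cpu_count() as a fallback off Linux:

```python
import os


def count_logical_cpus(cpuinfo_text: str) -> int:
    """Count the 'processor' stanzas in /proc/cpuinfo-style text.

    Each logical CPU (what EC2 calls a vCPU) has a line like 'processor : N'.
    """
    return sum(
        1
        for line in cpuinfo_text.splitlines()
        if line.split(":", 1)[0].strip() == "processor"
    )


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("vCPUs per /proc/cpuinfo:", count_logical_cpus(f.read()))
    except FileNotFoundError:
        # Not on Linux: fall back to the interpreter's view of the CPU count.
        print("os.cpu_count():", os.cpu_count())
```

      On an r5.24xlarge this should print 96.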

      Background: I am running Hadoop YARN on 24 AWS EC2 instances from the memory-intensive R5 family at the 24xlarge instance size (96 vCPUs and 768 GB RAM each), plus the cluster master.

      I've launched a Spark application with the following spark-submit parameters:

          --executor-memory 224G
          --conf spark.executor.memoryOverhead=23901M
          --executor-cores 32

      That gives each executor about 250 GB of RAM (executor memory plus overhead combined) and 32 vCPUs. Three such executors fit on a 96-vCPU, 768 GB instance, and I have Spark dynamic resource allocation enabled, so I expect to see three executors per instance, and that's how it turns out.
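      The sizing arithmetic can be checked with a short sketch. It assumes Spark's binary units ("G" = GiB, "M" = MiB) and, as a simplification, treats the whole instance (96 vCPUs, 768 GiB) as available to YARN; the NodeManager's configured capacity is usually somewhat lower. to_mib and executors_per_node are hypothetical helpers written for this example.

```python
def to_mib(size: str) -> int:
    """Convert a Spark-style size string such as '224G' or '23901M' to MiB."""
    units = {"M": 1, "G": 1024, "T": 1024 * 1024}
    return int(size[:-1]) * units[size[-1].upper()]


def executors_per_node(node_vcpus: int, node_mem_mib: int,
                       exec_cores: int, exec_mem_mib: int) -> int:
    """How many identical executors fit on one node, limited by CPU and memory."""
    return min(node_vcpus // exec_cores, node_mem_mib // exec_mem_mib)


# Values from the spark-submit parameters above.
exec_mem = to_mib("224G") + to_mib("23901M")   # executor memory + memoryOverhead
exec_cores = 32                                 # --executor-cores

# Assumption: the full r5.24xlarge (96 vCPUs, 768 GiB) is usable by YARN.
per_node = executors_per_node(96, to_mib("768G"), exec_cores, exec_mem)

print(exec_mem, round(exec_mem / 1024, 1), per_node, 24 * per_node)
```

      Each executor comes to 253,277 MiB (about 247 GiB), and both the CPU limit (96 / 32) and the memory limit give three executors per node.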

      24 nodes x 3 executors per node = 72 executors

      Adding the Application Master, running on the master node, makes 73 containers.

      This matches the "73 allocated" I see in the "Containers" line of the "yarn top" output:

          YARN top - 11:03:57, up 0d, 18:9, 1 active users, queue(s): root
          NodeManager(s): 24 total, 24 active, 0 unhealthy, 44 decommissioned, 0 lost, 0 rebooted
          Queue(s) Applications: 1 running, 1 submitted, 0 pending, 0 completed, 0 killed, 0 failed
          Queue(s) Mem(GB): 183 available, 17809 allocated, 69008 pending, 247 reserved
          Queue(s) VCores: 2230 available, 73 allocated, 279 pending, 1 reserved
          Queue(s) Containers: 73 allocated, 279 pending, 1 reserved

      Most of the memory is allocated, which is as expected.

      But why does the "Queue(s) VCores" line say "73 allocated"?

      It looks as if 1 vcore = 32 vCPUs?
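      The mismatch in numbers: if each container were charged the cores it requested, the allocation would be far above 73. The sketch below assumes the Application Master requested 1 core (Spark's default for spark.yarn.am.cores / spark.driver.cores; the actual value isn't stated above, so AM_CORES is hypothetical).

```python
EXECUTORS = 72
EXECUTOR_CORES = 32        # --executor-cores
AM_CORES = 1               # hypothetical: Spark's default AM/driver core request

requested = EXECUTORS * EXECUTOR_CORES + AM_CORES
reported_allocated = 73    # "Queue(s) VCores" line from yarn top
containers = EXECUTORS + 1

print(requested, reported_allocated, reported_allocated / containers)
```

      This gives 2305 requested versus 73 reported, i.e. exactly one reported vcore per allocated container.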

      I looked in /etc/hadoop/conf/yarn-site.xml on one of the 24XL task instances with 96 vCPUs to double-check how many virtual CPUs YARN thinks the node has, and it is 96, as expected.


      I searched all the Hadoop YARN documentation linked from https://hadoop.apache.org/docs/stable/index.html for a definition of a Hadoop YARN vCore and couldn't find one.

      https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html uses "virtual cores" and "computation based resource" when talking about vCores.

      What is a Hadoop YARN vCore?  How does it relate to virtual CPUs I see in e.g., /proc/cpuinfo on Linux?

      There are many mentions of "vcore" in the Hadoop YARN documentation; could we please add a definition of this term?





