  1. Tajo
  2. TAJO-1508

ResourceTracker does not update workers' resource capacities after the first join

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.11.0
    • Component/s: Resource Manager, Worker
    • Labels:
      None

      Description

      Background

      The Resource Tracker (RT) registers a worker, along with its connection info and resource capacity, in RMContext when the worker sends its first heartbeat to RT.

      Problem

      • Every heartbeat that a worker sends to RT includes the worker's resource capacity. This is unnecessary and wastes network bandwidth.
      • After the first heartbeat, RT does not update a worker's resource capacity even if that capacity has changed. This case is unusual but possible. For example, a worker can rejoin the Tajo cluster with a modified resource capacity, yet RT will keep the previous one.

      Solution

      • We should make the resource capacity in the heartbeat an optional field.
      • Only the first heartbeat of each worker should include the worker's resource capacity.
      • When the resource capacity field of a heartbeat is set, RT should update the resource capacity for that worker.

      This solution will reduce the size of heartbeat messages and enable RT to pick up changes in workers' resource capacities.
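
      The proposed RT behavior can be sketched roughly as follows. This is a minimal illustration and not Tajo's actual code: the class ResourceTrackerSketch, the Heartbeat and Capacity types, and their fields are hypothetical names chosen for the example. A null capacity stands in for an unset optional field.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; all names here are hypothetical, not Tajo APIs. */
public class ResourceTrackerSketch {

    /** Hypothetical capacity record; the fields are placeholders. */
    static final class Capacity {
        final int memoryMB;
        final int cpuCores;

        Capacity(int memoryMB, int cpuCores) {
            this.memoryMB = memoryMB;
            this.cpuCores = cpuCores;
        }
    }

    /** Hypothetical heartbeat; capacity is optional and null when omitted. */
    static final class Heartbeat {
        final String workerId;
        final Capacity capacity;

        Heartbeat(String workerId, Capacity capacity) {
            this.workerId = workerId;
            this.capacity = capacity;
        }

        boolean hasCapacity() {
            return capacity != null;
        }
    }

    // Stand-in for the worker table that the issue says lives in RMContext.
    private final Map<String, Capacity> capacities = new HashMap<>();

    /**
     * Handles one heartbeat. If the capacity field is set, the stored
     * capacity is created or overwritten. Returns false when the worker is
     * unknown and sent no capacity, so the caller can ask it to resend.
     */
    boolean onHeartbeat(Heartbeat hb) {
        if (hb.hasCapacity()) {
            capacities.put(hb.workerId, hb.capacity);
            return true;
        }
        return capacities.containsKey(hb.workerId);
    }

    Capacity capacityOf(String workerId) {
        return capacities.get(workerId);
    }
}
```

      With this shape, a lightweight heartbeat leaves the stored capacity untouched, while a rejoining worker that sets the field replaces the stale value.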

          Activity

          jhkim Jinho Kim added a comment -

          This issue was fixed by TAJO-1599. Please close it.
          The ResourceTracker can now update resource capacities in the following cases:

          • A worker joins for the first time or rejoins (restarts).
          • The master node starts up after the worker node has already started.
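
          One way a worker could cover both cases is sketched below. This is an assumption for illustration, not the mechanism implemented in TAJO-1599: the class WorkerHeartbeatSketch, its Heartbeat type, and the trackerKnowsWorker response flag are all hypothetical. The worker attaches its capacity until the tracker acknowledges it, and attaches it again if a later response indicates the tracker no longer knows the worker (for example, because the master started after the worker).

```java
/** Illustrative sketch only; all names here are hypothetical, not Tajo APIs. */
public class WorkerHeartbeatSketch {

    /** Hypothetical heartbeat; memoryMB is null when capacity is omitted. */
    static final class Heartbeat {
        final String workerId;
        final Integer memoryMB;

        Heartbeat(String workerId, Integer memoryMB) {
            this.workerId = workerId;
            this.memoryMB = memoryMB;
        }
    }

    private final String workerId;
    private final int memoryMB;

    // False until the tracker confirms it knows this worker. A freshly
    // started (or restarted) worker begins with false, so its first
    // heartbeat always carries the capacity.
    private boolean capacityAcknowledged = false;

    WorkerHeartbeatSketch(String workerId, int memoryMB) {
        this.workerId = workerId;
        this.memoryMB = memoryMB;
    }

    /** Builds the next heartbeat, attaching capacity only when still needed. */
    Heartbeat nextHeartbeat() {
        return new Heartbeat(workerId, capacityAcknowledged ? null : memoryMB);
    }

    /**
     * Records the tracker's reply. If the tracker reports that it does not
     * know this worker, the next heartbeat carries the capacity again.
     */
    void onResponse(boolean trackerKnowsWorker) {
        capacityAcknowledged = trackerKnowsWorker;
    }
}
```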

            People

            • Assignee:
              Unassigned
              Reporter:
              hyunsik Hyunsik Choi