Tajo / TAJO-317

Improve TajoResourceManager to support more elaborate resource management

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.8.0
    • Component/s: resource manager
    • Labels: None

      Description

      Status of the current Tajo Resource Manager (RM)

      • Tajo RM manages CPU and disk resources only incompletely; in practice, it provides resource management only through memory allocation.
      • In addition, Tajo RM treats the memory resource as a fixed number of slots.

      Problem

      In many cases, workloads can be categorized into I/O-intensive jobs and CPU- and memory-consuming jobs. For example, scan and hash partition, or INSERT OVERWRITE, may belong to the I/O-intensive category. In general, aggregation belongs to the CPU-memory-consuming category. The current RM cannot selectively support I/O-intensive or CPU-memory-consuming jobs because it provides only memory slots. We need a more elaborate resource management mechanism.

      In addition, in most resource management systems, when the remaining resource is smaller than the requested amount, it is not allocated in response to the request. This prevents the cluster resources from being fully utilized. To mitigate this problem, we need to add flexibility to the allocation mechanism. For example, a min-max request would be useful here.

      Proposal

      • Tajo RM should provide resource management for disk and CPU-memory.
        • Tajo RM should provide an allocation request call that takes min and max memory requests and min and max disk requests.
          • A min-max request will help fully utilize the remaining cluster resources.
      • Each resource request should have a priority. The priority can be disk or memory.
        • If the priority is disk:
          • Disk allocation will be limited by the remaining disk resource.
          • Memory allocation will not be limited by the remaining memory resource; it will simply reduce the remaining memory resource.
        • If the priority is memory:
          • Memory allocation will be limited by the remaining memory resource.
          • Disk allocation will not be limited by the remaining disk resource; it will simply reduce the remaining disk resource.
      • The disk resource of each worker is represented as a float value.
        • The initial disk resource will be the number of disks that participate in the HDFS data directories.
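
The proposal above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the attached patches: the class, enum, and method names are hypothetical, the grant is assumed to be whatever remains within [min, max], and the non-priority resource is assumed to be charged at its requested minimum (so its remaining value may go negative, since it is not a limit).

```java
// Hypothetical sketch of a min-max request with a priority; not Tajo's actual API.
final class MinMaxAllocationSketch {
    enum Priority { MEMORY, DISK }

    /** Hypothetical per-worker bookkeeping; not Tajo's WorkerResource class. */
    static final class Worker {
        int freeMemoryMB;
        float freeDiskSlots;

        Worker(int freeMemoryMB, float freeDiskSlots) {
            this.freeMemoryMB = freeMemoryMB;
            this.freeDiskSlots = freeDiskSlots;
        }
    }

    /** Amount granted for a [min, max] request, or -1 if even 'min' does not fit. */
    static float grant(float available, float min, float max) {
        if (available < min) {
            return -1f;
        }
        // Assumption: hand out as much as remains, capped at 'max'.
        return Math.min(available, max);
    }

    /** Returns true and updates the worker if the prioritized resource fits. */
    static boolean allocate(Worker w, Priority priority,
                            int minMemoryMB, int maxMemoryMB,
                            float minDiskSlots, float maxDiskSlots) {
        if (priority == Priority.MEMORY) {
            float memory = grant(w.freeMemoryMB, minMemoryMB, maxMemoryMB);
            if (memory < 0) {
                return false;
            }
            w.freeMemoryMB -= (int) memory;
            // Disk is not a limit here; it is only reduced (assumed: by the minimum).
            w.freeDiskSlots -= minDiskSlots;
        } else {
            float disk = grant(w.freeDiskSlots, minDiskSlots, maxDiskSlots);
            if (disk < 0) {
                return false;
            }
            w.freeDiskSlots -= disk;
            // Memory is not a limit here; it is only reduced (assumed: by the minimum).
            w.freeMemoryMB -= minMemoryMB;
        }
        return true;
    }

    public static void main(String[] args) {
        Worker w = new Worker(1000, 2.0f);
        boolean ok = allocate(w, Priority.MEMORY, 512, 2048, 1.0f, 1.0f);
        System.out.println(ok + " memory=" + w.freeMemoryMB + " disk=" + w.freeDiskSlots);
    }
}
```

In this sketch, a memory-priority request for 512 to 2048 MB against a worker with 1000 MB free is granted the remaining 1000 MB instead of being rejected, which is the utilization benefit the min-max request is meant to provide.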
      Attachments

      1. TAJO-317.patch (52 kB, Keuntae Park)
      2. TAJO-317_2.patch (98 kB, Keuntae Park)
      3. TAJO-317_3.patch (98 kB, Keuntae Park)
      4. TAJO-317_4.patch (97 kB, Keuntae Park)
      5. TAJO-317_5.patch (98 kB, Keuntae Park)
      6. TAJO-317.doc.patch (3 kB, Keuntae Park)
      7. TAJO-317.doc_2.patch (3 kB, Jihoon Son)
      8. TAJO-317.doc_3.patch (3 kB, Keuntae Park)
      9. TAJO-317.doc_4.patch (6 kB, Hyunsik Choi)
      10. TAJO-317.doc_5.patch (6 kB, Keuntae Park)

        Issue Links

          Activity

          Transition: Open → Resolved
          Time In Source Status: 12d 5h 45m
          Execution Times: 1
          Last Executer: Jihoon Son
          Last Execution Date: 02/Dec/13 08:16
          Keuntae Park added a comment -

          No problem.
          Thank you, as always, for your kind review.

          Jihoon Son added a comment -

          Sorry, Keuntae.
          My mistake.
          Thanks.

          Keuntae Park added a comment -

          Jihoon Son, if you mean the doc patch, it seems to have already been committed to master.

          Jihoon Son added a comment -

          Sorry for the late review.
          The latest patch needs rebasing.
          Keuntae, would you please upload the patch again after rebasing?

          Hudson added a comment -

          FAILURE: Integrated in Tajo-trunk-postcommit #594 (See https://builds.apache.org/job/Tajo-trunk-postcommit/594/)
          Update documentation by TAJO-317. (Keuntae Park, jihoon, and hyunsik) (hyunsik: https://git-wip-us.apache.org/repos/asf?p=incubator-tajo.git&a=commit&h=7e47f6b017717fcc8917b6881fc90c4ade37a8b2)

          • tajo-project/src/site/markdown/tajo-0.8.0-doc.md
          Keuntae Park made changes -
          Attachment TAJO-317.doc_5.patch [ 12617542 ]
          Keuntae Park added a comment -

          Hyunsik Choi, great documentation!!
          I've modified it slightly and uploaded it.
          Please review it again.

          Hyunsik Choi made changes -
          Attachment TAJO-317.doc_4.patch [ 12617362 ]
          Hyunsik Choi added a comment -

          Based on the latest patch, I've updated the documentation. Please review it. You can see the draft at http://people.apache.org/~hyunsik/tajo/tajo-0.8.0-doc.html#ResourceConfiguration.

          Keuntae Park made changes -
          Attachment TAJO-317.doc_3.patch [ 12617316 ]
          Keuntae Park added a comment -

          Thank you for the review, Hyunsik.
          I added a more realistic configuration case to the document.
          Please review again.

          Hyunsik Choi added a comment -

          Could you please add tajo.worker.resource.cpu-cores and disks? They appear to have been omitted. Thanks.

          Hyunsik Choi added a comment -

          I'm reviewing this patch.

          Keuntae Park added a comment -

          You are right, Jihoon.
          I confused the default configuration with my working configuration.
          I agree with your fixed documentation, which is correct and much easier to understand.
          Thank you for the review !!

          Jihoon Son made changes -
          Attachment TAJO-317.doc_2.patch [ 12617143 ]
          Jihoon Son added a comment -

          Keuntae, thanks for your document.
          I made some changes to it to make the explanation clearer and to fix some mistakes.
          Please check whether my patch represents the changes from this issue well.

          Keuntae Park made changes -
          Attachment TAJO-317.doc.patch [ 12616728 ]
          Keuntae Park added a comment -

          I've uploaded the patch that updates the documentation.
          Please review it.

          Jihoon Son made changes -
          Link This issue relates to TAJO-371 [ TAJO-371 ]
          Keuntae Park added a comment -

          OK, I'll try to update the doc

          Hyunsik Choi added a comment - - edited

          Could you please update the documentation to reflect this patch?
          http://tajo.incubator.apache.org/tajo-0.8.0-doc.html

          Please just update the markdown file (tajo-project/src/site/markdown/tajo-0.8.0-doc.md) so that the up-to-date contents are reflected in the tajo-0.8.0-doc.html file.

          Hudson added a comment -

          SUCCESS: Integrated in Tajo-trunk-postcommit #581 (See https://builds.apache.org/job/Tajo-trunk-postcommit/581/)
          TAJO-317: Improve TajoResourceManager to support more elaborate resource management. (Keuntae Park via jihoon) (jihoonson: https://git-wip-us.apache.org/repos/asf?p=incubator-tajo.git&a=commit&h=528c914f9a133bef79df07017cfa424c9fab4412)

          • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/master/rm/TajoWorkerResourceManager.java
          • CHANGES.txt
          • tajo-core/tajo-core-backend/src/main/proto/TajoMasterProtocol.proto
          • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/util/ApplicationIdUtils.java
          • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/worker/ResourceAllocator.java
          • tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/TajoTestingCluster.java
          • tajo-core/tajo-core-backend/src/main/resources/webapps/admin/cluster.jsp
          • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/master/rm/WorkerResource.java
          • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/worker/YarnResourceAllocator.java
          • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/worker/TajoResourceAllocator.java
          • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/master/TajoMasterService.java
          • tajo-core/tajo-core-backend/src/main/resources/webapps/admin/index.jsp
          • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/master/TajoContainerProxy.java
          • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/master/rm/WorkerResourceManager.java
          • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/worker/TajoWorker.java
          • tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/master/TestTajoResourceManager.java
          • tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java
          • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/master/querymaster/SubQuery.java
          • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/master/rm/TajoWorkerContainerId.java
          • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/worker/Task.java
          • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/master/rm/YarnTajoResourceManager.java
          Keuntae Park added a comment -

          Thank you so much for the kind review, Jihoon and Hyunsik !!

          Jihoon Son made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Jihoon Son added a comment -

          I committed the latest patch.
          Thanks, Keuntae.

          Hyunsik Choi added a comment -

          +1

          I verified 'mvn clean install'. It's really a great job. Ship it!

          Keuntae Park made changes -
          Attachment TAJO-317_5.patch [ 12616508 ]
          Keuntae Park added a comment -

          Thank you for the review, Jihoon.

          I've uploaded a new patch in which unnecessary comments, LOG.debug, and System.out.println() calls are removed (from WorkerResource.java and TajoResourceManager.java).

          Jihoon Son added a comment -

          Before committing the patch, I found that there is still commented-out code and some debug print statements.
          Keuntae, would you check that the latest patch is the right one?
          If it is, please remove that code.

          Jihoon Son added a comment -

          +1 for the latest patch.
          If there aren't any other comments, I'll commit this patch.

          Keuntae Park made changes -
          Attachment TAJO-317_4.patch [ 12616500 ]
          Keuntae Park added a comment - - edited

          Sorry for the late answer, Jihoon and Hyunsik.

          I've uploaded a new patch that reflects all your comments:

          • when dedicated=false, use the default values (memory size is 512M, number of disks is 1.0)
          • use systemConf.getIntVar() instead of systemConf.getInt()

          Thank you both very much for the review,
          and please let me know if I have missed anything.

          Hyunsik Choi added a comment -

          Thank you, Keuntae, for your contribution. Thank you, Jihoon, for your detailed review.

          Overall, the patch looks great to me. Almost all parts reflect the proposal well. However, I also agree with Jihoon's comment. In that part, it would be better to use systemConf.getIntVar() instead of systemConf.getInt() for the device resources.

          Jihoon Son added a comment -

          I mean that it would be better for the resource information included in WorkerHeartbeat to be set as in the previous configuration (lines 417~433, TajoWorker.java).
          In your patch, when the dedication flag is not set, the default values of memory and disk appear to be the worker's total memory size and total number of disks, respectively.

          Keuntae Park added a comment -

          Thank you for the review, Jihoon.

          You can see the following code in the run() method of TajoResourceAllocator:

          int requiredMemoryMB = tajoConf.getIntVar(TajoConf.ConfVars.TASK_DEFAULT_MEMORY);
          float requiredDiskSlots = tajoConf.getFloatVar(TajoConf.ConfVars.TASK_DEFAULT_DISK);

          Because TASK_DEFAULT_MEMORY=512MB and TASK_DEFAULT_DISK=1.0,
          we still have the same default values as in the previous configuration.

          Jihoon Son added a comment -

          Thanks for your update.
          I have one more thing to discuss.
          When the dedication flag is not set, some of a worker's resources are reserved for other processes.
          So, in my opinion, when tajo.worker.resource.dedicated is false, it would be better to set the default values of memory and disk to 512MB per task and 1.0, respectively, as in the previous configuration.

          Keuntae Park made changes -
          Attachment TAJO-317_3.patch [ 12616220 ]
          Keuntae Park added a comment -

          Sorry, you are right, Jihoon.
          I made another mistake in the patch description:
          true and false should be reversed.

          • if tajo.worker.resource.dedicated = false,
            • memory: value from tajo.worker.resource.memory-mb (default is max heap of worker)
            • disk: value from tajo.worker.resource.disks (default is number of disks)
          • if tajo.worker.resource.dedicated = true,
            • memory: tajo.worker.resource.dedicated-memory-ratio * max heap of worker
            • disk: number of disks

          I've also uploaded a new patch with cleaned-up code.
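
The corrected rule in this comment could be sketched as below. This is a minimal illustration, not the patch's code: the class and method names are hypothetical, a null argument stands in for an unset configuration key, and it reflects only this comment, not the defaults agreed on later in the thread.

```java
// Hypothetical sketch of how a worker might derive the resources it reports.
final class WorkerCapacitySketch {

    /**
     * Memory (MB) reported by a worker.
     * configuredMemoryMB models tajo.worker.resource.memory-mb (null = unset),
     * dedicatedMemoryRatio models tajo.worker.resource.dedicated-memory-ratio.
     */
    static int memoryMB(boolean dedicated, Integer configuredMemoryMB,
                        float dedicatedMemoryRatio, int maxHeapMB) {
        if (dedicated) {
            // dedicated = true: a fixed ratio of the worker's max heap
            return (int) (dedicatedMemoryRatio * maxHeapMB);
        }
        // dedicated = false: configured value, defaulting to the max heap
        return configuredMemoryMB != null ? configuredMemoryMB : maxHeapMB;
    }

    /**
     * Disk slots reported by a worker.
     * configuredDisks models tajo.worker.resource.disks (null = unset).
     */
    static float diskSlots(boolean dedicated, Float configuredDisks, int numDisks) {
        if (dedicated) {
            // dedicated = true: all disks of the worker
            return numDisks;
        }
        // dedicated = false: configured value, defaulting to the number of disks
        return configuredDisks != null ? configuredDisks : numDisks;
    }

    public static void main(String[] args) {
        System.out.println(memoryMB(false, 5120, 0.5f, 4096));
        System.out.println(diskSlots(true, 2.5f, 4));
    }
}
```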

          Jihoon Son added a comment -

          I found that the implementation is contrary to your description.
          That is, you said that a worker's memory is tajo.worker.resource.memory-mb when the dedication flag is set, but in the implementation it is actually tajo.worker.resource.dedicated-memory-ratio * max heap.
          Please fix this discrepancy.

          Also, there are some unused imports, commented-out code, and debug print statements.
          Please remove them.

          Jihoon Son added a comment -

          It's ok. Never mind.
          I'll review the patch.

          Keuntae Park made changes -
          Attachment TAJO-317_2.patch [ 12616186 ]
          Keuntae Park added a comment -

          I must apologize to you, Jihoon.
          When I checked your comment, I found that I had uploaded the wrong file.
          I'm so sorry about that.

          Now I've uploaded the correct one, which is also rebased.
          If you don't mind, please review it again.

          Again, I'm so sorry, and thank you for your kind review.

          Jihoon Son added a comment -

          Thanks for your contribution.
          There are a couple of things that need to be discussed.

          • As you mentioned, 'tajo.worker.parallel-execution.max-num' is no longer used. Please remove it.
          • The name ResourceRequestType does not seem appropriate for its purpose. As described in this issue, the main purpose of the request type is to represent the priority. I think ResourceRequestPriority is more suitable.
          • I wonder why a special resource request type for the query master, ResourceRequestType.QUERY_MASTER, is required. I think it can be handled as a kind of MEMORY type.
          • In TajoWorkerResourceManager.chooseWorker(), the worker resource is not locked when the resource type is QUERY_MASTER or DISK. This may cause unexpected behavior.
          • As described in this issue, when the resource request priority is MEMORY (or DISK), the required disk (or memory) resources should be reduced as well as the memory (or disk). However, I can't find any code for this operation.
          • As described in this issue, resource requests should contain both min and max values. However, I can't find these changes.
          • Since YarnTajoResourceManager does not work properly, it would be better to throw an UnimplementedException() when any of its functions is called.
          Jihoon Son added a comment -

          I'll review this patch.

          Keuntae Park added a comment -

          Would anyone review this patch, please?

          Keuntae Park made changes -
          Attachment TAJO-317.patch [ 12615291 ]
          Keuntae Park added a comment - - edited

          I've uploaded the patch for the issue:

          Now, TajoResourceManager allocates containers based on memory or disk slots.

          1. At startup time, each Worker sends its resource information to the ResourceManager

          • tajo.worker.parallel-execution.max-num is no longer used
          • if tajo.worker.resource.dedicated = true,
            • memory: value from tajo.worker.resource.memory-mb (default is max heap of worker)
            • disk: value from tajo.worker.resource.disks (default is number of disks)
          • if tajo.worker.resource.dedicated = false,
            • memory: tajo.worker.resource.dedicated-memory-ratio * max heap of worker
            • disk: number of disks
          • TajoResourceManager manages memory and disk as follows:
            • Memory is managed in MB: TajoResourceManager allocates a worker based on the min/max parameters specified in the QueryMaster's container request (the detailed allocation policy is explained in 3).
            • Disk is managed in slots: the QueryMaster specifies the number of slots (fractional values are OK) for a container in its request, and TajoResourceManager then allocates a worker that has enough slots (the detailed allocation policy is explained in 3).

          2. QueryMaster requests a container with the following parameters:

          • ResourceRequestPriority (Memory or Disk)
          • min/max memory in MB
          • min/max disk slots

          3. Resource allocation policy of TajoResourceManager:

          • if ResourceRequestPriority = Memory,
            • First, find a worker whose available memory is more than 'max'
            • If none exists, find a worker that has more than 'min'
            • 'min/max disk slots' is used only for updating the 'used disk slot' value of the selected worker
            • This mode is for memory-intensive tasks
          • if ResourceRequestPriority = Disk,
            • First, find a worker whose available disk slots are more than 'max'
            • If none exists, find a worker that has more slots than 'min'
            • 'min/max memory MB' is used only for updating the used memory value of the selected worker
            • This mode is for disk-intensive tasks
          • Logic to adjust the requested memory and disk size based on the task type is NOT YET IMPLEMENTED. For compatibility with the current allocation policy, which considers memory only, every task requests 512MB of memory.
            • For a concurrency of 10 tasks per worker (5120MB / 512MB per task), set:
              • tajo.worker.resource.dedicated=false
              • tajo.worker.resource.memory-mb=5120

          4. Test case
          TestTajoResourceManager
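
The two-pass policy in item 3 could look roughly like the sketch below. It is a hypothetical illustration rather than the patch's TajoWorkerResourceManager.chooseWorker(): all names are made up, "more than" is treated as "at least", and the selected worker is assumed to be charged the max values when found in the first pass and the min values when found in the second.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the "try max first, then fall back to min" worker search.
final class ChooseWorkerSketch {
    enum Priority { MEMORY, DISK }

    /** Hypothetical worker bookkeeping; not Tajo's WorkerResource class. */
    static final class Worker {
        final String name;
        int freeMemoryMB;
        float freeDiskSlots;

        Worker(String name, int freeMemoryMB, float freeDiskSlots) {
            this.name = name;
            this.freeMemoryMB = freeMemoryMB;
            this.freeDiskSlots = freeDiskSlots;
        }
    }

    /** Only the prioritized resource decides whether a worker qualifies. */
    static boolean fits(Worker w, Priority priority, int memoryMB, float diskSlots) {
        return priority == Priority.MEMORY
            ? w.freeMemoryMB >= memoryMB
            : w.freeDiskSlots >= diskSlots;
    }

    /** Both resources are reduced, whichever one was the priority. */
    static void charge(Worker w, int memoryMB, float diskSlots) {
        w.freeMemoryMB -= memoryMB;
        w.freeDiskSlots -= diskSlots;
    }

    /** Returns the chosen worker, or null if no worker satisfies even 'min'. */
    static Worker choose(List<Worker> workers, Priority priority,
                         int minMemoryMB, int maxMemoryMB,
                         float minDiskSlots, float maxDiskSlots) {
        // Pass 1: a worker that can satisfy the 'max' request.
        for (Worker w : workers) {
            if (fits(w, priority, maxMemoryMB, maxDiskSlots)) {
                charge(w, maxMemoryMB, maxDiskSlots);
                return w;
            }
        }
        // Pass 2: fall back to a worker that can satisfy the 'min' request.
        for (Worker w : workers) {
            if (fits(w, priority, minMemoryMB, minDiskSlots)) {
                charge(w, minMemoryMB, minDiskSlots);
                return w;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Worker> workers = Arrays.asList(
            new Worker("A", 1024, 1.0f), new Worker("B", 4096, 2.0f));
        Worker chosen = choose(workers, Priority.MEMORY, 512, 2048, 0.5f, 1.0f);
        System.out.println(chosen == null ? "none" : chosen.name);
    }
}
```

A real implementation would also need to lock the worker resource while checking and updating it, a point raised in the review of this patch.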

          Keuntae Park made changes -
          Assignee Keuntae Park [ sirpkt ]
          Keuntae Park added a comment -

          Thank you, Hyunsik.
          I'll start this issue.

          Hyunsik Choi added a comment -

          Keuntae Park, you are already registered as a contributor, so you can assign the issue to yourself. Thank you!

          Keuntae Park added a comment -

          If everyone agrees, I want to take on this issue.

          Jihoon Son added a comment -

          +1 for this issue.
          This will greatly increase the efficiency of resource scheduling.

          Hyunsik Choi made changes -
          Field: Description (the Original Value used h4. headings; the New Value below uses h3. headings and is otherwise identical)
          h3. Status of the current Tajo Resource Manager (RM)
           * Tajo RM manages CPU and disk resources only incompletely; in practice it provides resource management only through memory allocation.
           * In addition, Tajo RM treats the memory resource as a fixed number of slots.

          h3. Problem
          In many cases, workloads can be categorized into I/O-intensive jobs and CPU- and memory-consuming jobs. For example, scan and hash partition, or INSERT OVERWRITE, may belong to the I/O-intensive category. In general, aggregation belongs to the CPU-memory-consuming category. The current RM cannot selectively support I/O-intensive or CPU-memory-consuming jobs because it provides only memory slots. We need a more elaborate resource management mechanism.

          In addition, in most resource management systems, a remaining resource that is smaller than the requested amount is not allocated in response to a resource request. This prevents the cluster resources from being fully utilized. To mitigate this problem, we need to add resilience to the allocation mechanism. For example, a min-max request would be useful here.

          h3. Proposal
           * Tajo RM should provide resource management for disk and CPU-memory.
           ** Tajo RM should provide an allocation request call with min and max memory requests and min and max disk requests.
           *** A min-max request will help fully utilize the remaining cluster resources.
           * Each resource request should have a priority. The priority can be disk or memory.
           ** If the priority is disk
           *** disk allocation will be limited by the remaining disk resource
           *** memory allocation will not be limited by the remaining memory resource; it will just reduce the remaining memory resource.
           ** If the priority is memory
           *** memory allocation will be limited by the remaining memory resource
           *** disk allocation will not be limited by the remaining disk resource; it will just reduce the remaining disk resource.
           * The disk resource of each worker is represented as a float value.
           ** The initial disk resource will be the number of disks that participate in the HDFS data directory.
          Hyunsik Choi created issue -

            People

            • Assignee:
              Keuntae Park
              Reporter:
              Hyunsik Choi
            • Votes:
              0
              Watchers:
              3


                Development