Hadoop Common
HADOOP-8468

Umbrella of enhancements to support different failure and locality topologies

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.0.0, 2.0.0-alpha
    • Fix Version/s: 1.2.0, 2.1.0-beta
    • Component/s: ha, io
    • Labels: None

      Description

      The current Hadoop network topology (described in some previous issues like HADOOP-692) worked well for the classic three-tier network it was designed for. However, it does not take into account other failure models or changes in the infrastructure, such as virtualization, that can affect network bandwidth efficiency.
      A virtualized platform has the following characteristics that should not be ignored by the Hadoop topology when scheduling tasks, placing replicas, balancing, or fetching blocks for reading:
      1. VMs on the same physical host are affected by the same hardware failure. In order to match the reliability of a physical deployment, replication of data across two virtual machines on the same host should be avoided.
      2. The network between VMs on the same physical host has higher throughput and lower latency and does not consume any physical switch bandwidth.
      Thus, we propose to make the Hadoop network topology extensible and introduce a new level in the hierarchical topology, a node group level, which maps well onto an infrastructure that is based on a virtualized environment.
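      As an illustration, enabling the extended topology is expected to be mostly a matter of configuration. Below is a minimal sketch, assuming the keys described in the attached HVE user guide draft (net.topology.impl, net.topology.nodegroup.aware and dfs.block.replicator.classname; the exact key names may differ between branches). The same values would normally be set in core-site.xml and hdfs-site.xml rather than in code:

      import org.apache.hadoop.conf.Configuration;

      public class EnableNodeGroupTopology {
        public static void main(String[] args) {
          Configuration conf = new Configuration();

          // Use the four-layer topology: datacenter/rack/nodegroup/host.
          conf.set("net.topology.impl",
              "org.apache.hadoop.net.NetworkTopologyWithNodeGroup");
          conf.set("net.topology.nodegroup.aware", "true");

          // Node-group-aware replica placement: avoid putting two replicas of a
          // block on VMs that share the same physical host (node group).
          conf.set("dfs.block.replicator.classname",
              "org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyWithNodeGroup");

          System.out.println(conf.get("net.topology.impl"));
        }
      }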

      1. HVE User Guide on branch-1(draft ).pdf
        380 kB
        Junping Du
      2. HVE_Hadoop World Meetup 2012.pptx
        954 kB
        Junping Du
      3. Proposal for enchanced failure and locality topologies (revised-1.0).pdf
        269 kB
        Junping Du
      4. HADOOP-8468-total-v3.patch
        259 kB
        Junping Du
      5. HADOOP-8468-total.patch
        259 kB
        Junping Du
      6. Proposal for enchanced failure and locality topologies.pdf
        260 kB
        Junping Du

        Issue Links

          Activity

          Junping Du added a comment -

          Great. Thanks!

          Tsz Wo Nicholas Sze added a comment -

          Thanks Junping, I will review and merge the patches to branch-2.

          Sheng Liu added a comment -

          Thank you for your work, Junping

          Junping Du added a comment -

          All JIRAs above are reopened and patches are already available for branch-2. Please help to review and commit. Thanks!

          Junping Du added a comment -

          I have started the work to backport the NodeGroup-aware topology to branch-2; there are 7 patches below in Common & HDFS (a short usage sketch of the node-group topology follows the list):
          patch 1 - HADOOP-8469: make NetworkTopology pluggable
          patch 2 - HADOOP-8470: implementation of NetworkTopologyWithNodeGroup
          patch 3 - HDFS-3498: Make BlockPlacementPolicyDefault extensible and the block placement removal policy extensible
          patch 4 - HDFS-3601: implementation of BlockPlacementPolicyWithNodeGroup
          patch 5 - HDFS-4240: fix a bug in nodegroup-aware case
          patch 6 - HDFS-3495: Make Balancer support node group layer
          patch 7 - HDFS-4261: Fix a timeout issue in TestBalancerWithNodeGroup case
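
          A short usage sketch of the pluggable topology with a node-group layer, assuming the NetworkTopologyWithNodeGroup API from HADOOP-8470 (the exact method names may differ between branches):

          import org.apache.hadoop.net.NetworkTopologyWithNodeGroup;
          import org.apache.hadoop.net.Node;
          import org.apache.hadoop.net.NodeBase;

          public class NodeGroupTopologyExample {
            public static void main(String[] args) {
              NetworkTopologyWithNodeGroup topology = new NetworkTopologyWithNodeGroup();

              // Two VMs on the same physical host (node group) and one on another host.
              Node vm1 = new NodeBase("vm1", "/rack1/nodegroup1");
              Node vm2 = new NodeBase("vm2", "/rack1/nodegroup1");
              Node vm3 = new NodeBase("vm3", "/rack1/nodegroup2");
              topology.add(vm1);
              topology.add(vm2);
              topology.add(vm3);

              // vm1 and vm2 share a node group (physical host); vm3 only shares the rack.
              System.out.println(topology.isOnSameNodeGroup(vm1, vm2)); // expected: true
              System.out.println(topology.isOnSameNodeGroup(vm1, vm3)); // expected: false
              System.out.println(topology.isOnSameRack(vm1, vm3));      // expected: true
            }
          }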

          Jan Kunigk added a comment -

          Junping, thank you for your explanations. Beyond separating compute and storage of a virtual cluster, can you comment on isolation? It sounds like you would use a multitude of virtual HDFS instances in order to fence virtual clusters off from each other.

          Junping Du added a comment -

          Jan, thanks for the questions. It doesn't have to be multiple clusters, each with a dedicated HDFS. It also makes sense to set up purely compute-only clusters based on the same HDFS cluster by separating the TaskTracker (or NodeManager) and DataNode into different VMs. The NodeGroup awareness here helps guarantee nodegroup-level (physical host) locality, so you can power off or suspend your compute cluster without affecting other clusters. Given this, you don't have to suspend your HDFS cluster to free resources for other applications.
          In the other case, if you want to suspend a virtual cluster (including HDFS), I would recommend stopping the HDFS service before you suspend the cluster and starting it again after you resume it. This avoids the data re-replication caused by the DNs' heartbeat outage, and there is no need for an extra storage tier for persistence.

          Jan Kunigk added a comment -

          Junping,

          Referring to one of your earlier comments on 08/Jun/12:
          > For 2. It's right that VMs on the same host will not share storage directly
          > but could do so (with getting virtual disks) through Hypervisor FS (Like VMFS in VMware vSphere) layer.
          > Another way (should recommend for hadoop case) is to go through RDM (Raw Disk Mapping) configuration
          > in hypervisor that each VM can get some dedicated physical disks.

          Are you envisioning a usage model where each virtual cluster has its own distributed filesystem?
          When I use virtualization I would most likely suspend my virtual clusters from time to time...
          Can you comment on what would happen to the HDFS data in this case? Would one have to persist it in a different storage tier?

          Junping Du added a comment -

          Andy, you can simply start the TaskTracker daemon on some nodes while starting the DataNode daemon on other nodes. If you need to follow up with details on how to do it, please send mail to me or user@hadoop.apache.org, as that seems unrelated to this JIRA.

          Andy Yang added a comment -

          Could anyone give me some advice about how to separate the TaskTracker from the DataNode? I'd appreciate it!

          Andy Yang added a comment -

          Thank you, Junping Du!

          Junping Du added a comment -

          Hi Andy, HADOOP-8468-total.patch is out of date. We have already divided it into several patches, and most of them are checked into trunk now (except YARN-18 and YARN-19). If you are interested in backporting it to other branches, I would suggest going with the sub-JIRA patches.

          Andy Yang added a comment -

          I tried to apply HADOOP-8468-total.patch to the release from http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.0.2-alpha/, but it failed! Could anyone give me some suggestions? The errors are as follows:
          andy@andy:~/hadoop/hadoop-2.0.2-alpha-src$ patch -p0 < HADOOP-8468-total.patch
          patching file hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeysPublic.java
          patching file hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/InnerNode.java
          patching file hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/InnerNodeWithNodeGroup.java
          patching file hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/NetworkTopology.java
          Hunk #9 succeeded at 461 (offset 1 line).
          Hunk #10 succeeded at 517 (offset 1 line).
          patching file hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/NetworkTopologyWithNodeGroup.java
          patching file hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/TopologyResolver.java
          patching file hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
          Hunk #1 succeeded at 611 (offset 11 lines).
          Hunk #2 succeeded at 631 (offset 11 lines).
          patching file hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/DatanodeInfo.java
          Hunk #1 succeeded at 201 (offset 2 lines).
          patching file hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
          Hunk #1 succeeded at 53 (offset 4 lines).
          Hunk #2 succeeded at 77 with fuzz 2 (offset 5 lines).
          Hunk #3 succeeded at 806 (offset 26 lines).
          Hunk #4 succeeded at 920 (offset 26 lines).
          Hunk #5 succeeded at 941 (offset 26 lines).
          Hunk #6 succeeded at 1146 (offset 26 lines).
          patching file hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java
          Hunk #1 succeeded at 57 with fuzz 2 (offset 1 line).
          Hunk #2 succeeded at 89 (offset 1 line).
          Hunk #3 succeeded at 230 (offset 1 line).
          Hunk #4 succeeded at 264 (offset 1 line).
          Hunk #5 succeeded at 317 (offset 1 line).
          Hunk #6 succeeded at 339 (offset 1 line).
          Hunk #7 succeeded at 383 (offset 1 line).
          Hunk #8 FAILED at 438.
          1 out of 8 hunks FAILED -- saving rejects to file hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java.rej
          patching file hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyWithNodeGroup.java
          patching file hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
          Hunk #2 succeeded at 66 (offset -1 lines).
          Hunk #5 succeeded at 221 (offset 6 lines).
          patching file hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java
          Hunk #1 succeeded at 309 (offset 27 lines).
          Hunk #2 succeeded at 353 (offset 30 lines).
          Hunk #3 succeeded at 2240 (offset 78 lines).
          patching file hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSClusterWithNodeGroup.java
          patching file hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancerWithNodeGroup.java
          patching file hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyWithNodeGroup.java
          patching file hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/net/TestNetworkTopologyWithNodeGroup.java
          patching file hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
          Hunk #2 succeeded at 97 (offset 2 lines).
          Hunk #3 succeeded at 941 (offset 18 lines).
          Hunk #4 succeeded at 1163 (offset 30 lines).
          patching file hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/ContainerRequestWithNodeGroupEvent.java
          patching file hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java
          Hunk #3 FAILED at 44.
          Hunk #4 succeeded at 63 with fuzz 2 (offset 5 lines).
          Hunk #5 succeeded at 78 (offset 5 lines).
          Hunk #6 FAILED at 112.
          Hunk #7 succeeded at 159 (offset 6 lines).
          Hunk #8 succeeded at 553 (offset 14 lines).
          Hunk #9 FAILED at 624.
          3 out of 9 hunks FAILED -- saving rejects to file hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java.rej
          patching file hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/ScheduledRequests.java
          patching file hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/ScheduledRequestsWithNodeGroup.java
          patching file hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRMContainerAllocator.java
          Hunk #1 succeeded at 30 with fuzz 1 (offset 8 lines).
          Hunk #2 succeeded at 45 (offset 10 lines).
          Hunk #3 FAILED at 54.
          Hunk #4 succeeded at 96 with fuzz 2 (offset 12 lines).
          Hunk #5 succeeded at 334 (offset 12 lines).
          Hunk #6 succeeded at 1031 (offset 95 lines).
          Hunk #7 succeeded at 1043 (offset 95 lines).
          Hunk #8 succeeded at 1052 (offset 95 lines).
          Hunk #9 succeeded at 1236 (offset 95 lines).
          Hunk #10 succeeded at 1246 (offset 95 lines).
          Hunk #11 succeeded at 1264 (offset 95 lines).
          Hunk #12 succeeded at 1295 (offset 95 lines).
          Hunk #13 succeeded at 1321 (offset 95 lines).
          Hunk #14 succeeded at 1424 (offset 107 lines).
          Hunk #15 succeeded at 1470 (offset 107 lines).
          Hunk #16 succeeded at 1508 (offset 107 lines).
          Hunk #17 succeeded at 1674 (offset 122 lines).
          Hunk #18 succeeded at 1689 (offset 122 lines).
          Hunk #19 FAILED at 1582.
          Hunk #20 succeeded at 1726 (offset 134 lines).
          Hunk #21 FAILED at 1617.
          3 out of 21 hunks FAILED -- saving rejects to file hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRMContainerAllocator.java.rej
          patching file hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/JobCounter.java
          can't find file to patch at input line 4949
          Perhaps you used the wrong -p or --strip option?
          The text leading up to this was:
          --------------------------

          diff --git hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
          index 26ef90d..9163bf8 100644
          --- hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
          +++ hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
          --------------------------
          File to patch: hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
          hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java: No such file or directory
          Skip this patch? [y] n
          File to patch:
          Skip this patch? [y] y
          Skipping patch.
          1 out of 1 hunk ignored
          can't find file to patch at input line 4972
          Perhaps you used the wrong -p or --strip option?
          The text leading up to this was:
          --------------------------
          diff --git hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
          index 6e8df10..8503344 100644
          --- hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
          +++ hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
          --------------------------
          File to patch:
          Skip this patch? [y]
          Skipping patch.
          1 out of 1 hunk ignored
          patching file hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestTopologyResolver.java
          can't find file to patch at input line 5059
          Perhaps you used the wrong -p or --strip option?
          The text leading up to this was:
          --------------------------
          diff --git hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
          index 06ad11f..e3c9faa 100644
          --- hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
          +++ hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
          --------------------------
          File to patch:
          Skip this patch? [y]
          Skipping patch.
          3 out of 3 hunks ignored
          can't find file to patch at input line 5105
          Perhaps you used the wrong -p or --strip option?
          The text leading up to this was:
          --------------------------
          diff --git hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
          index 9a926e0..99cf612 100644
          --- hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
          +++ hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
          --------------------------
          File to patch:
          Skip this patch? [y]
          Skipping patch.
          1 out of 1 hunk ignored
          patching file hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImplWithNodeGroup.java
          can't find file to patch at input line 5168
          Perhaps you used the wrong -p or --strip option?
          The text leading up to this was:
          --------------------------
          diff --git hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
          index 5d11e52..cc13dc0 100644
          --- hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
          +++ hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
          --------------------------
          File to patch:
          Skip this patch? [y]
          Skipping patch.
          4 out of 4 hunks ignored
          can't find file to patch at input line 5257
          Perhaps you used the wrong -p or --strip option?
          The text leading up to this was:
          --------------------------
          diff --git hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/NodeType.java hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/NodeType.java
          index 821ec24..bad5ca0 100644
          --- hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/NodeType.java
          +++ hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/NodeType.java
          --------------------------
          File to patch:
          Skip this patch? [y]
          Skipping patch.
          1 out of 1 hunk ignored
          can't find file to patch at input line 5269
          Perhaps you used the wrong -p or --strip option?
          The text leading up to this was:
          --------------------------
          diff --git hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerNode.java hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerNode.java
          index 7e51841..5af4a1d 100644
          --- hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerNode.java
          +++ hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerNode.java
          --------------------------
          File to patch:
          Skip this patch? [y]
          Skipping patch.
          2 out of 2 hunks ignored
          patching file hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerNodeWithNodeGroup.java
          can't find file to patch at input line 5357
          Perhaps you used the wrong -p or --strip option?
          The text leading up to this was:
          --------------------------
          diff --git hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
          index f304b0a..f0d75ad 100644
          --- hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
          +++ hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
          --------------------------
          File to patch:
          Skip this patch? [y]
          Skipping patch.
          3 out of 3 hunks ignored
          can't find file to patch at input line 5413
          Perhaps you used the wrong -p or --strip option?
          The text leading up to this was:
          --------------------------
          diff --git hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
          index 75d5249..cfc17a9 100644
          --- hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
          +++ hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
          --------------------------
          File to patch:
          Skip this patch? [y]
          Skipping patch.
          10 out of 10 hunks ignored
          patching file hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueueWithNodeGroup.java
          can't find file to patch at input line 5635
          Perhaps you used the wrong -p or --strip option?
          The text leading up to this was:
          --------------------------
          diff --git hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
          index a33a37d..66934d9 100644
          --- hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
          +++ hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
          --------------------------
          File to patch:
          Skip this patch? [y]
          Skipping patch.
          12 out of 12 hunks ignored
          patching file hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoSchedulerWithNodeGroup.java
          patching file hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueueWithNodeGroup.java
          can't find file to patch at input line 6174
          Perhaps you used the wrong -p or --strip option?
          The text leading up to this was:
          --------------------------
          diff --git hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java
          index 5dc6dfb..29533f0 100644
          --- hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java
          +++ hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java
          --------------------------
          File to patch:
          Skip this patch? [y]
          Skipping patch.
          3 out of 3 hunks ignored
          Junping Du added a comment -

          Hi Konstantin, thanks for your question and for reading the results carefully. Yes, TestDFSIO has no locality awareness in task scheduling, as you said. However, after tasks are scheduled, the work in this umbrella (let's call it HVE for short) can increase the chance that the client reads a data block from the local physical host, for two reasons:
          1. HVE makes sure the replicas span 3 distinct physical hosts (out of 6 hosts in total), so for any HDFS read there is a 3/6 = 50% chance that a replica lives on the same physical host as the reader (previously two replicas could share a host, leaving only 2 or 3 distinct hosts, i.e. a chance between 1/3 and 1/2).
          2. With HVE, the HDFS client can correctly sort the replicas so that a nodegroup-local replica is chosen in preference to a rack-local one.
          The first reason is specific to this test setup, but the second one applies more generally.
          Does that make sense?

          Konstantin Shvachko added a comment -

          Junping,
          Checked your article with performance results. Got a question about it.
          How do you explain the performance gain with DFSIO?
          MapReduce-wise, DFSIO is completely unaware of the locality of the data it reads, because the map input is just a file containing the name of the file that the mapper should read. So the input file with the name of the file to read is local to the task, but not the file that it then reads.
          I'm not saying there is anything wrong with your results, I just think it needs more explanation.

          Junping Du added a comment -

          Link to a bug fix for a boundary case in replica placement with nodegroup.

          Junping Du added a comment -

          Great! Thanks, Nicholas!

          Tsz Wo Nicholas Sze added a comment -

          Hi Junping, I would be happy to check your branch-1 patch.

          Junping Du added a comment -

          Hi, can anyone take a look at the recent patch of the topology extension for branch-1? I uploaded a new version in HADOOP-8817.

          Junping Du added a comment -

          Also, a white paper on reliability and performance evaluation for HVE: http://serengeti.cloudfoundry.com/pdf/Hadoop%20Virtualization%20Extensions%20WP.pdf .

          Junping Du added a comment -

          As a follow-up to the meetup, I quickly summarized how to configure and use HVE in a draft user guide and attached it. Please help to review and comment.

          Junping Du added a comment -

          I just attached the slides of the talk at yesterday's Hadoop World meetup. Thanks.

          Junping Du added a comment -

          Thanks for the great comments, Konstantin. The revised-1.0 doc already addresses the full policy definition.
          Hi guys, I am backporting patches to branch-1. I hope I can get your support and help with reviewing.

          Konstantin Shvachko added a comment -

          It's good that you formulated the policies. Now I can see the differences. In way-2 you actually don't need to say "virtual node". It is an implementation detail. You only care that the first replica is on the local physical node. So way-2 is the same as the original.
          In way-1 I agree only one change is needed. Rather surprising.

          I briefly checked the patch, and see now that your abstractions are driven by the implementation. Whether you define it way-1 or way-2, implementation-wise you still introduce a new inner level in the topology.
          I do not think you need the new class InnerNodeWithNodeGroup. It doesn't have new members or constructors. It overrides isRack(), but only because the old implementation assumed racks are on the second level. I'd rather add a nodeType member than check children of children.
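
          To make the two options concrete, here is a simplified, self-contained sketch; it is not the actual NetworkTopology/InnerNode code, and the class and member names are only illustrative:

          import java.util.ArrayList;
          import java.util.List;

          // Minimal stand-ins for the topology tree, for illustration only.
          class TNode {
            final String name;
            TNode(String name) { this.name = name; }
          }

          class TInnerNode extends TNode {
            final List<TNode> children = new ArrayList<>();
            TInnerNode(String name) { super(name); }

            // Original assumption: an inner node is a rack iff its children are leaves,
            // i.e. the tree is exactly /rack/host below the root.
            boolean isRack() {
              return !children.isEmpty() && !(children.get(0) instanceof TInnerNode);
            }
          }

          // Option 1 (as in the patch): a subclass that redefines what counts as a rack
          // in a four-level tree (/rack/nodegroup/host) by checking children of children.
          class TInnerNodeWithNodeGroup extends TInnerNode {
            TInnerNodeWithNodeGroup(String name) { super(name); }

            @Override
            boolean isRack() {
              if (children.isEmpty() || !(children.get(0) instanceof TInnerNode)) {
                return false; // children are hosts, so this is a node group, not a rack
              }
              TInnerNode child = (TInnerNode) children.get(0);
              // Grandchildren are hosts, so this level is the rack.
              return child.children.isEmpty()
                  || !(child.children.get(0) instanceof TInnerNode);
            }
          }

          // Option 2 (suggested above): keep one inner-node class and record the level
          // explicitly in a member instead of inferring it from the children.
          enum TLevel { RACK, NODE_GROUP }

          class TTypedInnerNode extends TInnerNode {
            final TLevel level;
            TTypedInnerNode(String name, TLevel level) { super(name); this.level = level; }

            @Override
            boolean isRack() { return level == TLevel.RACK; }
          }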

          So, I think I understand your motivation with the design. Thanks for clarifying your thoughts to me. I still think that the terminology is better when talking about extending the topology with new leaves, but your way is also valid and does not change the policy much. You choose. Either way, please add the full policy definition in the document.

          Junping Du added a comment -

          > What changed your mind? Sounds like the right direction to me.
          From the above comments, you can see that way-1 inherits the original policy almost as closely as way-2. But way-1 is simpler to implement, for reasons such as: DatanodeDescriptor does not have to be remapped to an additional virtual-node layer, and the NetworkTopology structure is easier to extend at the InnerNode level than at the leaf level, etc. Thoughts?

          Junping Du added a comment -

          Hi Konstantin,
          Thanks for your comments. Please see my reply:

          > If you put it in terms when virtual nodes are added as the fourth level, then you don't need to change a word in the old policy.
          It still needs a slight change, as the first replica should be placed on the local virtual node rather than the local node. Let me show two different ways of translating the original rules you listed above (in rule 2, I omit "on two different nodes" as it duplicates rule 0).
          Original:
          0. No more than one replica is placed at any one node
          1. First replica on the local node
          2. Second and third replicas are in the same rack
          3. Other replicas on random nodes, with the restriction that no more than two replicas are placed in the same rack, if there are enough racks.

          There are two ways: 1) node, rack -> node, nodegroup, rack; 2) node, rack -> virtual node, node, rack. The added term in each mapping is the new layer.
          way 1:
          0. No more than one replica is placed at any one nodegroup
          1. First replica on the local node
          2. Second and third replicas are in the same rack
          3. Other replicas on random nodes, with the restriction that no more than two replicas are placed in the same rack, if there are enough racks
          way 2:
          0. No more than one replica is placed at any one node
          1. First replica on the local virtual node
          2. Second and third replicas are in the same rack
          3. Other replicas on random nodes, with the restriction that no more than two replicas are placed in the same rack, if there are enough racks

          So you can see both translations stay essentially equivalent to the original wording.
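
          To make way-1's rule 0 concrete, here is an illustrative sketch (not the actual patch; the types are made up) of how "no more than one replica per node group" could be checked when picking a target, assuming each leaf's network location already carries the node group (e.g. /rack1/nodegroup2):

          import java.util.List;

          class NodeGroupPlacementSketch {
            interface Node { String getNetworkLocation(); }  // e.g. "/rack1/nodegroup2"

            /** Way-1 rule 0: no more than one replica is placed at any one node group. */
            static boolean violatesNodeGroupRule(Node candidate, List<Node> chosenReplicas) {
              for (Node chosen : chosenReplicas) {
                if (chosen.getNetworkLocation().equals(candidate.getNetworkLocation())) {
                  return true;  // candidate shares its node group with an existing replica
                }
              }
              return false;
            }
          }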

          Konstantin Shvachko added a comment -

          > 3rd on local node of 2nd

          How so?

          Junping, try to rewrite the policy I stated earlier using your terms for the 4-level topology with node-groups as the third level, and you will see many words change. If you put it in terms when virtual nodes are added as the fourth level, then you don't need to change a word in the old policy. I thought it's a good thing to keep old policies consistent with new use cases. It confirms (1) that it's a good policy, and (2) that it's a good design.

          > Agree. That's what I tried to do previously as well.

          What changed your mind? Sounds like the right direction to me.

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1114 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1114/)
          Revert r1351163 for fixing the JIRA number; it should be HADOOP-8470 but not HADOOP-8468. (Revision 1351444)

          Result = FAILURE
          szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1351444
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/NetworkTopologyWithNodeGroup.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/net/TestNetworkTopologyWithNodeGroup.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1081 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1081/)
          Revert r1351163 for fixing the JIRA number; it should be HADOOP-8470 but not HADOOP-8468. (Revision 1351444)

          Result = FAILURE
          szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1351444
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/NetworkTopologyWithNodeGroup.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/net/TestNetworkTopologyWithNodeGroup.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #2387 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2387/)
          Revert r1351163 for fixing the JIRA number; it should be HADOOP-8470 but not HADOOP-8468. (Revision 1351444)

          Result = FAILURE
          szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1351444
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/NetworkTopologyWithNodeGroup.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/net/TestNetworkTopologyWithNodeGroup.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #2438 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2438/)
          Revert r1351163 for fixing the JIRA number; it should be HADOOP-8470 but not HADOOP-8468. (Revision 1351444)

          Result = SUCCESS
          szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1351444
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/NetworkTopologyWithNodeGroup.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/net/TestNetworkTopologyWithNodeGroup.java
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #2367 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2367/)
          Revert r1351163 for fixing the JIRA number; it should be HADOOP-8470 but not HADOOP-8468. (Revision 1351444)

          Result = SUCCESS
          szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1351444
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/NetworkTopologyWithNodeGroup.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/net/TestNetworkTopologyWithNodeGroup.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1113 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1113/)
          HADOOP-8468. Add NetworkTopologyWithNodeGroup, a 4-layer implementation of NetworkTopology. Contributed by Junping Du (Revision 1351163)

          Result = FAILURE
          szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1351163
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/NetworkTopologyWithNodeGroup.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/net/TestNetworkTopologyWithNodeGroup.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1080 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1080/)
          HADOOP-8468. Add NetworkTopologyWithNodeGroup, a 4-layer implementation of NetworkTopology. Contributed by Junping Du (Revision 1351163)

          Result = SUCCESS
          szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1351163
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/NetworkTopologyWithNodeGroup.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/net/TestNetworkTopologyWithNodeGroup.java
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #2363 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2363/)
          HADOOP-8468. Add NetworkTopologyWithNodeGroup, a 4-layer implementation of NetworkTopology. Contributed by Junping Du (Revision 1351163)

          Result = SUCCESS
          szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1351163
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/NetworkTopologyWithNodeGroup.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/net/TestNetworkTopologyWithNodeGroup.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #2435 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2435/)
          HADOOP-8468. Add NetworkTopologyWithNodeGroup, a 4-layer implementation of NetworkTopology. Contributed by Junping Du (Revision 1351163)

          Result = SUCCESS
          szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1351163
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/NetworkTopologyWithNodeGroup.java
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/net/TestNetworkTopologyWithNodeGroup.java
          Junping Du added a comment -

          Hi Konstantin,
          That's good suggestions. Updated proposal should address most of them. A few comments below:
          > So my motivation with virtual node extension is that it formally inherits the existing policy, but semantically adds a new level of topology.
          Agree. That's what I tried to do previously as well. The current way maps node -> (virtual) node and adds a "nodegroup" level, so the policy is almost exactly the same: 1st on the local (virtual) node, 2nd off-rack, 3rd on local node of 2nd. The only difference is making sure the 2nd and 3rd are off-nodegroup (and if the 1st cannot be on the local (virtual) node, it can be on a nodegroup-local node).
          > But from the failure scenarios viewpoint they are bound to the same node, meaning that node failure takes all of them down
          Yes. So adding a node-group level should address the failure relationship between (virtual) nodes well. I think the key points for mapping the current node to the VM level include:
          • The virtual node (VM) acts as the leaf node. Some failures still happen only within a VM, such as daemon failure, OS failure, and some physical failures (e.g. disk failure, since in most cases for running Hadoop a VM should mount separate physical disks rather than share a disk with other VMs). So a VM still shows some independence even in failure-group semantics.
          • The virtual node is where the JVM runs and where Java network calls happen. In the current code base, the IP (hostname) of a node (reader, datanode) is used to keep data locality. Only the VM-level IP is easy to obtain from the JVM and RPC calls, so it makes sense for it to represent the node IP.
          Thoughts?
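
          As a purely illustrative example (hostnames and paths are made up), the resulting 4-layer view keeps the VM as the leaf that the NameNode sees, while its network location carries the node group:

          import java.util.Map;

          class TopologyPathExample {
            // Two VMs on the same physical host share /rack1/nodegroup1 but keep
            // their own hostnames/IPs, which is what the DN/TT reports.
            static final Map<String, String> VM_TO_LOCATION = Map.of(
                "vm-101.example.com", "/rack1/nodegroup1",
                "vm-102.example.com", "/rack1/nodegroup1",
                "vm-201.example.com", "/rack1/nodegroup2",
                "vm-301.example.com", "/rack2/nodegroup5");
          }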

          Junping Du added a comment -

          Updated the proposal to address Luke's and Konstantin's comments:
          + Replica removal policy changes
          + Noting the VM placement workaround

          Konstantin Shvachko added a comment -

          Sorry, got distracted with the Hadoop event of the week.

          Here is current replication policy.
          0. No more than one replica is placed at any one node
          1. First replica on the local node
          2. Second and third replicas on two different nodes in a different rack
          3. Other replicas on random nodes, with the restriction that no more than two replicas are placed in the same rack, if there are enough racks.

          With my thinking that the virtual node level is added, the policy remains unchanged. With a single optional clarification:
          (1) First replica on the virtual node then on the local node

          With your approach of adding the hypervisor layer, the policy needs to be revised by replacing "node" with "node group".

          So my motivation with virtual node extension is that it formally inherits the existing policy, but semantically adds a new level of topology.

          > Each VM on the same physical machine plays independently

          As you correctly mention in the design doc, topology is about failure scenarios rather than independence of VMs. VMs are independent as the entities reporting to the NameNode. But from the failure scenarios viewpoint they are bound to the same node, meaning that node failure takes all of them down.
          So the policy should not change, only the implementation of it should.

          > VMs lives on the same physical machine can belong to different logical Hadoop clusters

          Well you can run two DNs or TTs on the same node belonging to different clusters even now, but nobody does that, because operationally it's just too much hassle. Not sure if virtualization will make it different.
          I heard of attempts to run multiple clusters on the same physical nodes for isolation purposes, but didn't hear it was successful.

          Junping Du added a comment -

          Hey Konstantin, thanks for a lot of good suggestions here.
          For 1: Conceptually, there are two ways to look at the change we proposed. One way is, as you said, to add a VM-level extension under the physical host (making the physical host an inner node rather than a leaf). The other way is to treat the VM (virtual node) the way the physical node was treated before, as the container of processes, and to add an inner-node layer between node and rack. We prefer the second way for the following reasons:
          1) Each VM on the same physical machine generally operates independently, but VMs on the same host are related in terms of reliability and lower communication overhead. Each VM has an independent hostname and IP, and it is where the Hadoop daemons run.
          2) VMs living on the same physical machine can belong to different logical Hadoop clusters; unlike before, a physical host is no longer dedicated to one logical Hadoop cluster but can be shared. Also, the physical host's IP and host info (the hypervisor's IP and info) should not be visible to Hadoop.
          3) In data-locality-related policies, the VM maps well to the previous physical node as the first choice for placing the 1st replica, scheduling tasks, etc.
          For 2: It's right that VMs on the same host do not share storage directly, but they can do so (by getting virtual disks) through a hypervisor FS layer (like VMFS in VMware vSphere). Another way (recommended for the Hadoop case) is to use an RDM (Raw Disk Mapping) configuration in the hypervisor, so that each VM gets some dedicated physical disks. In both cases, the virtual disk drives (and their capacity) for each VM are independent and can be reported by the DN without any overlap.
          For 3: Yes, it looks like we are missing the replica removal policy in the proposal. I will revise it per your suggestion. Thanks!
          For 4: YARN does a good job of resolving the fixed task-slot issue that exists in MRv1. Beyond that, there are still scenarios for running multiple VMs per physical node, such as: tenant task isolation at the VM level, separating data nodes and compute nodes to support Hadoop MapReduce (YARN) cluster auto scale-in and scale-out, and supporting standard customized nodes (as a cloud requirement) in a heterogeneous hardware environment, etc.
          Thoughts?

          Show
          Junping Du added a comment - Hey Konstantin, Thanks for a lot of good suggestions here. For 1. In concept, there are two ways to look at the change we proposed. One way is like you said, we add vm level extension to physical host (but make physical host to be innernode, but not leaf any more.). The other way is: we look at VM (virtual node) as previous physical node as container of processes but add an innernode layer between node and rack. We are preferring the second way as following reasons: 1) Each VM on the same physical machine plays independently in general but have some relations on reliability and lower communication overhead. Each VM has independent hostname, ip and it is the place where hadoop daemons running. 2) VMs lives on the same physical machine can belong to different logical hadoop clusters, physical host is not like before that can only be dedicated to one logical hadoop cluster but could be shared. Also, physical host's ip and host info (hypervisor's ip and info) should not be aware by hadoop. 3) In some data locality related policies, VM map to previous physical node well as the first choice to place 1st replica, scheduling task, etc. For 2. It's right that VMs on the same host will not share storage directly but could do so (with getting virtual disks) through Hypervisor FS (Like VMFS in VMware vSphere) layer. Another way (should recommend for hadoop case) is to go through RDM (Raw Disk Mapping) configuration in hypervisor that each VM can get some dedicated physical disks. In both cases, the virtual disk drive (and its capacity) for each VM are independent and can be reported by DN without any overlapping. For 3. Yes. It looks we are missing replica removal policy in proposal. I will revise it as your suggestion. Thanks! For 4. YARN is doing good job in resolving fixed task slot issue that exists in MRv1. Besides resolving this issue in MRv1, it still have some scenarios to run multiple VMs per physical node, like: tenant's task isolation in vm level, separation data node and compute node to support hadoop MapReduce(YARN) cluster auto scale in and out, support standard-customised nodes (as a requirement of cloud) in a heterogeneous hardware environment, etc. Thoughts?
          Hide
          Konstantin Shvachko added a comment -

          Junping, I went over the design document. It is pretty comprehensive. A few comments on the design.

          1. Conceptually you are extending current Network Topology by introducing a new layer of leaf nodes. Current topology assumes that physical nodes are the leaves of the hierarchy and you add virtual nodes that can reside on physical nodes. I think this is a more logical way to look at the new topology, rather than saying that you introduce the second layer (node groups) over the nodes, as document does.
          2. The document should clarify how local storage is used by VMs on a physical box. I think the assumption is that VMs never share storage resources. Otherwise there could be a reporting problem. That is, if two VMs share a drive and send two DF reports to the NameNode, then the drive will be counted twice, which can cause problems. I'd recommend to update the pictures and add a section talking about reporting of DNs' resources to NN to make this issue explicitly covered in the design.
          3. For block replication there are 3 policies to consider:
            • block placement policy, when a new block is created
            • block replication policy, when under-replicated blocks are recovered
            • replica removal policy, when replicas are removed for over-replicated blocks
              You covered the first two, and probably need to look into the third as well.
              For the first two it'd be good to write down the entire modified policy rather than just listing the differences.
              And make sure they converge to existing policies if virtual node layer is not defined.
          4. For YARN I am not convinced you will need to run multiple VMs per node, if not for the sake of generality. It seems YARN should rely on the NodeManager to report resources and manage the Containers of a node as a whole. Not sure how multiple VMs on a node can help here.
            For MRv1, on the contrary, running multiple VMs per node can be useful for modeling variable slots. In this case again the VMs should not share memory, otherwise reporting will go wrong.
          Luke Lu added a comment -

          Yes, noting the new approach and its impact on overall reliability would make the proposal more complete.

          Junping Du added a comment -

          Hi Luke,
          Yes. I agree with you that when the number of nodes in a logical cluster is much smaller than the number of (available) physical hosts, it is good to do such placement for reliability if the infrastructure allows it (although it may trade off a bit more network traffic across the rack/core switch, doesn't it?). Would noting this approach in the proposal and describing its use scenario be good enough for the proposal to go forward?

          Thanks,

          Junping

          Luke Lu added a comment -

          Actually, the two approaches are orthogonal. Avoiding placing more than one data node of the same logical cluster on the same physical host will increase reliability even if the new topology algorithm is in place.

          VM placement is only NP-hard if the instance configuration is arbitrary and you require absolutely optimal placement. It's easier if the number of instance types is limited, a la AWS. I suspect that greedy algorithms exist to approximate the optimal placement. We don't need millisecond response time for such a placement algorithm either, since it is only run once at logical-cluster deploy time and when there are physical host failures.

          It's definitely easier to do such placement when the number of nodes in a logical cluster is much smaller than the number of physical hosts, which is the case for AWS and SmartCloud.
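
          As an illustration of that point (purely hypothetical code, not part of any patch or product), a simple greedy placement could sort VMs by size and assign each to the emptiest host that does not already hold a data node of the same logical cluster:

          import java.util.*;

          class GreedyVmPlacementSketch {
            static class Host {
              final String name; int freeSlots; final Set<String> clusters = new HashSet<>();
              Host(String name, int freeSlots) { this.name = name; this.freeSlots = freeSlots; }
            }
            static class Vm {
              final String name, cluster; final int slots;
              Vm(String name, String cluster, int slots) {
                this.name = name; this.cluster = cluster; this.slots = slots;
              }
            }

            /** First-fit-decreasing style placement with a per-cluster anti-affinity rule. */
            static Map<String, String> place(List<Vm> vms, List<Host> hosts) {
              Map<String, String> assignment = new HashMap<>();
              vms.sort((a, b) -> b.slots - a.slots);             // biggest VMs first
              for (Vm vm : vms) {
                hosts.sort((a, b) -> b.freeSlots - a.freeSlots); // emptiest host first
                for (Host h : hosts) {
                  // never co-locate two data nodes of the same logical cluster
                  if (h.freeSlots >= vm.slots && !h.clusters.contains(vm.cluster)) {
                    h.freeSlots -= vm.slots;
                    h.clusters.add(vm.cluster);
                    assignment.put(vm.name, h.name);
                    break;
                  }
                }
              }
              return assignment;
            }
          }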

          Junping Du added a comment -

          I will update the proposal a bit to list the first approach. It is a workaround that requires no Hadoop code change. However, this "1-1 mapping" of data node to physical host has the following restrictions:
          1. It cannot be applied if the number of nodes is larger than the number of physical hosts.
          2. Even if the number of nodes is smaller than the number of physical hosts, some hosts may be fully occupied by other logical Hadoop clusters or other applications.
          3. Clouds/datacenters are formed of heterogeneous hosts, and some hosts are not suitable for deploying Hadoop nodes, e.g. hosts attached to shared storage only.
          In general, VM placement in the cloud is a complex bin-packing problem, which is NP-hard and should be optimized for a balance of resource utilization and reliability. Applying an absolute rule like the first approach is not the best way. In addition, the principle of Hadoop network topology should be to reflect the physical (or virtual) topology of the underlying layer, not to impose strict requirements/restrictions on the deployment topology.
          Thoughts?

          Junping Du added a comment -

          Hi Luke,
          Thanks for good comments. Will address this soon.

          Best,

          Junping

          Luke Lu added a comment -

          This is a comment on the proposal, which, IMO, is missing a viable option. There are essentially two approaches to address the problem.

          1. Enhance VM placement to ensure a 1-1 mapping of data node to physical host within a logical Hadoop cluster. This approach doesn't require any modification to Hadoop to achieve the same data reliability/redundancy. It can be a viable option for Hadoop clusters whose number of nodes is smaller than the number of physical hosts, e.g. large public or company-wide clouds.
          2. For Hadoop clusters with more data nodes than physical hosts, the analysis in the proposal is spot on and the extra layer is required to achieve optimal data reliability.
          Junping Du added a comment -

          Finished! The code separation into the following JIRAs will happen tomorrow.

          Robert Joseph Evans added a comment -

          Having the other links like you have done is usually good enough.

          Junping Du added a comment -

          It looks like this move action will actually move the sub-JIRAs out of the parent JIRA (umbrella). Do we need three parent JIRAs in Common/HDFS/MapReduce?
          To your question on running Hadoop inside VMs, I don't have a concrete number for now, but we know some enterprise customers would like to run Hadoop clusters in their virtualized datacenters/private clouds.

          Robert Joseph Evans added a comment -

          You can move the JIRAs. More Actions -> Move. If it is possible to split them up, it is nice to keep them separate, but it is not totally necessary. If they do span multiple projects and are hard to split up you can leave them under HADOOP. The main reason for this is that some people only watch the HDFS lists, while others only look at the MAPREDUCE lists, and may miss changes that are not filed under the appropriate group.

          I am interested to see where this goes, and it seems very logical to me to be able to express to Hadoop what your topology really does look like. I am not sure how many groups are running Hadoop inside VMs except perhaps on EC2, but I have a very limited view into that right now.

          Junping Du added a comment -

          Hi Robert,
          Thanks for your reply. So you are suggesting re-creating the sub-tasks in the proper projects (Common, HDFS, MAPREDUCE), right?
          For a patch that spans projects (like the 1st sub-JIRA, which mixes COMMON and HDFS), should we create both a COMMON and an HDFS JIRA for it?

          Best,

          Junping

          Robert Joseph Evans added a comment -

          Junping Du,

          I have been looking at some of your patches, but there is a lot here to go through and it is likely to take some time.

          Could you please move your JIRAs to the appropriate project. HDFS JIRAs should be moved out of HADOOP and into HDFS, Mapreduce should go to MAPREDUCE, and only the ones that stay in HADOOP should be for code that goes under the hadoop-common-project directory.

          Thanks

          Junping Du added a comment -

          I marked P1, P3 and P6 as patch available; P2, P4, P5 and P7 depend on P1, P3 and P6 and cannot pass the build on the current trunk.

          Junping Du added a comment -

          The patch is divided into 7 patches, attached to each of the sub-tasks. There are some dependencies between the patches, and only three of them are independent: P1, P3 and P6.

          Junping Du added a comment -

          This is a patch with all code changes. We will divide it into 7 sub-patches for easier review and check-in.


            People

            • Assignee:
              Junping Du
            • Reporter:
              Junping Du
            • Votes:
              7
            • Watchers:
              58
