Details

    • Hadoop Flags: Reviewed

      Description

      HDFS-6671 changes the replication monitor to use the block storage policy for replication. It should also use the fallback storage types when a particular type of storage is full.
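
      For illustration, here is a minimal sketch of the intended fallback behavior in plain Java; the class, enum values, and method names below are hypothetical and are not the actual BlockPlacementPolicy or StorageType code:

        import java.util.Arrays;
        import java.util.EnumSet;
        import java.util.List;

        /**
         * Illustrative sketch only -- hypothetical names, not the HDFS code.
         * Pick a storage type for a new replica: try the policy's preferred
         * types first, then its fallbacks when the preferred types are full.
         */
        public class StorageTypeFallbackSketch {

          enum StorageType { SSD, DISK, ARCHIVE }

          /** Returns the first type that still has space, or null if all are full. */
          static StorageType choose(List<StorageType> preferred,
                                    List<StorageType> fallbacks,
                                    EnumSet<StorageType> fullTypes) {
            for (StorageType t : preferred) {
              if (!fullTypes.contains(t)) {
                return t;               // preferred type is still available
              }
            }
            for (StorageType t : fallbacks) {
              if (!fullTypes.contains(t)) {
                return t;               // fall back when the preferred types are full
              }
            }
            return null;                // nothing usable; replication has to wait
          }

          public static void main(String[] args) {
            // A policy that prefers SSD but may fall back to DISK, then ARCHIVE,
            // while the SSD tier is currently full.
            StorageType chosen = choose(
                Arrays.asList(StorageType.SSD),
                Arrays.asList(StorageType.DISK, StorageType.ARCHIVE),
                EnumSet.of(StorageType.SSD));
            System.out.println("chosen = " + chosen);   // DISK
          }
        }

      The real policy works over HDFS StorageType values and per-storage remaining space rather than a simple set of "full" types, but the preferred-then-fallback ordering is the idea described above.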

      Attachments

      1. h6686_20140723.patch
        8 kB
        Tsz Wo Nicholas Sze
      2. h6686_20140721c.patch
        8 kB
        Tsz Wo Nicholas Sze
      3. h6686_20140721.patch
        8 kB
        Tsz Wo Nicholas Sze

        Activity

        Hudson added a comment -

        FAILURE: Integrated in Hadoop-Hdfs-trunk #1876 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1876/)
        HDFS-6686. Change BlockPlacementPolicy to use fallback when some storage types are unavailable. (szetszwo: rev ac5e8aed7ca1e9493f96f8795d0caafd5282b9a7)

        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java

        Hudson added a comment -

        FAILURE: Integrated in Hadoop-Mapreduce-trunk #1901 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1901/)
        HDFS-6686. Change BlockPlacementPolicy to use fallback when some storage types are unavailable. (szetszwo: rev ac5e8aed7ca1e9493f96f8795d0caafd5282b9a7)

        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java
        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

        Hudson added a comment -

        SUCCESS: Integrated in Hadoop-Yarn-trunk #685 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/685/)
        HDFS-6686. Change BlockPlacementPolicy to use fallback when some storage types are unavailable. (szetszwo: rev ac5e8aed7ca1e9493f96f8795d0caafd5282b9a7)

        • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java
        • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

        Tsz Wo Nicholas Sze added a comment -

        I have committed this.

        Vinayakumar B added a comment -

        +1 for the latest patch, LGTM.
        Thanks Tsz Wo Nicholas Sze.

        Tsz Wo Nicholas Sze added a comment -

        h6686_20140723.patch: incorporated the comments from Vinay and Arpit.

        Tsz Wo Nicholas Sze added a comment -

        Vinay,

        1. Good catch. We should pass excludedNodes.
        2. For getAdditionalDatanode(), since the write has already started, there is already some data in the block, so using the replication fallback is correct. The creation fallback is usually a subset of the replication fallback: if there are not enough storages available, we may fail creation, whereas failing replication may result in data loss.

        Arpit,

        1. Good catch. We need to pass newBlock as a parameter.
        2. numOfResults is indeed the number of existing replicas (it is used to determine whether the local host, local rack, or a remote rack should be chosen from). numOfReplicas (call it n) is the number of replicas to be chosen. In our case, we try to select n storage types but may only be able to get m < n, where m = storageTypes.size(). So we should update numOfReplicas.

        Thanks both of you for the careful reviews!

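        To make point 2 concrete, here is a toy sketch (hypothetical names, not the patch itself) of why numOfReplicas, and not numOfResults, is the value to shrink when only m < n storage types can be resolved:

          import java.util.ArrayDeque;
          import java.util.Deque;

          /**
           * Toy illustration of point 2 above -- hypothetical names, not the patch.
           * If the policy can only resolve m = storageTypes.size() storage types for
           * the n replicas still needed, numOfReplicas (not numOfResults) should be
           * reduced to m, so the chooser stops once it runs out of storage types.
           */
          public class NumOfReplicasAdjustment {
            public static void main(String[] args) {
              int numOfReplicas = 3;                 // n: replicas still to be chosen
              int numOfResults = 1;                  // replicas already chosen

              // Suppose only two storage types could be resolved for this block.
              Deque<String> storageTypes = new ArrayDeque<>();
              storageTypes.add("DISK");
              storageTypes.add("ARCHIVE");

              int m = storageTypes.size();           // m = 2 < n = 3
              if (m < numOfReplicas) {
                numOfReplicas = m;                   // shrink the target count only
              }

              System.out.println("numOfReplicas = " + numOfReplicas    // 2
                  + ", numOfResults = " + numOfResults);               // still 1
            }
          }
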
        Arpit Agarwal added a comment -

        Hi Nicholas, I took a look at the patch and also have two questions:

        1. The recursive call on line 393 passes the same results list, so the callee might choose the replication fallback instead of the creation fallback.
        2. Perhaps I misunderstood, but on line 318, should we update numOfResults instead of numOfReplicas? Later we check numOfResults before dequeuing entries from storageTypes.

        Vinayakumar B added a comment -

        I have two doubts:

        1. In the code below, for the retry, should excludedNodes be passed instead of oldExcludedNodes, since avoidStaleNodes is passed as false?
        By default avoidStaleNodes is false, so oldExcludedNodes will be null; I feel the original excludedNodes will be missed and an already excluded node could be chosen again, no?

              if (storageTypes.size() > 0) {
                // Retry chooseTarget with fallback storage types
                unavailableStorages.add(curStorageType);
                return chooseTarget(numOfReplicas, writer, oldExcludedNodes, blocksize,
                    maxNodesPerRack, results, false, storagePolicy, unavailableStorages);
              }

        2. The PipelineRecovery (getAdditionalDatanode()) calls also use the replication fallback storage type; is this OK?

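        A minimal sketch of the concern in doubt 1 (hypothetical names, not the patch code): with avoidStaleNodes false, oldExcludedNodes is never populated, so passing it to the retry silently drops the exclusions the caller supplied.

          import java.util.HashSet;
          import java.util.Set;

          /**
           * Toy illustration of doubt 1 -- hypothetical names, not the patch code.
           * With avoidStaleNodes == false, oldExcludedNodes is never populated, so
           * passing it to the retry drops the exclusions the caller supplied.
           */
          public class ExcludedNodesRetrySketch {
            public static void main(String[] args) {
              boolean avoidStaleNodes = false;

              Set<String> excludedNodes = new HashSet<>();
              excludedNodes.add("dn1");              // the caller excluded this datanode

              // Snapshot that is only taken when stale-node avoidance is enabled.
              Set<String> oldExcludedNodes =
                  avoidStaleNodes ? new HashSet<String>(excludedNodes) : null;

              // What the retry effectively receives if oldExcludedNodes is passed:
              System.out.println("passed to retry  = " + oldExcludedNodes);  // null
              System.out.println("caller excluded  = " + excludedNodes);     // [dn1]
            }
          }
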
        Tsz Wo Nicholas Sze added a comment -

        Oops, I updated the wrong file previously. It should be

        h6686_20140721c.patch

        Tsz Wo Nicholas Sze added a comment -

        Vinay, you are right that the patch depends on HDFS-6710 which is now committed. Thanks for trying the patch.

        h6710_20140721.patch: updated to match the current branch.

        Vinayakumar B added a comment -

        This patch doesn't apply properly. I think this depends on HDFS-6710.

        Tsz Wo Nicholas Sze added a comment -

        h6686_20140721.patch: use fallbacks if the required storage type is unavailable.


          People

          • Assignee: Tsz Wo Nicholas Sze
          • Reporter: Tsz Wo Nicholas Sze
          • Votes: 0
          • Watchers: 4
