Hadoop HDFS
  1. Hadoop HDFS
  2. HDFS-6133

Make Balancer support exclude specified path

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.7.0
    • Component/s: balancer & mover, datanode
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Add a feature for replica pinning so that when a replica is pinned in a datanode, it will not be moved by Balancer/Mover. The replica pinning feature can be enabled/disabled by "dfs.datanode.block-pinning.enabled", where the default is false.
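      For illustration, enabling the feature described in the release note would look like the following in hdfs-site.xml (the property name is taken from the release note above; false is the default):

      ```xml
      <property>
        <name>dfs.datanode.block-pinning.enabled</name>
        <value>true</value>
      </property>
      ```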

      Description

      Currently, running the Balancer destroys the RegionServer's data locality.
      If getBlocks could exclude blocks belonging to files with a specific path prefix, such as "/hbase", then we could run the Balancer without destroying the RegionServer's data locality.
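      The path-prefix exclusion idea from the description can be sketched as follows. This is a hypothetical helper, not the actual getBlocks API; the class and method names are illustrative only:

      ```java
      import java.util.List;

      // Hypothetical sketch: decide whether a file's blocks should be skipped
      // by the Balancer because the file lives under an excluded path prefix
      // such as "/hbase". Not the real getBlocks signature.
      public class PathPrefixFilter {
        private final List<String> excludedPrefixes;

        public PathPrefixFilter(List<String> excludedPrefixes) {
          this.excludedPrefixes = excludedPrefixes;
        }

        /** Returns true if the file path falls under an excluded prefix. */
        public boolean isExcluded(String filePath) {
          for (String prefix : excludedPrefixes) {
            // Match the prefix itself or any path below it, but not
            // sibling paths that merely share leading characters.
            if (filePath.equals(prefix) || filePath.startsWith(prefix + "/")) {
              return true;
            }
          }
          return false;
        }
      }
      ```

      With this check, a path like "/hbase/data/table1" would be skipped while "/user/logs" (or "/hbasebackup") would still be balanced.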

      1. HDFS-6133.patch
        24 kB
        zhaoyunjiong
      2. HDFS-6133-1.patch
        29 kB
        zhaoyunjiong
      3. HDFS-6133-10.patch
        33 kB
        zhaoyunjiong
      4. HDFS-6133-11.patch
        33 kB
        zhaoyunjiong
      5. HDFS-6133-2.patch
        29 kB
        zhaoyunjiong
      6. HDFS-6133-3.patch
        22 kB
        zhaoyunjiong
      7. HDFS-6133-4.patch
        23 kB
        zhaoyunjiong
      8. HDFS-6133-5.patch
        23 kB
        zhaoyunjiong
      9. HDFS-6133-6.patch
        24 kB
        zhaoyunjiong
      10. HDFS-6133-7.patch
        30 kB
        zhaoyunjiong
      11. HDFS-6133-8.patch
        30 kB
        zhaoyunjiong
      12. HDFS-6133-9.patch
        37 kB
        zhaoyunjiong

        Issue Links

          Activity

          zhaoyunjiong added a comment - At first, my idea was very similar to HDFS-4420. The main difference is that I changed getBlocks to exclude blocks belonging to certain paths. https://issues.apache.org/jira/browse/HDFS-6133?focusedCommentId=13943050&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13943050 After Daryn Sharp's comments, it changed to pinning: https://issues.apache.org/jira/browse/HDFS-6133?focusedCommentId=13980504&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13980504
          Yongjun Zhang added a comment -

          Thanks a lot Nicholas!

          Tsz Wo Nicholas Sze added a comment -

          Sure, just have filed HDFS-7849 for updating the documentation.

          Yongjun Zhang added a comment -

          Hi Tsz Wo Nicholas Sze,

          Thanks for your explanation, and sorry for late reply.

          I agree with your assessment. I wonder if we can update the config property description to say that enabling is not recommended before rolling upgrade is finished?

          Thanks.

          Tsz Wo Nicholas Sze added a comment -

          By incompatible, we mean that the old software does not work at all, or the old features do not work correctly, with the new software. For the new features in the new software, it is impossible for them to work correctly with the old software.

          In our case, if block pinning is enabled during rolling upgrade, everything works fine except the block pinning features. So I think it is not an incompatible change. What do you think?

          Yongjun Zhang added a comment -

          Hi Nicholas,

          Assuming that the user doesn't change the config during the upgrade process, since pinning is disabled by default, I guess your suggestion may be fine. Would you please comment on whether the assumption that the user doesn't change the config is a valid one? It seems we don't have a hard control on that, which is what I was a little worried about. Maybe we can change the config description to emphasize this rule?

          Thanks.

          Yongjun Zhang added a comment -

          Hi Nicholas,

          Thanks a lot for your quick reply!

          I think that means we need to change the rolling upgrade steps, which still means a soft incompatibility. Should we mark this jira as incompatible?

          Tsz Wo Nicholas Sze added a comment -

          > ... I was wondering whether there is any compatibility issue when we do rolling upgrade? ...

          For rolling upgrade, the block pinning feature should first be disabled, then the cluster upgraded, and then the feature enabled. What do you think?

          Yongjun Zhang added a comment -

          HI Colin Patrick McCabe,

          My understanding is that HDFS-4420's approach is a passive "pinning": it tells the balancer not to move the blocks belonging to a specified file path; on the other hand, HDFS-6133 proactively pins the blocks at write time by specifying favored nodes. I think if we do passive "pinning" after the blocks of a file have been moved by the balancer, then it may not be the preferred behavior. Hi zhaoyunjiong, would you please share your thoughts on why you started with this approach when HDFS-4420's patch was already there?

          Hi Tsz Wo Nicholas Sze and zhaoyunjiong, about HDFS-6133, I was wondering whether there is any compatibility issue when we do a rolling upgrade? There are API changes in DataTransferProtocol#writeBlock; what would happen if DNs with HDFS-6133 (trying to pin) talk to DNs without the fix? Would you please comment?

          Thanks.

          Colin Patrick McCabe added a comment -

          I apologize for coming into this discussion really late. I am curious, though, why did we decide to go with pinning rather than with the solution on HDFS-4420? That solution seems simpler because it doesn't require modifying the datanode. Instead, we could just tell the balancer to exclude the HBase paths. Do you guys have some insight into this?

          Yongjun Zhang added a comment -

          Hi Tsz Wo Nicholas Sze,

          Thanks for your comment about the possible performance optimization; it's amazing that you even tried a test program, thanks for doing it! Yes, your result is pretty good. I agree that we don't have to do the optimization for now. I was just thinking about avoiding unnecessary computation if possible, especially for a busy node.

          BTW,

          In many clusters, I heard that on average a file only has 1.x blocks.

          I'm surprised to hear this statistic, but it's good to know. I always had the impression that we deal with a relatively small number of big files, rather than many small files.

          Thanks.

          Jingcheng Du added a comment -

          Thanks, it was a wrong patch.

          Tsz Wo Nicholas Sze added a comment -

          > ... the "dfs.datanode.block-pinning.enabled" is not referenced by any code, ...

          Did you look at the wrong patch? DFSConfigKeys.DFS_DATANODE_BLOCK_PINNING_ENABLED is used in FsDatasetImpl. When blockPinningEnabled is false, FsDatasetImpl.getPinning always returns false and setPinning is a no-op.
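          The config-gated behavior described above can be sketched as follows. This is a simplified illustration, not the actual FsDatasetImpl code; the class name and the map-based pin storage are assumptions made for the sketch:

          ```java
          import java.util.Map;
          import java.util.concurrent.ConcurrentHashMap;

          // Simplified sketch of config-gated pinning: when the feature flag
          // is off, setPinning is a no-op and getPinning always reports false.
          // (The real FsDatasetImpl stores pin state differently.)
          public class PinningState {
            private final boolean blockPinningEnabled;
            private final Map<Long, Boolean> pinned = new ConcurrentHashMap<>();

            public PinningState(boolean blockPinningEnabled) {
              this.blockPinningEnabled = blockPinningEnabled;
            }

            public void setPinning(long blockId) {
              if (!blockPinningEnabled) {
                return; // feature disabled: no-op
              }
              pinned.put(blockId, Boolean.TRUE);
            }

            public boolean getPinning(long blockId) {
              if (!blockPinningEnabled) {
                return false; // feature disabled: nothing is ever pinned
              }
              return pinned.getOrDefault(blockId, Boolean.FALSE);
            }
          }
          ```

          This gating is why the default-false setting keeps the Balancer/Mover behavior unchanged unless the operator opts in.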

          Tsz Wo Nicholas Sze added a comment -

          > ... Assume on average a file has 1k blocks, ...

          In many clusters, I heard that on average a file only has 1.x blocks.

          > ... the diff is 40k vs 10M. ...

          This is on top of the 1000 favoredNodes and 10 replicas assumption. So, the case is quite extreme.

          Suppose we hit this extreme case: the I/O time used in writing the 1000-block file is significantly larger than the time used for 10M string comparisons. I just tested it on my laptop; it takes less than 100 ms for 10M string comparisons. See my test program below.

            // Wrapped in a class (name is arbitrary) and given the import it
            // needs so it compiles standalone.
            import java.util.Random;

            public class StringCompareTest {
              public static void main(String[] args) {
                final Random ran = new Random();
                // Random "a.b.c.d:" prefix shared by all strings, so most of
                // each comparison walks the common prefix.
                final String prefix = ran.nextInt(256) + "." + ran.nextInt(256)
                    + "." + ran.nextInt(256) + "." + ran.nextInt(256) + ":";

                String[] a = new String[1000];
                String[] b = new String[1000];
                for (int i = 0; i < a.length; i++) {
                  a[i] = prefix + ran.nextInt(1000);
                  b[i] = prefix + ran.nextInt(1000);
                }
                int same = 0;
                int different = 0;
                final long starttime = System.currentTimeMillis();
                // 10 x 1000 x 1000 = 10M string comparisons.
                for (int k = 0; k < 10; k++) {
                  for (int i = 0; i < a.length; i++) {
                    for (int j = 0; j < b.length; j++) {
                      if (a[i].equals(b[j])) {
                        same++;
                      } else {
                        different++;
                      }
                    }
                  }
                }
                final long duration = System.currentTimeMillis() - starttime;
                System.out.println("duration=" + duration + " ms, same=" + same + ", different=" + different);
                // Sample output:
                // duration=73 ms, same=9830, different=9990170
              }
            }
          Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #97 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/97/)
          HDFS-6133. Add a feature for replica pinning so that a pinned replica will not be moved by Balancer/Mover. Contributed by zhaoyunjiong (szetszwo: rev 085b1e293ff53f7a86aa21406cfd4bfa0f3bf33b)

          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/DataTransferProtocol.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDiskError.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsDatasetSpi.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/extdataset/ExternalDatasetImpl.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Sender.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/proto/datatransfer.proto
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Receiver.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDataTransferProtocol.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
          Jingcheng Du added a comment -

          Hi zhaoyunjiong.
          The release note mentions that "dfs.datanode.block-pinning.enabled" is the key to enable/disable the pinning. While reading the patch, I found that it only uses the favoredNodes (the first element) to decide if the pinning is enabled, and "dfs.datanode.block-pinning.enabled" is not referenced by any code, i.e. it's not useful in this feature. Am I wrong? Please advise. Thanks a lot.

          Yongjun Zhang added a comment -

          Another factor is how many blocks a file has, since the computation happens once for each block of the file. Assume on average a file has 1k blocks; then the computation is 10M comparisons to write one file. If we construct a hash set, the computational complexity is 40k. So the diff is 40k vs 10M.

          I think we should consider optimizing. Hi Tsz Wo Nicholas Sze, what do you think?

          Thanks.
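          The hash-set optimization being discussed can be sketched as follows. This is an illustrative simplification with assumed method and parameter names; the real logic lives in DFSOutputStream.getPinnings:

          ```java
          import java.util.Arrays;
          import java.util.HashSet;
          import java.util.Set;

          // Sketch of the proposed optimization: build a HashSet of
          // favored-node addresses once, so each pipeline node is checked in
          // O(1) instead of scanning the favoredNodes array per node.
          public class PinningCheck {
            public static boolean[] getPinnings(String[] favoredNodes,
                String[] pipelineNodes) {
              final Set<String> favored =
                  new HashSet<>(Arrays.asList(favoredNodes));
              final boolean[] pinnings = new boolean[pipelineNodes.length];
              for (int i = 0; i < pipelineNodes.length; i++) {
                // Pin the replica only if this pipeline node was favored.
                pinnings[i] = favored.contains(pipelineNodes[i]);
              }
              return pinnings;
            }
          }
          ```

          Building the set costs one pass over favoredNodes per block, after which each pipeline-node lookup is constant time, turning the nested-loop comparison count into roughly favoredNodes + replicas per block.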

          zhaoyunjiong added a comment -

          Is there a use scenario where one wants to specify a larger number of favoredNodes?

          There might be.
          Let's say a user sets 1000 favoredNodes and 10 replicas; DFSOutputStream.getPinnings will do 10,000 comparisons in the worst case.
          That seems not so bad.
          Do you think we need to optimize the code?

          Yongjun Zhang added a comment -

          Hi zhaoyunjiong,

          Sorry, which question? I clicked the URL, but it doesn't show the question clearly.

          A couple of days ago, I replied your comment as below (if you roll up to find my comment you will see):

          Are you saying NumFavouredNodes is equivalent to the replication factor? I thought they were different: the favoredNodes is specified when constructing DFSOutputStream, as a collection of DNs. The write pipeline of a data block usually has three replicas; if any of the replicas happens to be in the collection, the data is pinned there.
          I guess what you meant is that in the normal use cases, when we construct DFSOutputStream for a file, we usually only pass the replication number of DNs as favored DNs? Is there a use scenario where one wants to specify a larger number of favoredNodes?

          Thanks.

          Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #2053 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2053/)
          HDFS-6133. Add a feature for replica pinning so that a pinned replica will not be moved by Balancer/Mover. Contributed by zhaoyunjiong (szetszwo: rev 085b1e293ff53f7a86aa21406cfd4bfa0f3bf33b)

          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/DataTransferProtocol.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/extdataset/ExternalDatasetImpl.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Receiver.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDiskError.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsDatasetSpi.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDataTransferProtocol.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/proto/datatransfer.proto
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Sender.java
          Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #103 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/103/)
          HDFS-6133. Add a feature for replica pinning so that a pinned replica will not be moved by Balancer/Mover. Contributed by zhaoyunjiong (szetszwo: rev 085b1e293ff53f7a86aa21406cfd4bfa0f3bf33b)

          • hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/DataTransferProtocol.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDiskError.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/extdataset/ExternalDatasetImpl.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Receiver.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDataTransferProtocol.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/proto/datatransfer.proto
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Sender.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsDatasetSpi.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java
          Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk #2034 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2034/)
          HDFS-6133. Add a feature for replica pinning so that a pinned replica will not be moved by Balancer/Mover. Contributed by zhaoyunjiong (szetszwo: rev 085b1e293ff53f7a86aa21406cfd4bfa0f3bf33b)

          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/DataTransferProtocol.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/proto/datatransfer.proto
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsDatasetSpi.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDataTransferProtocol.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/extdataset/ExternalDatasetImpl.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Sender.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDiskError.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Receiver.java
          Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Yarn-trunk #836 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/836/)
          HDFS-6133. Add a feature for replica pinning so that a pinned replica will not be moved by Balancer/Mover. Contributed by zhaoyunjiong (szetszwo: rev 085b1e293ff53f7a86aa21406cfd4bfa0f3bf33b)

          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDataTransferProtocol.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/DataTransferProtocol.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/extdataset/ExternalDatasetImpl.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/proto/datatransfer.proto
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsDatasetSpi.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Receiver.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Sender.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDiskError.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java
          Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #102 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/102/)
          HDFS-6133. Add a feature for replica pinning so that a pinned replica will not be moved by Balancer/Mover. Contributed by zhaoyunjiong (szetszwo: rev 085b1e293ff53f7a86aa21406cfd4bfa0f3bf33b)

          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/DataTransferProtocol.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsDatasetSpi.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/extdataset/ExternalDatasetImpl.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Sender.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/proto/datatransfer.proto
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDiskError.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDataTransferProtocol.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Receiver.java
          zhaoyunjiong added a comment -

Sorry, which question? I clicked the URL, but it doesn't clearly show the question.

          Yongjun Zhang added a comment -

          Hi Ted Yu,

When targetPinnings is not null, whether to pin is determined by targetPinnings[0], which is a boolean.

          Ted Yu added a comment -

          In DFSOutputStream.java:

          1452	            (targetPinnings == null ? false : targetPinnings[0]), targetPinnings);
          

          Looks like the boolean parameter, pinning, is unnecessary: when targetPinnings is not null, pinning would be true. Otherwise pinning is false.

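As an aside, the null-check-plus-ternary pattern quoted above can be written more compactly with a short-circuiting conjunction. A minimal sketch follows; the array name `targetPinnings` comes from the quoted line, while the class and method names here are invented for illustration:

```java
public class PinningSketch {
    // Equivalent to (targetPinnings == null ? false : targetPinnings[0]):
    // && short-circuits when the array is null, so the element access
    // is never reached and no NullPointerException can occur.
    static boolean firstPinning(boolean[] targetPinnings) {
        return targetPinnings != null && targetPinnings[0];
    }
}
```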
          Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #7080 (See https://builds.apache.org/job/Hadoop-trunk-Commit/7080/)
          HDFS-6133. Add a feature for replica pinning so that a pinned replica will not be moved by Balancer/Mover. Contributed by zhaoyunjiong (szetszwo: rev 085b1e293ff53f7a86aa21406cfd4bfa0f3bf33b)

          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Receiver.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/Sender.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/extdataset/ExternalDatasetImpl.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDataTransferProtocol.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/proto/datatransfer.proto
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsDatasetSpi.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDiskError.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/DataTransferProtocol.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java
          Yongjun Zhang added a comment -

          Thanks Nicholas for committing this.

          Hi zhaoyunjiong, would you please answer the questions in https://issues.apache.org/jira/browse/HDFS-6133?focusedCommentId=14314368&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14314368 ? Thanks.
          Tsz Wo Nicholas Sze added a comment -

          I have committed this. Thanks, zhaoyunjiong!

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12697963/HDFS-6133-11.patch
          against trunk revision 7c6b654.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9530//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/9530//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9530//console

          This message is automatically generated.

          zhaoyunjiong added a comment -

          Updated the patch to resolve a merge conflict in hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12697628/HDFS-6133-10.patch
          against trunk revision 3f5431a.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9518//console

          This message is automatically generated.

          Tsz Wo Nicholas Sze added a comment -

          Jenkins somehow has not picked this up. I have just started a build manually.

          Yongjun Zhang added a comment -

          Thanks zhaoyunjiong.

          Are you saying NumFavoredNodes is equivalent to the replication factor? I thought they were different: favoredNodes is specified when constructing a DFSOutputStream, as a collection of DNs. The write pipeline of a data block usually has three replicas; if any of the replicas happens to land on a node in that collection, the data is pinned there.

          I guess what you meant is that in the normal use case, when we construct a DFSOutputStream for a file, we usually pass only as many favored DNs as the replication factor? Is there a use scenario where one would want to specify a larger number of favoredNodes?

          Thanks.

          zhaoyunjiong added a comment -

          Most users use 3 replicas, so NumFavoredNodes should be 3 in most cases. I don't think this will cause a big problem.

          Yongjun Zhang added a comment -

          Hi zhaoyunjiong,

          Nice work! Thanks Nicholas for the review, the patch looks good to me too. One nit:

          DFSOutputStream#getPinnings has a nested loop, so the search cost is NumNodesInPipeline * NumFavoredNodes. From an application point of view, I wonder how many favoredNodes there could be for a big cluster. If needed, we could build a hash set in setFavoredNodes for a quicker search. This may not be needed if the number is small. Anyway, I don't think we need to make this change in this jira; it's just something to watch out for.

          Thanks.

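The trade-off Yongjun describes can be sketched roughly as follows. This is a hypothetical illustration, not the actual DFSOutputStream code: the class, method names, and the use of plain strings for node identifiers are all invented; only the nested-loop-versus-hash-set idea comes from the comment above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class GetPinningsSketch {
    // Nested-loop version: NumNodesInPipeline * NumFavoredNodes comparisons.
    static boolean[] pinningsNested(String[] pipeline, String[] favored) {
        boolean[] pinnings = new boolean[pipeline.length];
        for (int i = 0; i < pipeline.length; i++) {
            for (String f : favored) {
                if (pipeline[i].equals(f)) {
                    pinnings[i] = true;
                    break;
                }
            }
        }
        return pinnings;
    }

    // Hash-set version: build the set once (e.g. in setFavoredNodes),
    // then each pipeline node is a constant-time lookup.
    static boolean[] pinningsHashed(String[] pipeline, String[] favored) {
        Set<String> favoredSet = new HashSet<>(Arrays.asList(favored));
        boolean[] pinnings = new boolean[pipeline.length];
        for (int i = 0; i < pipeline.length; i++) {
            pinnings[i] = favoredSet.contains(pipeline[i]);
        }
        return pinnings;
    }
}
```

With only three favored nodes the constant factor of building the set likely outweighs the savings, which matches the conclusion that the change was deferred to a follow-up jira.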
          zhaoyunjiong added a comment -

          Sure, created HDFS-7759 to track it.
          Thanks for taking the time to review the patch.

          Tsz Wo Nicholas Sze added a comment -

          > You mean choosing the mechanism shouldn't rely on the OS, but on the configured mechanism only?

          Admins can choose which mechanism they want to use. Of course, we should throw an error if they choose the sticky bit on Windows.

          The sticky bit is preferable, but support for it is not standard. Some systems do not support it, and even on the systems that do, it may have a different meaning. So I think we need to allow the admin to choose the mechanism.

          Tsz Wo Nicholas Sze added a comment -

          +1 patch looks good.

          (Let's do the existence-of-a-second-file implementation in a separate JIRA.)

          zhaoyunjiong added a comment -

          You mean choosing the mechanism shouldn't rely on the OS, but on the configured mechanism only?

          Tsz Wo Nicholas Sze added a comment -

          > If you agree with the mixed mechanism, ...

          Let's have another conf for choosing the mechanism. The feature is disabled by default. When an admin enables it, one should also choose the mechanism.

          zhaoyunjiong added a comment -

          Thanks Nicholas.
          Updated the patch to use dfs.datanode.block-pinning.enabled.

          > I guess the existence of a second file is easier to make compatible across different platforms (though the sticky bit has the advantage of storing the info in the same file).
          If using a different file, we need to delete it when deleting the block. How about using the execute bit?

          I'm not sure whether there are block files created with the execute bit set. I don't think it is safe enough to use the execute bit.

          I prefer Yongjun Zhang's mixed mechanism:

          Another option is to use a mixed mechanism: say, if it's on Linux, use the sticky bit; otherwise, use the existence of a second file. This method means a lack of consistency between different platforms. Just wanted to throw a thought out here.

          If you agree with the mixed mechanism, I'll implement it before my vacation starts on Feb 14th.
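As a rough illustration of the marker-file half of the mixed mechanism, a pin could be denoted by the existence of a second file next to the block file. This is a hypothetical standalone sketch, not the actual FsDatasetImpl code; all names are made up:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFileAttributeView;

public class PinMarker {

    // True when the filesystem exposes POSIX attributes; on such systems the
    // sticky-bit mechanism could be chosen instead of the marker file.
    static boolean posixSupported(Path p) {
        return Files.getFileAttributeView(p, PosixFileAttributeView.class) != null;
    }

    // The marker lives next to the block file, e.g. blk_123 -> blk_123.pinned.
    static Path markerFor(Path blockFile) {
        return blockFile.resolveSibling(blockFile.getFileName() + ".pinned");
    }

    // Portable fallback: pin by creating the marker file.
    static void pin(Path blockFile) {
        try {
            Files.createFile(markerFor(blockFile));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static boolean isPinned(Path blockFile) {
        return Files.exists(markerFor(blockFile));
    }

    // As noted above, the marker must be cleaned up together with the block.
    static void delete(Path blockFile) {
        try {
            Files.deleteIfExists(markerFor(blockFile));
            Files.deleteIfExists(blockFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path blk = Paths.get(System.getProperty("java.io.tmpdir"),
                "blk_demo_" + System.nanoTime());
        System.out.println("POSIX (sticky bit usable): " + posixSupported(blk.getParent()));
        pin(blk);
        System.out.println("pinned: " + isPinned(blk)); // true
        delete(blk);
    }
}
```

The extra-delete-on-block-removal is exactly the cost Nicholas points out for the second-file approach, which the sticky bit avoids by storing the flag in the block file itself.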

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12697420/HDFS-6133-9.patch
          against trunk revision 7d73202.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9501//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/9501//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9501//console

          This message is automatically generated.

          Tsz Wo Nicholas Sze added a comment -

          > I guess the existence of a second file is easier to make compatible across different platforms (though the sticky bit has the advantage of storing the info in the same file).

          If using a different file, we need to delete it when deleting the block. How about using the execute bit?

          Some comments on the latest patch:

          • The new pinning conf property should be on the datanode. We cannot trust clients to set the conf correctly. How about calling it "dfs.datanode.block-pinning.enabled"? When it is disabled, FsDatasetImpl.getPinning always returns false and setPinning is a no-op.
          • We should have some javadoc for FsDatasetSpi.setPinning. How about the following?
              /**
               * Set a block to be pinned on this datanode so that it cannot be moved
               * by Balancer/Mover.
               *
               * It is a no-op when dfs.datanode.block-pinning.enabled is set to false.
               */
              public void setPinning(ExtendedBlock block) throws IOException;
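For reference, the datanode-side property proposed here (its name and default of false are confirmed by the release note) would be set in hdfs-site.xml, e.g.:

```xml
<!-- Enable replica pinning on the datanode (default is false). -->
<property>
  <name>dfs.datanode.block-pinning.enabled</name>
  <value>true</value>
</property>
```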
            
          zhaoyunjiong added a comment -

          Thanks Nicholas, Yongjun Zhang and Benoy Antony.
          Updated the patch to add a configuration to enable/disable the pinning feature. I use a configuration set on the DFSClient; compared to setting it on the DataNode side, I think it is more flexible.

          PBHelper.convert(..) only adds one FALSE when targetPinnings == null. Should we add n FALSEs, where n = targetPinnings.length?

          One is enough, because the DataNode in the pipeline will add another one when sending to the next.
          As for how to implement this feature on Windows, and whether to use the sticky bit or a second file, can we do that in another jira issue later?

          Benoy Antony added a comment -

          I know that there are different ways to accomplish the desired functionality of pinning the blocks under a directory to a set of nodes.
          I want to suggest yet another approach using storage policies.

          This is based on the fact that the set of favored nodes can be considered as a tier.

          Let's say blocks under /hbase need to be pinned to a set of nodes.

          1. Mark the data.dirs on that set of datanodes as a special storage type, say HBASE_TIER: [HBASE_TIER]/hadoop01/data
          2. Define a storage policy, HBASE_POLICY, which insists that all blocks following the policy reside in HBASE_TIER
          3. Set the storage policy of /hbase to HBASE_POLICY

          Since the balancer (and mover) considers the storage policy while moving/balancing, the balancer will not move these blocks out of the favored nodes (HBASE_TIER).

          Now if we want to keep other blocks (non-hbase blocks) on these datanodes as well (which may not be a great idea), then we can model it by having multiple storage types on a single directory, like [HBASE_TIER,DISK]/hadoop01/data. This makes sense since one may want to have blocks with multiple storage characteristics on the same node.
          This will need a (minor) code modification.
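Sketched as configuration, step 1 might look like the fragment below. Note that HBASE_TIER is not a built-in StorageType; the bracket syntax mirrors how real storage types such as [SSD] tag data directories, and supporting a custom tier (and the policy in step 2) would require the code modification mentioned above:

```xml
<!-- Hypothetical: tag the data dirs on the favored datanodes with a
     custom storage type. HBASE_TIER does not exist in stock HDFS. -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>[HBASE_TIER]/hadoop01/data</value>
</property>
```

Step 3 would then use the existing admin command, e.g. `hdfs storagepolicies -setStoragePolicy -path /hbase -policy HBASE_POLICY`, with HBASE_POLICY being the hypothetical policy from step 2.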

          Yongjun Zhang added a comment -

          Thanks Yunjiong for the continuous effort on this work, and Nicholas for further reviewing!

          BTW, just heard from Chris Nauroth that Windows does not support the sticky bit. Is there some other way to implement block pinning? Use the file name?

          Daryn Sharp pointed out some options earlier:

          Other approaches might be using the sticky bit or the existence of a second file to denote a pinned block.

          in
          https://issues.apache.org/jira/browse/HDFS-6133?focusedCommentId=13987736&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13987736

          I guess the existence of a second file is easier to make compatible across different platforms (though the sticky bit has the advantage of storing the info in the same file).

          Another option is to use a mixed mechanism: say, if it's on Linux, use the sticky bit; otherwise, use the existence of a second file. This method means a lack of consistency between different platforms. Just wanted to throw a thought out here.

          Thanks.

          Tsz Wo Nicholas Sze added a comment -

          We should also add a configuration to enable/disable this feature, since block pinning may not be preferable for some deployments.

          BTW, just heard from Chris Nauroth that Windows does not support the sticky bit. Is there some other way to implement block pinning? Use the file name?

          Tsz Wo Nicholas Sze added a comment -

          > ... where n = targetPinnings.length ...

          It should be n = nodes.length.

          Tsz Wo Nicholas Sze added a comment -
          • PBHelper.convert(..) only adds one FALSE when targetPinnings == null. Should we add n FALSEs, where n = targetPinnings.length?
          • Remove "use sticky bit" from the javadoc below, since an implementation may not use the sticky bit.
            //FsDatasetSpi
            +  /**
            +   * Use sticky bit to pin the block
            +   */
            +  public void setPinning(ExtendedBlock block) throws IOException;
            
          • FileUtil currently does not support the sticky bit, so using LocalFileSystem is correct. Is there a way to add sticky bit support to FileUtil?
          • There are some tab characters. Please replace them with spaces.

          Thanks for working hard on this!
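A minimal sketch of the PBHelper.convert(..) point above, taking into account the later correction that n = nodes.length: when the client passes no targetPinnings, the converted message should carry one FALSE per datanode in the pipeline, not a single FALSE. This is hypothetical standalone code, not the actual PBHelper signature:

```java
import java.util.ArrayList;
import java.util.List;

public class ConvertPinnings {

    // nodeCount stands in for nodes.length, the number of datanodes
    // in the write pipeline.
    static List<Boolean> convert(boolean[] targetPinnings, int nodeCount) {
        List<Boolean> pinnings = new ArrayList<>();
        if (targetPinnings == null) {
            // One FALSE per datanode in the pipeline, not a single FALSE.
            for (int i = 0; i < nodeCount; i++) {
                pinnings.add(Boolean.FALSE);
            }
        } else {
            for (boolean p : targetPinnings) {
                pinnings.add(p);
            }
        }
        return pinnings;
    }

    public static void main(String[] args) {
        System.out.println(convert(null, 3));                       // [false, false, false]
        System.out.println(convert(new boolean[]{true, false}, 2)); // [true, false]
    }
}
```

Note that zhaoyunjiong's reply later in the thread argues one FALSE is actually sufficient, because each DataNode in the pipeline appends another entry when forwarding to the next; this sketch only illustrates the alternative Nicholas raises here.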

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12697036/HDFS-6133-8.patch
          against trunk revision 1425e3d.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9457//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9457//console

          This message is automatically generated.

          zhaoyunjiong added a comment -

          This version should pass the tests.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12696984/HDFS-6133-7.patch
          against trunk revision 7b10ef0.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.server.balancer.TestBalancer

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9455//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9455//console

          This message is automatically generated.

          zhaoyunjiong added a comment -

          Thanks Nicholas.
          Updated the patch to fix the test failures.
          By the way, I used "dev-support/test-patch.sh" to test my patch yesterday and it didn't catch the test failure; it seems my local test environment has some problems.

          Tsz Wo Nicholas Sze added a comment -

          zhaoyunjiong, thanks for the update. There are quite a few test failures. Please fix them. I will review the patch after that.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12696748/HDFS-6133-7.patch
          against trunk revision 20660b7.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.TestDecommission
          org.apache.hadoop.hdfs.TestReplaceDatanodeOnFailure
          org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA
          org.apache.hadoop.hdfs.TestReadWhileWriting
          org.apache.hadoop.hdfs.TestFileAppend2
          org.apache.hadoop.hdfs.server.mover.TestStorageMover
          org.apache.hadoop.hdfs.TestGetFileChecksum
          org.apache.hadoop.hdfs.server.namenode.ha.TestHAAppend
          org.apache.hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks
          org.apache.hadoop.hdfs.server.namenode.TestAddBlock
          org.apache.hadoop.hdfs.TestFileAppendRestart
          org.apache.hadoop.hdfs.TestReplication
          org.apache.hadoop.hdfs.TestFileCreation
          org.apache.hadoop.hdfs.server.blockmanagement.TestComputeInvalidateWork
          org.apache.hadoop.hdfs.TestModTime
          org.apache.hadoop.hdfs.server.balancer.TestBalancer
          org.apache.hadoop.hdfs.server.namenode.TestFileTruncate
          org.apache.hadoop.hdfs.server.namenode.TestDiskspaceQuotaUpdate
          org.apache.hadoop.hdfs.server.namenode.TestCheckpoint
          org.apache.hadoop.fs.permission.TestStickyBit
          org.apache.hadoop.hdfs.server.blockmanagement.TestOverReplicatedBlocks
          org.apache.hadoop.hdfs.server.blockmanagement.TestPendingReplication
          org.apache.hadoop.hdfs.TestCrcCorruption
          org.apache.hadoop.hdfs.TestGetBlocks
          org.apache.hadoop.hdfs.server.namenode.TestFSImageWithSnapshot
          org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotDiffReport
          org.apache.hadoop.hdfs.server.namenode.TestNamenodeCapacityReport
          org.apache.hadoop.hdfs.server.namenode.TestSnapshotPathINodes
          org.apache.hadoop.hdfs.TestSetrepIncreasing
          org.apache.hadoop.hdfs.TestFileAppend3
          org.apache.hadoop.hdfs.protocol.datatransfer.sasl.TestSaslDataTransfer
          org.apache.hadoop.hdfs.TestPipelines
          org.apache.hadoop.hdfs.TestSetrepDecreasing
          org.apache.hadoop.hdfs.TestFileAppend4
          org.apache.hadoop.hdfs.TestBlockStoragePolicy
          org.apache.hadoop.hdfs.server.namenode.snapshot.TestINodeFileUnderConstructionWithSnapshot
          org.apache.hadoop.hdfs.TestEncryptedTransfer
          org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes
          org.apache.hadoop.hdfs.server.datanode.TestHSync
          org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
          org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup
          org.apache.hadoop.hdfs.server.namenode.web.resources.TestWebHdfsDataLocality
          org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing

          The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager
          org.apache.hadoop.hdfs.TestInjectionForSimulatedStorage

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9439//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/9439//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9439//console

          This message is automatically generated.

          zhaoyunjiong added a comment -

          Thanks Tsz Wo Nicholas Sze.
          Updated the patch to add targetPinnings; it only pins on the favored datanodes.

          Tsz Wo Nicholas Sze added a comment -

          Thanks for working on this. The pinning idea is very interesting!

          A replica is pinned only if it is stored on a favored node. In a write pipeline, it is possible that some datanodes are favored nodes and some are not. So the new parameters in DataTransferProtocol.writeBlock should be similar to storageType and targetStorageTypes, i.e. a new pinning parameter for the next datanode and another new parameter, targetPinnings, for the downstream datanodes.

             public void writeBlock(final ExtendedBlock blk,
                 final StorageType storageType, 
          +      final boolean pinning,
                 final Token<BlockTokenIdentifier> blockToken,
                 final String clientName,
                 final DatanodeInfo[] targets,
                 final StorageType[] targetStorageTypes, 
          +      final boolean[] targetPinnings,
                 final DatanodeInfo source,
                 final BlockConstructionStage stage,
                 final int pipelineSize,
          
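          The per-hop split described above can be sketched as follows. This is a hypothetical illustration only (PipelinePinning and its methods are invented names, not the actual DataXceiver code): at each hop, the first element of targetPinnings applies to the next datanode, and the remainder is forwarded downstream, mirroring how storageType/targetStorageTypes are handled.

```java
import java.util.Arrays;

// Hypothetical sketch (not actual HDFS code) of splitting a per-target
// pinning array at each hop of a write pipeline, the same way
// storageType/targetStorageTypes are split.
public class PipelinePinning {

  // The flag that applies to the next datanode in the pipeline.
  static boolean pinningForNextNode(boolean[] targetPinnings) {
    return targetPinnings.length > 0 && targetPinnings[0];
  }

  // The array to forward to the downstream datanodes.
  static boolean[] pinningsForDownstream(boolean[] targetPinnings) {
    return targetPinnings.length > 1
        ? Arrays.copyOfRange(targetPinnings, 1, targetPinnings.length)
        : new boolean[0];
  }

  public static void main(String[] args) {
    // Three-node pipeline where only the first target is a favored node.
    boolean[] pinnings = {true, false, false};
    System.out.println(pinningForNextNode(pinnings));                     // true
    System.out.println(Arrays.toString(pinningsForDownstream(pinnings))); // [false, false]
  }
}
```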
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12688622/HDFS-6133-6.patch
          against trunk revision ecf1469.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 5 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.fs.TestSymlinkHdfsFileContext

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9108//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/9108//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9108//console

          This message is automatically generated.

          zhaoyunjiong added a comment -

          Thanks Kai Zheng for pointing out the bug in TestBalancer. Fixed in the new patch.
          Also updated the comment you mentioned.

          For the Mover, it uses replaceBlock the same way the Balancer does, so the Mover will not be able to move pinned blocks either.

          For storage type/policy, Kihwal Lee already answered the question:

          We will make additional changes in separate jiras to make NN aware of favored nodes. I think the patch in this jira is fine as a stepping stone for the further work.

          Kai Zheng added a comment -

          Hi Yunjiong,

          The patch looks great. Minor comments:
          1. Regarding the following code: are you putting only the first DN as the favored node, or all the DNs?

          +      InetSocketAddress[] favoredNodes = new InetSocketAddress[numOfDatanodes];
          +      for (int i = 0; i < favoredNodes.length; i++) {
          +        favoredNodes[i] = cluster.getDataNodes().get(0).getXferAddress();
          +      }
          

          2. In DistributedFileSystem.java, there is a comment as below; it would be better to update it as well.

            /**
             * Same as  
             * {@link #create(Path, FsPermission, boolean, int, short, long, 
             * Progressable)} with the addition of favoredNodes that is a hint to 
             * where the namenode should place the file blocks.
             * The favored nodes hint is not persisted in HDFS. Hence it may be honored
             * at the creation time only. HDFS could move the blocks during balancing or
             * replication, to move the blocks from favored nodes. A value of null means
             * no favored nodes for this create
             */
          

          3. By the way, a question: do we need any handling for favored nodes and the pin way considering storage type/policy/Mover?

          Thanks.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12683784/HDFS-6133-5.patch
          against trunk revision 4a31611.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 5 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8845//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8845//console

          This message is automatically generated.

          zhaoyunjiong added a comment -

          Thanks Yongjun Zhang; updated the patch to fix the formatting.

          Yongjun Zhang added a comment -

          Hi Kihwal, thanks for the clarification.

          Hi zhaoyunjiong, thanks for the new rev addressing my earlier comments. There is one nit with the latest rev (4): the alignment around line 316 in DFSTestUtil.java needs to be fixed.

          Would any committer please pick up from here to push through?

          Thanks.

          Kihwal Lee added a comment -

          We will make additional changes in separate jiras to make NN aware of favored nodes. I think the patch in this jira is fine as a stepping stone for the further work.

          Yongjun Zhang added a comment -

          Thanks Kihwal Lee for the comments and zhaoyunjiong for the new rev.

          Hi Kihwal,

          Per your last comment,

          We are in the process of optimizing BlockManager and block report processing. It will be better to add the rest of the pieces involving reporting to NN and management on NN after this change. One thing that we need to make sure is that this jira is compatible with the changes to come. Since DN has to include the pinning info in full and incremental block reports, it needs to be recorded in memory, not queries against file system on demand. The in-memory state needs to be reconstructed on startup and things like DirectoryScanner might need to be made aware of block pinning too. These don't need to be done in this jira, but the design should not preclude such future changes.

          Would you please clarify what you mean by "the rest of the pieces"? Do you mean the current patch for this jira, which does not yet handle the throwing-away-over-replicated and corrupted-pinned-block cases, or just the additional work needed on top of the current patch to take care of those cases?

          Since you said "These don't need to be done in this jira", does "These" mean the same as "the rest of the pieces" above?

          Essentially, I'm asking for a clarification of the scope of the fix you expect for this jira.

          Thanks.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12682349/HDFS-6133-4.patch
          against trunk revision 79301e8.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 5 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8778//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8778//console

          This message is automatically generated.

          zhaoyunjiong added a comment -

          Thanks Yongjun Zhang.
          Updated the patch according to your comments.

          The concept of favoredNodes pre-existed before your patch, now your patch defines that as long as favoredNodes is passed, then pinning will be done. So we are changing the prior definition of how favoredNodes are used. Why not add some additional interface to tell that pinning will happen so we have the option not to pin even if favoredNodes is passed? Not necessarily you need to do what I suggested here, but I'd like to understand your thoughts here.

          I think most of the time, if you use favoredNodes, you'd like to keep the block on that machine, so to keep things simple, I didn't add a new interface.

          Do we ever need interface to do unpinning?

          We can add unpinning in another issue if there is a use case that needs it.

          Kihwal Lee added a comment -

          We are in the process of optimizing BlockManager and block report processing. It will be better to add the rest of the pieces involving reporting to NN and management on NN after this change. One thing that we need to make sure is that this jira is compatible with the changes to come. Since DN has to include the pinning info in full and incremental block reports, it needs to be recorded in memory, not queries against file system on demand. The in-memory state needs to be reconstructed on startup and things like DirectoryScanner might need to be made aware of block pinning too. These don't need to be done in this jira, but the design should not preclude such future changes.

          For storing and reporting pinning information, we are planning to borrow some bits from the block size variable. On the namenode side, the optimization will give us a few bits for storing this information. I will file a separate jira when we have more details. This jira will be linked as a dependency.
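
          To make the bit-borrowing idea concrete, here is a minimal sketch assuming the pinned flag is carried in the otherwise-unused top bit of the 64-bit block-size field. The actual encoding was deferred to the separate jira, so PinnedBit and its masks are purely illustrative.

```java
// Hypothetical illustration (not the actual HDFS layout) of "borrowing a
// bit" from a long field to carry a pinned flag. Block sizes never need
// the sign bit of a 64-bit long, so the top bit is free.
public class PinnedBit {
  private static final long PIN_MASK = 1L << 63;   // the borrowed bit
  private static final long SIZE_MASK = ~PIN_MASK; // remaining 63 bits

  static long encode(long blockSize, boolean pinned) {
    return (blockSize & SIZE_MASK) | (pinned ? PIN_MASK : 0L);
  }

  static long decodeSize(long encoded) {
    return encoded & SIZE_MASK;
  }

  static boolean decodePinned(long encoded) {
    return (encoded & PIN_MASK) != 0;
  }

  public static void main(String[] args) {
    long e = encode(134217728L, true);   // 128 MB block, pinned
    System.out.println(decodeSize(e));   // 134217728
    System.out.println(decodePinned(e)); // true
  }
}
```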

          Yongjun Zhang added a comment -

          Hi Kihwal Lee and Chris Trezzo,

          Thanks a lot for sharing the info and great insights here.

          About the throwing-away-over-replicated case, I think the "include pinning info in block reports and remember it in block manager" approach Kihwal and Daryn suggested seems reasonable. And I agree the corrupted-pinned-block case needs some more thought and careful logic.

          About a blockpool-aware balancing policy: since the balancer works on each blockpool independently, it seems a natural approach. However, I think it would be a complementary solution to the approach here. It deserves its own jira and can be worked on in parallel.

          Currently one NN is associated with only one blockpool, and federated clusters are not yet widely used as far as I know. Implementation-wise, there are two options I can think of if we want to use a blockpool-aware balancing policy to solve the problem here:

          1. Users need to choose to use a federated cluster, and put all files that need to be pinned into a dedicated blockpool.
          2. Make a single NN handle multiple blockpools.
            The solution would work nicely for an already-federated cluster. For others, it won't be easy.

          Right now we don't have the capability to handle pinning at block/file granularity (the balancer does have an option to exclude nodes from being touched). It seems even a solution that doesn't handle the throwing-away-over-replicated case would help alleviate the pain.

          Let's see if we can include a mechanism in the patch of this jira, or at least think through how to handle the two cases (throwing-away-over-replicated, and corrupted-pinned-block).

          Thanks.

          Chris Trezzo added a comment -

          I am slightly late to the party on this one, but at Twitter we have a similar need for a slightly different use case. We use federation, and each block pool has a different workload. For some of these workloads it does not make sense to run the balancer. For example, we have a block pool associated with a tmp namespace; ideally, we would never want to balance blocks in that block pool because they will be deleted shortly anyway. One design approach we were contemplating is to make the balancer block-pool-aware. You could then run the balancer on a per-block-pool basis and have pluggable balancing strategies for each pool (i.e. the balancer policy for the block pool associated with the tmp namespace is a no-op). This allows the balancer to remain decoupled from the namespace and only needs to know about the block pool (we can still separate the BM at a later point).

          The above might work for this use case as well. The balancer policy for the block pool containing blocks in hbase would be a no-op. Let me know what you guys think. I can see the block pool design being orthogonal to this JIRA, so let me know if I should open up a separate JIRA for this effort. We could potentially use the pinning strategy for our use case as well, but I hesitate for the same reasons that Kihwal Lee mentioned above with respect to corrupt/unavailable blocks.

          Kihwal Lee added a comment -

          This might be outside the scope of this jira, but I think we need to think about this before going further. If a node with a pinned block is temporarily unavailable, the namenode will try to replicate the block as it is under-replicated. When the node recovers, the block is over-replicated and a replica will be invalidated. How do we make sure it is not removed from the favored node? I think this scenario can happen during start-up or transient infra/network issues.

          Daryn and I had a brief discussion about this. It might be possible to include pinning info in block reports and remember it in block manager. This will enable NN to make the right decision on over-replicated cases. A bit more complicated logic will be needed when a pinned block gets corrupted on a favored node. The usual replicate + invalidate strategy won't be ideal here.
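
          The over-replicated decision described above could look like the following minimal sketch (Replica and ExcessReplicaChooser are invented names, not actual BlockManager code): once pinning info from block reports is remembered in the block manager, the excess-replica chooser simply skips pinned replicas.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch (not actual BlockManager code): when resolving an
// over-replicated block, never invalidate a replica that is pinned to a
// favored node.
public class ExcessReplicaChooser {

  static class Replica {
    final String datanode;
    final boolean pinned;
    Replica(String datanode, boolean pinned) {
      this.datanode = datanode;
      this.pinned = pinned;
    }
  }

  // Returns the datanodes whose replicas may be invalidated, i.e. the
  // unpinned ones. If every replica is pinned, nothing can be removed.
  static List<String> invalidationCandidates(List<Replica> replicas) {
    List<String> candidates = new ArrayList<>();
    for (Replica r : replicas) {
      if (!r.pinned) {
        candidates.add(r.datanode);
      }
    }
    return candidates;
  }

  public static void main(String[] args) {
    // Over-replicated block: the favored-node replica (dn1) is pinned.
    List<Replica> replicas = Arrays.asList(
        new Replica("dn1", true),
        new Replica("dn2", false),
        new Replica("dn3", false),
        new Replica("dn4", false));
    System.out.println(invalidationCandidates(replicas)); // [dn2, dn3, dn4]
  }
}
```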

          Yongjun Zhang added a comment -

          Hi zhaoyunjiong,

          Thanks for the good work here, and sorry for getting back late.

          I reviewed the latest patch, and it indeed looks very close. I also think your answers to Stack's questions make sense. A couple of questions:

          1. The concept of favoredNodes pre-existed before your patch, now your patch defines that as long as favoredNodes is passed, then pinning will be done. So we are changing the prior definition of how favoredNodes are used. Why not add some additional interface to tell that pinning will happen so we have the option not to pin even if favoredNodes is passed? Not necessarily you need to do what I suggested here, but I'd like to understand your thoughts here.
          2. Do we ever need interface to do unpinning?

          And I have some comments, mostly nits.

          • DFSOutputStream.java
            Line 1413, change favoredNodes != null ? true: false to favoredNodes != null
          • DataTransferProtocol.java,
            Line 95: "whether pinning the block" to "whether to pin the block"
          • BlockReceiver.java
            Line 178: move {{this.pinning = pinning;}} to before the debug-message block, and include the pinning value in the debug message.
          • FSDatasetSpi.java
            Line 479: add javadoc for the setPinning/getPinning APIs
          • FsDatasetImpl.java, Line 2618: document in the javadoc that the sticky bit is used for pinning
          • DataXceiver.java
            Line 562, remove the newly added extra empty line
          • datatransfer.proto
            Line 125: needs a doc comment
          • TestBalancer.java
            Can you add a description what testBalancerWithPinnedBlocks does and what the expected result is?

          Thanks.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12674125/HDFS-6133-3.patch
          against trunk revision cb81bac.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 5 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          -1 release audit. The applied patch generated 1 release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot
          org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication
          org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8388//testReport/
          Release audit warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/8388//artifact/patchprocess/patchReleaseAuditProblems.txt
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8388//console

          This message is automatically generated.

          zhaoyunjiong added a comment -

          Update patch, merge with trunk.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12673853/HDFS-6133-3.patch
          against trunk revision 2a51494.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8372//console

          This message is automatically generated.

          zhaoyunjiong added a comment -

          Update patch for merge the trunk.

          Why do we always pass false below?

              new Sender(out).writeBlock(b, accessToken, clientname, targets,
                  srcNode, stage, 0, 0, 0, 0, blockSender.getChecksum(),
                  cachingStrategy, false);

          This code path happens when the NameNode asks a DataNode to send a block to another DataNode (DatanodeProtocol.DNA_TRANSFER). It's not triggered by the client, so there is no need to pin the block in this case.

          We will never copy a block?

              if (datanode.data.getPinning(block)) {
                String msg = "Not able to copy block " + block.getBlockId() + " "
                    + "to " + peer.getRemoteAddressString() + " because it's pinned ";
                LOG.info(msg);
                sendResponse(ERROR, msg);
              }

          Anything to help ensure the replica count does not rot when this pinning is enabled?

          When a block is under-replicated, the NameNode will send a DatanodeProtocol.DNA_TRANSFER command to the DataNode, which is handled by DataTransfer; pinning won't affect that.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12653743/HDFS-6133-2.patch
          against trunk revision 371ef4c.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8252//console

          This message is automatically generated.

          stack added a comment -

          I like pinning during creation. What do others think?

          Until there is something better, seems fine (just one question below).

          We can always improve later.

          Agree though would be coolio if the API added here could prevail in a new implementation.

          A few comments on the patch:

          Why do we always pass false below?

              new Sender(out).writeBlock(b, accessToken, clientname, targets,
                  srcNode, stage, 0, 0, 0, 0, blockSender.getChecksum(),
                  cachingStrategy, false);

          We will never copy a block?

              if (datanode.data.getPinning(block)) {
                String msg = "Not able to copy block " + block.getBlockId() + " "
                    + "to " + peer.getRemoteAddressString() + " because it's pinned ";
                LOG.info(msg);
                sendResponse(ERROR, msg);
              }

          Anything to help ensure the replica count does not rot when this pinning is enabled?

          Nice test.

          Thanks.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12653743/HDFS-6133-2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 8 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.server.datanode.TestBPOfferService
          org.apache.hadoop.hdfs.TestCrcCorruption

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7277//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7277//console

          This message is automatically generated.

          zhaoyunjiong added a comment -

          Thanks Daryn Sharp for your time.
          Update patch, use boolean instead of Boolean.

          Daryn Sharp added a comment -

          Agreed on pinning even if not all favored nodes granted to keep the impl simpler. I think there's a lot of value in a change like this since there are multiple jiras attempting different ways to stop the balancer from moving hbase files. We can always improve later.

          Allowing the user to pin/unpin would be useful, but would be much more complicated, so I'd suggest deferring to another jira. Complicated in the sense that the NN has to be pinning-aware so it can issue commands to DNs to pin, compatibility concerns during rolling upgrades, etc. Only allowing pinning after the fact, instead of during creation, creates races where the balancer gets to the block before the pin request, whether via a tight race, a dropped rpc connection, a failover, etc.

          I like pinning during creation. What do others think?

          zhaoyunjiong added a comment -

          I'll use boolean instead of Boolean.

          Yes, the NN may not grant all the requested/favored nodes. The best way would be to only pin the blocks on the granted favored nodes, but considering that the probability of the NN not granting all the favored nodes is small, I just pinned them all.
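The alternative discussed above, pinning only the replicas that landed on a node from the client's favored-nodes hint rather than the whole pipeline, can be sketched as a small per-node decision. This is an illustrative sketch only, not the patch's code; the class and method names are hypothetical, and nodes are plain "host:port" strings for simplicity.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch: for each node the NN actually granted in the write
 * pipeline, decide whether its replica should be pinned by checking
 * membership in the client's favored-nodes set.
 */
class FavoredNodePinning {
  /** Returns, for each granted node, true iff its replica should be pinned. */
  static boolean[] pinFlags(List<String> grantedNodes, List<String> favoredNodes) {
    Set<String> favored = new HashSet<>(favoredNodes);
    boolean[] flags = new boolean[grantedNodes.size()];
    for (int i = 0; i < flags.length; i++) {
      // Pin only when the granted node is one the client asked for.
      flags[i] = favored.contains(grantedNodes.get(i));
    }
    return flags;
  }
}
```

Under this scheme a node substituted by the NN would hold an unpinned, movable replica, at the cost of the DataNode needing to know which granted nodes matched the hint.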

          Also, I was wondering whether I should provide an API that lets users pin/unpin blocks after a file is created. That might be more useful than combining pinning with favored nodes.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12643535/HDFS-6133-1.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 8 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.TestDatanodeConfig

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6832//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/6832//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6832//console

          This message is automatically generated.

          Daryn Sharp added a comment -

          Upon a cursory review, I think the patch is promising. I wonder if favored nodes and pinning should be combined, but it stands to reason that if you want specific nodes that you probably don't want the blocks to move. Plus it's least disruptive to apis. I haven't crawled through the favored node code, but I'm assuming the NN may not grant all the requested/favored nodes? If yes, does it make sense to only pin the blocks on the favored nodes that were granted?

          A small quick question: Do you need to use a Boolean instead a primitive boolean?

          zhaoyunjiong added a comment -

          This patch sets the sticky bit on the block file if the DFSClient has the favored-nodes hint set, and refuses Balancer move requests for such blocks.

          Daryn Sharp added a comment -

          I'm not sure the NN needs to know about the pinning. It may only need to be something the client tells the DN when establishing the pipeline. Perhaps it could be stored in the existing block metadata file, but I don't know how extensible that file is, whether it can be changed with backwards compatibility, etc. Other approaches might be using the sticky bit or the existence of a second file to denote a pinned block.
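The "second file" alternative mentioned above can be sketched in a few lines: mark a pinned replica with an empty sidecar file next to the block file, so no NameNode state and no block-metadata format change is needed. This is a hypothetical sketch for illustration; the class, method names, and ".pin" suffix are invented here, and the committed patch ultimately used the sticky bit instead.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical sketch of marking a pinned replica with a sidecar file:
 * the mere existence of "<blockfile>.pin" means the replica is pinned.
 */
class SidecarPinMarker {
  private static Path pinFileFor(Path blockFile) {
    // e.g. blk_1073741825 -> blk_1073741825.pin (suffix is an assumption)
    return blockFile.resolveSibling(blockFile.getFileName() + ".pin");
  }

  /** Pin the replica; idempotent if it is already pinned. */
  static void setPinning(Path blockFile) throws IOException {
    Path pin = pinFileFor(blockFile);
    if (!Files.exists(pin)) {
      Files.createFile(pin);
    }
  }

  /** A replica is pinned iff its sidecar file exists. */
  static boolean getPinning(Path blockFile) {
    return Files.exists(pinFileFor(blockFile));
  }
}
```

Compared to the sticky bit, a sidecar file doubles the inode count for pinned replicas, which is one reason a permission bit on the existing block file is attractive.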

          zhaoyunjiong added a comment -

          Yes, block pinning works.
          By the way, where do you think is the best place to store the pinning information?
          If saved in Block, it seems it would cost a lot of memory.

          stack added a comment -

          Daryn Sharp Pinning would work as long as we are still able to dictate where to place the replicas. If pinning a block makes it immune to balance, that should work.

          Daryn Sharp added a comment -

          stack, I quickly skimmed HDFS-2576. Favored placement appears to dovetail very nicely with replica pinning if the goal is to place a 2nd replica on a specific node in the event the primary region server fails. Whether one or more replicas is pinned to preserve favored placement, it would appear to meet the requirement of this jira, correct?

          zhaoyunjiong, ignoring the mid-long term goal of separating the BM service, the preconditions are simple and a valuable improvement to the NN: untangle the namesystem / directory / BM architecture. Each layer should have a specific role with no back references to other layers. Namesystem is path based, directory is inode based, BM is just a BM.

          By allowing no back references, it also opens the possibility of breaking up the global locking by locking subsystems w/o fear of deadlocks. In and of itself, the NN will benefit from a cleaner design and more targeted locking, and if done correctly, it's a small leap to run the BM as a separate service which in turn opens the door to allowing other services than the NN to utilize the block manager. This is why I'm opposed (-1) to making the BM path aware. It ruins the ability to do any of the above.

          Do you believe block pinning will address your issue?

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12641891/HDFS-6133.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 5 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.TestDistributedFileSystem

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6733//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6733//console

          This message is automatically generated.

          zhaoyunjiong added a comment -

          Upload patch according to comments.

          By the way, do we have a new BM service design?

          stack added a comment -

          I believe the goal is to keep the blocks of an hbase column on the region server?

          Yes. Locality is important: e.g. being able to SCR makes a big difference in our serving latencies (and how much CPU we end up using servicing requests).

          ...it's effectively up to hbase to re-balance the cluster....

          Facebook describe their HBase doing this. Add a rack and slow fill by placing the 3rd replica over on the new rack. Only after the new rack is full enough, switch on the regionservers in the new rack (The alternative is switch on regionservers and cripple the top-of-the-rack switch with traffic as regionservers read remotely). HDFS-2576 is an attempt at adding some of this 'Favored Node' facility back into apache hdfs (though we've not made use of it over in apache hbaselandia up to this – we've still to 'finish' up our end).

          If your cluster is hbase-centric, then making the balancer exclude all hbase files has marginal value.

          Well, currently the balancer may wreak havoc if it runs across the hbase root.dir.

          I think what we really need is to support pinning the local replica but allow the other remote replicas to float.

          Another aspect of 'Favored Nodes' (implemented by FB, lagging in apache hbase) was that on node crash, since we'd been 'placing' the blocks, we'd know which machine(s) could serve the data that was on the dead machine while maintaining a better-than-random locality quotient. We'd not be able to do this if the non-locals could float.

          Not suggesting that the above is the best way to go; just listing out a couple of hbase concerns.

          Jean-Marc Spaggiari added a comment -

          I don't know if any other application outside of HBase needs to avoid block movement, but the pinning option you proposed might be fine for HBase. I would just recommend having the option to pin all the blocks and not just the first one. The thing is, HBase might at some point in the future define "ghost" regions on the replicas to improve performance or speed up MTTR. If we move those blocks, that will again create the same kind of concern.

          So adding the ability to pin all the blocks, or just one, or none when creating them might be a good option for HBase too.

          Daryn Sharp added a comment -

          A main concern we have is the performance/scalability of the proposed patch. Looking up paths for every block is not trivial. The namespace lock is going to be held for a much longer duration, leading to performance degradation. GC pressure will also increase from all the temporary arrays, byte-to-string conversions, and StringBuilder instances needed to resolve the path.

          Another concern is this design approach. The BM shouldn't be path aware. We are trying to scale the performance and size of the namespace with a long term goal of the BM being an independent service. As such, the BM cannot have a direct reference to the namesystem, which means it cannot be path aware. Not to mention it violates the path -> inode -> block layer abstractions if a block is path aware.

          I understand and share your desire is to prevent balancing from destroying hbase performance. I believe the goal is to keep the blocks of an hbase column on the region server? Pinning based on path pins all replicas, so when you increase your cluster capacity, it's effectively up to hbase to re-balance the cluster by recreating files. If your cluster is hbase-centric, then making the balancer exclude all hbase files has marginal value.

          I think what we really need is to support pinning the local replica but allow the other remote replicas to float.
          Perhaps a new flag to create/append could be used to instruct the data streamer to pin the local replica. The flag is part of the block's metadata (whether in the file or elsewhere). The DN may then refuse balancer requests to replace the block. If done correctly, it's completely backwards compatible and requires no NN changes.
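          A minimal, self-contained sketch of that idea — not the actual HDFS code; the class and method names here are hypothetical — in which the datanode records the pin when the replica is written and refuses balancer-initiated replaceBlock requests for pinned replicas, gated by the dfs.datanode.block-pinning.enabled switch from the release note:

          ```java
          import java.util.Map;
          import java.util.concurrent.ConcurrentHashMap;

          // Simplified model of the pinning proposal: the DN remembers which of its
          // local replicas are pinned and refuses balancer move requests for them.
          public class PinnedReplicas {
              private final boolean pinningEnabled; // dfs.datanode.block-pinning.enabled
              private final Map<Long, Boolean> pinned = new ConcurrentHashMap<>();

              PinnedReplicas(boolean pinningEnabled) {
                  this.pinningEnabled = pinningEnabled;
              }

              // Called when the data streamer writes a block with the pin flag set.
              // The pin is only honored when the feature is enabled.
              void onBlockWritten(long blockId, boolean pinFlag) {
                  pinned.put(blockId, pinningEnabled && pinFlag);
              }

              // Called when the balancer asks this DN to give up a replica;
              // a pinned replica is never handed over.
              boolean allowReplaceBlock(long blockId) {
                  return !pinned.getOrDefault(blockId, false);
              }
          }
          ```

          Because the decision is made entirely on the datanode from metadata carried with the block, this matches Daryn's point that no NameNode change is required.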

          Yongjun Zhang added a comment -

          I didn't see Daryn Sharp's comments when I posted my last one; I certainly would like to understand the concerns. Thanks Daryn.

          Daryn Sharp added a comment -

          I understand the value of this feature, but I'd like to place a polite -1 pending further discussion. I'm not opposed to the goal of the feature but I'd like to make sure it's not committed before more evaluation. I'll be posting joint concerns from an internal brainstorming after this comment.

          Yongjun Zhang added a comment -

          Thanks zhaoyunjiong for addressing my earlier comments. Vinayakumar B, thanks for your additional comments; they look very good to me.

          Yunjiong, when you address Vinayakumar's comments, would you please consider the following:

          • Balancer.java: the variable excludeDir really means "excludePaths", which contains a list of paths separated by commas; each of the paths could be either a dir or a file. So I suggest replacing "excludeDir" with "excludePaths".
          • Line 422 of NamenodeRpcServer.java exceeded 80 chars.
          • NamenodeProtocol.java: I suggest improving the following comment a bit, from
              *          exclude blocks belongs to files with specified path prefix, use
              *          comma to separate if have multiple paths
            

            TO:

               *          exclude blocks belonging to files with path prefix in excludePaths, which
               *          is a single path or a list of paths separated by comma. 
            
          • I happen to see a typo in test code:
              r = Balancer.run(namenodes, Balancer.Parameters.DEFALUT, conf);
            

            The DEFALUT meant DEFAULT, would you please correct it as well?

          I'm giving my +1 once you address Vinayakumar's comments and mine above.

          Thanks again for the good work.

          Daryn Sharp added a comment -

          Please give me a little time to think about this patch. I think it will be detrimental to decoupling the BM from the FSN.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12641679/HDFS-6133.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6714//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6714//console

          This message is automatically generated.

          Vinayakumar B added a comment -

          Thanks zhaoyunjiong for the patch.
          The approach looks good.

          I have following comments.
          1.

          +        .setDatanode(PBHelper.convert((DatanodeID) datanode)).setSize(size)
          +        .setPathPrefix(pathPrefix).build();

          Since pathPrefix is optional, you need not set an empty string when it's null; you need not set it at all.

          Similarly, you need to check request.hasPathPrefix before using request.getPathPrefix(). If it's not set, then you can pass null from NamenodeProtocolServerSideTranslatorPB.java itself. This is the usual convention followed for optional parameters.

          2. pathPrefix can be renamed to appropriate one, such as "excludePathPrefix" in all places.
          3. In the test, why extract one more method, testBalancerWithExcludeDirectory? I think it's not required unless it's re-used.

          NamenodeProtocol.proto
          My understanding is that the newly added optional parameter should have no impact in compatibility. But other more experienced reviewers please confirm.

          Yes, you are correct, it provides compatibility. But since NamenodeProtocol is only a server protocol, not used by any client application, I feel there is no need to worry about it.
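          The optional-field convention described above — only set the field when the value is non-null, and only call the getter after checking the has-flag — can be modeled in plain Java. This is an illustrative stand-in, not the generated protobuf classes:

          ```java
          // Minimal model of the protobuf optional-field convention:
          // set the field only when non-null, and check has... before get...
          public class GetBlocksRequest {
              private String pathPrefix;     // the optional field
              private boolean hasPathPrefix; // presence bit, as protobuf tracks it

              GetBlocksRequest setPathPrefix(String p) {
                  if (p != null) {           // never set an empty placeholder for null
                      pathPrefix = p;
                      hasPathPrefix = true;
                  }
                  return this;
              }

              boolean hasPathPrefix() { return hasPathPrefix; }
              String getPathPrefix() { return pathPrefix; }

              // Server side: translate an unset field back to null before use.
              static String extractPathPrefix(GetBlocksRequest req) {
                  return req.hasPathPrefix() ? req.getPathPrefix() : null;
              }
          }
          ```

          Keeping "absent" and "empty string" distinct is what makes the new parameter backward compatible: an old client that never sets the field is indistinguishable from one that deliberately passes no exclusion.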

          zhaoyunjiong added a comment -

          Thanks Yongjun Zhang and Benoy Antony for the review.

          Updated patches according to the comments, except for the corner case where "/a/b" covers "/a/b/c"; I believe users won't do that.

          Benoy Antony added a comment -

          zhaoyunjiong

          You can use StringUtils.getTrimmedStringCollection() to remove empty and duplicate elements as a preprocessing step.
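          For reference, Hadoop's StringUtils.getTrimmedStringCollection splits on commas, trims whitespace, and drops empty and duplicate entries. A plain-Java sketch of that preprocessing (trimmedPaths is an illustrative name, not the Hadoop API):

          ```java
          import java.util.Collection;
          import java.util.LinkedHashSet;

          public class ExcludePaths {
              // Roughly what StringUtils.getTrimmedStringCollection does: split the
              // comma-separated value, trim each entry, and drop empties/duplicates
              // while preserving first-seen order via a LinkedHashSet.
              static Collection<String> trimmedPaths(String csv) {
                  LinkedHashSet<String> set = new LinkedHashSet<>();
                  for (String s : csv.split(",")) {
                      String t = s.trim();
                      if (!t.isEmpty()) {
                          set.add(t);
                      }
                  }
                  return set;
              }
          }
          ```

          With this in place, the per-block loop never has to re-check for empty prefixes, which is exactly the saving Yongjun suggested.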

          Yongjun Zhang added a comment -

          Hi zhaoyunjiong, thanks for letting me know about HDFS-6133 at HDFS-4420, and sorry for getting back late.

          I looked at the patch you did, it looks pretty good to me. Thanks for the good work.

          I have some comments on small things:

          • Some lines exceed 80 chars. E.g. NamenodeProtocolServerSideTranslatorPB.java, NameNodeRpcServer.java etc.
          • In BlockManager.getBlocksWithLocations
            You might consider stripping out empty prefixes in the pathPrefixes array as a preprocessing step, and then in the loop you don't need to check prefix.isEmpty(). When you have a lot of blocks to iterate in the loop, it would save some run time.

          Another corner case: when there are multiple prefixes, one prefix might cover another (e.g., /a/b covers /a/b/c). Preprocessing the pathPrefixes to remove prefixes covered by others can be done, but the user should be aware that /a/b and /a/b/c are duplicates. So this preprocessing is optional, depending on what the use cases look like.

          • In NamenodeProtocol.java, description of newly added parameter pathPrefix is missing.
          • NamenodeProtocol.proto: nit, you might consider modifying the comment
            // Exclude blocks belongs to files which their path have prefix: pathPrefix
            To something like
            // Exclude blocks belonging to files with specified path prefix
          • NamenodeProtocol.proto
            My understanding is that the newly added optional parameter should have no impact in compatibility. But other more experienced reviewers please confirm.
          • testBalancerWithExcludeDirectory
            It might help to add another call at the end of the test to do one more round of balancing without the excludeDir parameter, and assert that blocks are moved.

          Thanks.
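          The preprocessing and per-block check discussed in these comments can be sketched as follows (method names are illustrative, not the actual patch code): strip covered prefixes first, then test each block's file path against the surviving prefixes.

          ```java
          import java.util.ArrayList;
          import java.util.List;

          public class PrefixExclude {
              // True if path equals prefix or lies under it as a directory.
              static boolean covers(String prefix, String path) {
                  String dir = prefix.endsWith("/") ? prefix : prefix + "/";
                  return path.equals(prefix) || path.startsWith(dir);
              }

              // Drop any prefix already covered by another, e.g. /a/b/c when /a/b
              // is present. Assumes exact duplicates were removed beforehand
              // (as by the trimming/dedup preprocessing step).
              static List<String> removeCovered(List<String> prefixes) {
                  List<String> out = new ArrayList<>();
                  for (String p : prefixes) {
                      boolean redundant = false;
                      for (String q : prefixes) {
                          if (!q.equals(p) && covers(q, p)) {
                              redundant = true;
                              break;
                          }
                      }
                      if (!redundant) out.add(p);
                  }
                  return out;
              }

              // The per-block test: exclude a block whose file path matches any prefix.
              static boolean excluded(String filePath, List<String> prefixes) {
                  for (String p : prefixes) {
                      if (covers(p, filePath)) return true;
                  }
                  return false;
              }
          }
          ```

          Note the "/" boundary in covers: a naive startsWith would wrongly treat /hbase as covering /hbasebackup, which is the kind of subtlety that makes the optional preprocessing worth a test.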

          zhaoyunjiong added a comment -

          This patch supports excluding multiple paths.

          zhaoyunjiong added a comment -

          Thanks for your review, stack.

          I wasn't aware of HDFS-4420 when I created this issue.
          The problem we are trying to solve is the same, but with a very different approach.
          The performance of HDFS-4420 doesn't seem very good when the excluded path has a huge number of blocks.

          I just use HBase as an example, and HBase is also the main use case for this feature.
          For now it only accepts one exclude path, but supporting multiple is a good idea; I can upload a new patch next week.

          It is only run manually.

          stack added a comment -

          HDFS-4420 looks related?

          I skimmed the patch. Looks good. You should probably change the issue subject to exclude specified paths from consideration rather than talk of HBase (your patch is of use to more than just HBase installs). It looks like I can pass multiple -excludes when I run the balancer? Is the balancer run automatically by HDFS periodically, or is it only run manually?

          zhaoyunjiong added a comment -

          This patch makes Balancer support not moving blocks belonging to HBase.


            People

            • Assignee: zhaoyunjiong
            • Reporter: zhaoyunjiong
            • Votes: 1
            • Watchers: 30
