Hadoop Common
HADOOP-4483

getBlockArray in DatanodeDescriptor does not honor passed in maxblocks value

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.18.1
    • Fix Version/s: 0.18.2
    • Component/s: None
    • Labels: None
    • Environment:

      hadoop-0.18.1 running on a cluster of 16 nodes.

    • Hadoop Flags: Reviewed

      Description

      The getBlockArray method in DatanodeDescriptor.java should honor the passed-in maxblocks parameter. In its current form it passes an array sized to min(maxblocks, blocks.size()) into the Collection.toArray method. As the javadoc for Collection.toArray indicates, the toArray method may discard the passed-in array (and allocate a new array) if the number of elements returned by the iterator exceeds the size of the passed-in array. As a result, the flawed implementation of this method returns all the invalid blocks for a DataNode in one go, and thus triggers the NameNode to send a DNA_INVALIDATE command to the DataNode with an excessively large number of blocks. This INVALIDATE command, in turn, can take a very long time to process at the DataNode, and since DatanodeCommand(s) are processed in between heartbeats at the DataNode, this causes the NameNode to consider the DataNode offline / unresponsive (due to a lack of heartbeats).
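
      As an illustration of the toArray pitfall described above, here is a minimal, self-contained Java sketch (it is not the actual Hadoop source, and it uses Integer elements in place of Block objects): when the passed-in array is smaller than the collection, Collection.toArray allocates and returns a new array holding every element, so the maxblocks cap is silently ignored.

        import java.util.Collection;
        import java.util.TreeSet;

        public class ToArraySketch {
          // Mirrors the flawed pattern: size the destination array to
          // min(maxblocks, blocks.size()) and hand it to toArray.
          static Integer[] flawedGetBlockArray(Collection<Integer> blocks, int maxblocks) {
            int n = Math.min(maxblocks, blocks.size());
            // If blocks.size() > n, toArray discards this array and returns a
            // newly allocated one containing *all* the elements.
            return blocks.toArray(new Integer[n]);
          }

          public static void main(String[] args) {
            Collection<Integer> blocks = new TreeSet<>();
            for (int i = 0; i < 1000; i++) {
              blocks.add(i);
            }
            // Prints 1000, not 100: the maxblocks limit is not honored.
            System.out.println(flawedGetBlockArray(blocks, 100).length);
          }
        }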

      In our use case at CommonCrawl.org, we regularly do large-scale HDFS file deletions after certain stages of our map-reduce pipeline. These deletes would make certain DataNode(s) unresponsive, and thus impact the cluster's capability to properly balance file-system reads / writes across the whole available cluster. This problem only surfaced once we migrated from our 0.16.2 deployment to the current 0.18.1 release.

      Attachments

    1. patch.HADOOP-4483
        1 kB
        Ahad Rana
      2. HADOOP-4483-v2.patch
        2 kB
        Ahad Rana
      3. HADOOP-4483-v3.patch
        6 kB
        dhruba borthakur
      4. HADOOP-4483-v3.patch
        2 kB
        dhruba borthakur
      5. invalidateBlocksCopy.patch
        4 kB
        Hairong Kuang
      6. invalidateBlocksCopy-br18.patch
        4 kB
        Hairong Kuang

        Activity

        Ahad Rana added a comment -

        This fixes the getBlockArray method in DatanodeDescriptor to constrain the returned Block array to the maxBlocks value passed in.
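
        For illustration only (a hedged sketch, not the actual patch, with Integer standing in for the Block type), the fix described here amounts to copying at most maxBlocks elements out of the set via its iterator, removing each element as it is copied, so the returned array never exceeds maxBlocks:

          import java.util.Iterator;
          import java.util.TreeSet;

          class GetBlockArraySketch {
            static Integer[] getBlockArray(TreeSet<Integer> blocks, int maxBlocks) {
              int n = Math.min(maxBlocks, blocks.size());
              Integer[] result = new Integer[n];
              Iterator<Integer> it = blocks.iterator();
              for (int i = 0; i < n && it.hasNext(); i++) {
                result[i] = it.next();   // copy the next pending block
                it.remove();             // and drop it from the set
              }
              return result;
            }
          }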

        Tsz Wo Nicholas Sze added a comment -

        Good catch on the bug.

        • Removing the elements from a collection one by one could be expensive. Also, we have max >= n in most of the cases. How about using the existing code (i.e. blocks.clear() instead of e.remove()) when max >= n?
        • Could you also remove the whitespace changes, tabs and trailing spaces? This will keep the code style consistent.
        dhruba borthakur added a comment -

        Good catch. This is a candidate for 0.19.1

        Ahad Rana added a comment -

        Re: Removing the elements from a collection one by one could be expensive. Also, we have max >= n in most of the cases. How about using the existing code (i.e. blocks.clear() instead of e.remove()) when max >= n?

        I believe that since the underlying container is a tree, even Collection's internal code needs to use the iterator approach to remove items from the data structure. Plus, in the standard configuration, where the heartbeat interval is set to 3 seconds, I believe max blocks <= 100. Better to stick to one code path in this instance.

        Re: Could you also remove the whitespace changes, tabs and trailing spaces? This will keep the code style consistent.

        Sorry, one of my editors must be set to tabs vs. spaces or vice versa. Am I correct in assuming that the convention for the Hadoop codebase is spaces (instead of tabs)?

        Tsz Wo Nicholas Sze added a comment -

        I believe that since the underlying container is a tree, even Collection's internal code needs to use the iterator approach to remove items from the data structure.

        That is wrong. clear() removes all elements. You only have to set root = null.

        Plus, in the standard configuration, where the heartbeat interval is set to 3 seconds, I believe max blocks <= 100.

        That's exactly why we are removing the whole tree most of the time.

        > Am I correct in assuming that the convention for the Hadoop codebase is spaces (instead of tabs)?

        Yes, we are not using tabs in the source code.

        Hairong Kuang added a comment -

        If we want to make the implementation more efficient, I have the following suggestions:
        1. Store invalidateBlocks as an array or ArrayList instead of a TreeSet;
        2. ReplicationMonitor makes sure that the size of invalidateBlocks does not go beyond blockInvalidateLimit;
        3. getBlockArray does not need to worry about the number of invalidate blocks. It needs only to do an array copy and reset.
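
        A rough, hypothetical sketch of the approach outlined in these suggestions (the class and method names below are illustrative, not the actual Hadoop implementation): keep the pending blocks in an ArrayList that the caller never grows past blockInvalidateLimit, so getBlockArray reduces to a copy followed by a clear.

          import java.util.ArrayList;
          import java.util.List;

          class InvalidateBlocksSketch {
            private final int blockInvalidateLimit;   // hypothetical cap from the discussion
            private final List<Integer> invalidateBlocks = new ArrayList<>();

            InvalidateBlocksSketch(int blockInvalidateLimit) {
              this.blockInvalidateLimit = blockInvalidateLimit;
            }

            // The caller (the ReplicationMonitor in suggestion 2) only adds
            // blocks while the list is below the limit.
            boolean addBlock(Integer block) {
              if (invalidateBlocks.size() >= blockInvalidateLimit) {
                return false;   // defer this block to a later pass
              }
              return invalidateBlocks.add(block);
            }

            // With the size bounded up front, getBlockArray is just a copy and reset.
            Integer[] getBlockArray() {
              Integer[] result = invalidateBlocks.toArray(new Integer[0]);
              invalidateBlocks.clear();
              return result;
            }
          }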

        Ahad Rana added a comment -

        Re: That is wrong. clear() removes all elements. You only have to set root = null.

        You are right. I see that TreeSet uses TreeMap as the underlying container, and each iterator.remove causes a re-balance of the red-black tree. And, yes, it looks like TreeSet.clear() sets root = null, which is obviously speedier. I would still argue that under normal load scenarios (from my initial observations), the number of blocks in the Collection is <= 10, and since delete on the CLRS implementation of the red-black tree takes O(log n), there might not be that much gained from the .clear() optimization. In the edge case where the system is under heavy load, I observed block counts in excess of 1000. In this case, a partial removal of the blocks (based on the max blocks limitation) would still require the iterator.remove pattern. So perhaps in the long term, it might be better to replace the underlying data structure, as Hairong suggests. I guess it would be interesting to find out whether the code (post patch) is a performance bottleneck before undertaking the more aggressive modifications.

        Ahad Rana added a comment -

        Revised version with the .clear() implementation in cases where n == blocks.size().
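
        A minimal sketch of what this two-path revision might look like (illustrative only, not the contents of HADOOP-4483-v2.patch; Integer stands in for Block): use the cheap clear() when the whole set fits within maxBlocks, and fall back to iterator removal otherwise.

          import java.util.Iterator;
          import java.util.TreeSet;

          class GetBlockArrayV2Sketch {
            static Integer[] getBlockArray(TreeSet<Integer> blocks, int maxBlocks) {
              int n = Math.min(maxBlocks, blocks.size());
              Integer[] result;
              if (n == blocks.size()) {
                // Everything fits: one bulk copy, then clear() (which just resets the root).
                result = blocks.toArray(new Integer[n]);
                blocks.clear();
              } else {
                // Only part of the set fits: copy and remove the first n elements.
                result = new Integer[n];
                Iterator<Integer> it = blocks.iterator();
                for (int i = 0; i < n; i++) {
                  result[i] = it.next();
                  it.remove();
                }
              }
              return result;
            }
          }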

        dhruba borthakur added a comment -

        @Nicholas/Hairong: Are you folks saying that the approach adopted by this patch is not sufficient and it needs more changes to make it efficient?

        Hairong Kuang added a comment - edited

        When I looked at the code related to block invalidation, I had a question why invalidateBlocks was implemented as a TreeSet, which requires log(n) for both inserting and removing. If we limit the size of invalidateBlocks to no greater than blockInvalidateLimit, an array or ArrayList would be more efficient. Otherwise, a LinkedList would still be better than a TreeSet.

        Tsz Wo Nicholas Sze added a comment -

        +1 HADOOP-4483-v2.patch looks good to me.

        >Are you folks saying that the approach adopted by this patch is not sufficient and it needs more changes to make it efficient?

        The current fix is good enough for this issue. If there is anything we could do for better performance, we could do it in a separate issue.

        Hairong Kuang added a comment -

        >If there is anything we could do for better performance, we could do it in a separate issue.
        I agree. Even Nicholas's suggestion is not necessary for this issue, because it makes the good case even better and the worst case even worse. Besides, the new patch covers the cases (n < available) and (n == available); if there is a programming error that causes (n > available), invalidateBlocks may not get cleared.

        dhruba borthakur added a comment -

        Thanks Ahad for the patch. I changed the patch to delete empty lines and regenerated it using "svn diff" from the top level of the source tree.

        dhruba borthakur added a comment -

        Removed more empty lines from earlier patch.

        Nigel Daley added a comment -

        Please include a unit test.

        Hairong Kuang added a comment -

        Uploaded a patch with a JUnit test.

        Tsz Wo Nicholas Sze added a comment -

        +1 patch looks good.

        Ahad Rana added a comment -

        Thanks Hairong. Sorry, I couldn't get to this in time.

        Hairong Kuang added a comment -

        JUnit tests passed on my local machine:
        BUILD SUCCESSFUL
        Total time: 113 minutes 11 seconds

        Ant test-patch result:

        [exec] +1 overall.

        [exec] +1 @author. The patch does not contain any @author tags.

        [exec] +1 tests included. The patch appears to include 3 new or modified tests.

        [exec] +1 javadoc. The javadoc tool did not generate any warning messages.

        [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.

        [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

        Tsz Wo Nicholas Sze added a comment -

        Hairong, we also need a 0.18 patch.

        Hairong Kuang added a comment -

        Here is the 0.18 patch.

        Tsz Wo Nicholas Sze added a comment -

        I just committed this. Thanks, Ahad Rana and Hairong Kuang!

        Hudson added a comment -

        Integrated in Hadoop-trunk #647 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/647/)
        HADOOP-4483. Honor the max parameter in DatanodeDescriptor.getBlockArray(...). (Ahad Rana and Hairong Kuang via szetszwo)


          People

          • Assignee: Ahad Rana
          • Reporter: Ahad Rana
          • Votes: 0
          • Watchers: 3

              Time Tracking

              Estimated: 1h
              Remaining: 1h
              Logged: Not Specified
