Issue Details (XML | Word | Printable)

Key: HADOOP-4483
Type: Bug Bug
Status: Closed Closed
Resolution: Fixed
Priority: Critical Critical
Assignee: Ahad Rana
Reporter: Ahad Rana
Votes: 0
Watchers: 3
Operations

If you were logged in you would be able to see more operations.
Hadoop Common

getBlockArray in DatanodeDescriptor does not honor passed in maxblocks value

Created: 21/Oct/08 08:38 PM   Updated: 08/Jul/09 04:43 PM
Return to search
Component/s: None
Affects Version/s: 0.18.1
Fix Version/s: 0.18.2

Time Tracking:
Original Estimate: 1h
Original Estimate - 1h
Remaining Estimate: 1h
Remaining Estimate - 1h
Time Spent: Not Specified
Remaining Estimate - 1h

File Attachments:
  Size
Text File Licensed for inclusion in ASF works HADOOP-4483-v2.patch 2008-10-22 01:02 AM Ahad Rana 2 kB
Text File Licensed for inclusion in ASF works HADOOP-4483-v3.patch 2008-10-22 06:03 PM dhruba borthakur 2 kB
Text File Licensed for inclusion in ASF works HADOOP-4483-v3.patch 2008-10-22 06:01 PM dhruba borthakur 6 kB
Text File Licensed for inclusion in ASF works invalidateBlocksCopy-br18.patch 2008-10-29 09:43 PM Hairong Kuang 4 kB
Text File Licensed for inclusion in ASF works invalidateBlocksCopy.patch 2008-10-29 06:54 PM Hairong Kuang 4 kB
File Licensed for inclusion in ASF works patch.HADOOP-4483 2008-10-21 08:43 PM Ahad Rana 1 kB
Environment: hadoop-0.18.1 running on a cluster of 16 nodes.

Hadoop Flags: Reviewed
Resolution Date: 29/Oct/08 09:57 PM


 Description  « Hide
The getBlockArray method in DatanodeDescriptor.java should honor the passed in maxblocks parameter. In its current form it passed in an array sized to min(maxblocks,blocks.size()) into the Collections.toArray method. As the javadoc for Collections.toArray indicates, the toArray method may discard the passed in array (and allocate a new array) if the number of elements returned by the iterator exceeds the size of the passed in array. As a result, the flawed implementation of this method would return all the invalid blocks for a data node in one go, and thus trigger the NameNode to send a DNA_INVALIDATE command to the DataNode with an excessively large number of blocks. This INVALIDATE command, in turn, could potentially take a very long time to process at the DataNode, and since DatanodeCommand(s) are processed in between heartbeats at the DataNode, this would trigger the NameNode to consider the DataNode to be offline / unresponsive (due to a lack of heartbeats).

In our use-case at CommonCrawl.org, we regularly do large scale hdfs file deletions after certain stages of our map-reduce pipeline. These deletes would make certain DataNode(s) unresponsive, and thus impact the cluster's capability to properly balance file-system reads / writes across the whole available cluster. This problem only surfaced once we migrated from our 16.2 deployment to the current 18.1 release.



 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Ahad Rana added a comment - 21/Oct/08 08:43 PM
This fixes the getBlockArray method in DatanodeDescriptor to constrained the returned Block array to the maxBlocks values passed in.

Tsz Wo (Nicholas), SZE added a comment - 21/Oct/08 09:43 PM
Good catch on the bug.
  • Removing the elements form a collection one by one could be expensive. Also, we have max >= n in most of the cases. How about using the existing codes (i.e. blocks.clear() instead of e.remove()) when max >= n?
  • Could you also remove the white space changes, tabs and the trailing spaces? This will keep the codes have the same style.

dhruba borthakur added a comment - 21/Oct/08 09:47 PM
Good catch. This is a candidate for 0.19.1

Ahad Rana added a comment - 21/Oct/08 10:08 PM
Re: Removing the elements form a collection one by one could be expensive. Also, we have max >= n in most of the cases. How about using the existing codes (i.e. blocks.clear() instead of e.remove()) when max >= n?

I believe that since the underlying container is a tree, even Collection's internal code needs to use the iterator approach to remove items from the data structure. Plus, in the standard configuration, where the heartbeat interval is set to 3 seconds, I believe max blocks <= 100. Better to stick to one code path in this instance.

Re: Could you also remove the white space changes, tabs and the trailing spaces? This will keep the codes have the same style.

Sorry, one my editors must to be set to tabs vs. spaces or vice versa. Am I correct in assuming that the convention for the hadoop codebase is spaces (instead of tabs) ?


Tsz Wo (Nicholas), SZE added a comment - 21/Oct/08 10:32 PM

I believe that since the underlying container is a tree, even Collection's internal code needs to use the iterator approach to remove items from the data structure.

That is wrong. clear() removes all elements. You only have to set root = null,

Plus, in the standard configuration, where the heartbeat interval is set to 3 seconds, I believe max blocks <= 100.

That's exactly we are removing the whole tree most of the times.

> Am I correct in assuming that the convention for the hadoop codebase is spaces (instead of tabs) ?

Yes, we are not using tabs in the source codes.


Hairong Kuang added a comment - 21/Oct/08 10:44 PM
If we want to make the implementation to be more efficient, I have the following suggestions:
1. Store invalidateBlocks as an array or arrayList instead of a treeSet;
2. ReplicationMonitor makes sure that the size of invalidateBlocks does not go beyond blockInvalidateLimit;
3. getBlockArray does not need to worry about the number of invalidate blocks. It needs only to do an array copy and reset.

Ahad Rana added a comment - 22/Oct/08 12:13 AM
Re: That is wrong. clear() removes all elements. You only have to set root = null,

You are right. I see that TreeSet uses TreeMap as the underlying container, and, each iterator.remove causes a re-balance of the red-black tree. And, yes, looks like TreeSet.clear() set root = null, which is obviously speedier. I would still argue that under normal load scenarios (from my initial observations), the number of blocks in the Collection are <= 10, and since delete on the CLRS implementation of the Red-Black Tree takes O(log n), there might not be that much gained from the .clear() optimization. In the edge case where the system is under heavy load, I observed block count in excess of 1000. In this case, a partial removal of the blocks (based on the max blocks limitation) would still require the iterator.remove pattern. So perhaps in the long term, it might be better to replace the underlying data structure, as Hairong suggests. I guess it would be interesting to find out if the code (post patch) is a performance bottle neck or not before undertaking the more aggressive modifications.


Ahad Rana added a comment - 22/Oct/08 01:02 AM
Revised version with .clear() implementation in cases where n == Blocks.size()

dhruba borthakur added a comment - 22/Oct/08 05:48 AM
@Nicholas/Hairong: Are you folks saying that the approach adopted by this patch is not sufficient and it needs more changes to make it efficient?

Hairong Kuang added a comment - 22/Oct/08 05:40 PM - edited
When I looked at the code related to block invalidation, I had a question whey invalidateBlocks were implemented as TreeSet, which require log( n ) for both Inserting & removing. If we limit the size of invalidateBlocks to be no greater than blockInvalidateLimit, an array or arrayList would be more efficient. Otherwise, a LinkedList would be still be better than a TreeSet.

Tsz Wo (Nicholas), SZE added a comment - 22/Oct/08 05:41 PM
+1 HADOOP-4483-v2.patch looks good to me.

>Are you folks saying that the approach adopted by this patch is not sufficient and it needs more changes to make it efficient?

Current fix is good enough for this issue. If there is anything we could do for better performance, we could do it in a separated issue.


Hairong Kuang added a comment - 22/Oct/08 05:55 PM
>If there is anything we could do for better performance, we could do it in a separated issue.
I agree. Even Nicholas's suggestion is not necessary for this issue. Because this makes the good case even better and worse case even worse. Besides, the new patch covers the cases (n<available) & (n==availiable); If thee is a programmatic error that causes (n>available), invalidateBlocks may not get cleared.

dhruba borthakur added a comment - 22/Oct/08 06:01 PM
Thanks Ahad for the patch. Changed patch to delete empty lines and to generate it using "svn diff" from the top level of the source tree.

dhruba borthakur added a comment - 22/Oct/08 06:03 PM
Removed more empty lines from earlier patch.

Nigel Daley added a comment - 29/Oct/08 01:01 AM
Please include a unit test.

Hairong Kuang added a comment - 29/Oct/08 06:54 PM
Upload a patch with a junit test.

Tsz Wo (Nicholas), SZE added a comment - 29/Oct/08 06:59 PM
+1 patch looks good.

Ahad Rana added a comment - 29/Oct/08 07:03 PM
Thanks Hairong. Sorry, I couldn't get to this in time.

Hairong Kuang added a comment - 29/Oct/08 08:57 PM
Junit tests passed on my local machine:
BUILD SUCCESSFUL
Total time: 113 minutes 11 seconds

Ant test-patch result:

[exec] +1 overall.

[exec] +1 @author. The patch does not contain any @author tags.

[exec] +1 tests included. The patch appears to include 3 new or modified tests.

[exec] +1 javadoc. The javadoc tool did not generate any warning messages.

[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.

[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.

[exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity.


Tsz Wo (Nicholas), SZE added a comment - 29/Oct/08 09:10 PM
Hairong, we also need a 0.18 patch.

Hairong Kuang added a comment - 29/Oct/08 09:43 PM
Here is the 18 patch.

Tsz Wo (Nicholas), SZE added a comment - 29/Oct/08 09:57 PM
I just committed this. Thanks, Ahad Rana and Hairong Kuang!

Hudson added a comment - 30/Oct/08 04:10 PM
Integrated in Hadoop-trunk #647 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/647/)
. Honor the max parameter in DatanodeDescriptor.getBlockArray(...). (Ahad Rana and Hairong Kuang via szetszwo)