Details
Description
Block reports are encoded as a PB repeating long. Repeating fields use an ArrayList with default capacity of 10. A block report containing tens or hundreds of thousand of longs (3 for each replica) is extremely expensive since the ArrayList must realloc many times. Also, decoding repeating fields will box the primitive longs which must then be unboxed.
Attachments
Attachments
- HDFS-7435.000.patch
- 59 kB
- Jing Zhao
- HDFS-7435.001.patch
- 59 kB
- Jing Zhao
- HDFS-7435.002.patch
- 63 kB
- Jing Zhao
- HDFS-7435.patch
- 75 kB
- Daryn Sharp
- HDFS-7435.patch
- 75 kB
- Daryn Sharp
- HDFS-7435.patch
- 74 kB
- Daryn Sharp
- HDFS-7435.patch
- 75 kB
- Daryn Sharp
- HDFS-7435.patch
- 67 kB
- Daryn Sharp
- HDFS-7435.patch
- 67 kB
- Daryn Sharp
- HDFS-7435.patch
- 64 kB
- Daryn Sharp
- HDFS-7435.patch
- 54 kB
- Daryn Sharp
- HDFS-7435.patch
- 16 kB
- Daryn Sharp
Issue Links
- breaks
-
HDFS-9484 NNThroughputBenchmark$BlockReportStats should not send empty block reports
- Resolved
-
HDFS-10569 A bug causes OutOfIndex error in BlockListAsLongs
- Resolved
- contains
-
HDFS-7906 BlockReport of a RUR can have a FINALIZED original replica from Truncate
- Closed
- is caused by
-
HDFS-16284 Unboxing long value in old version's block report cause full GC
- Open
- is related to
-
HADOOP-11339 Reuse buffer for Hadoop RPC
- Open
- relates to
-
HDFS-9221 HdfsServerConstants#ReplicaState#getState should avoid calling values() since it creates a temporary array
- Closed
Activity
Can we also use some blocked arrays (and each block with a size limit) instead of a contiguous array for the full block report? In practice we've seen big full block reports cause NameNode gc. This can be similar idea with HDFS-4879.
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12683375/HDFS-7435.patch
against trunk revision 555fa2d.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8821//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8821//console
This message is automatically generated.
We've also seen large block reports aggravate GC, not due to decoded size, but due to the spew of garbage from the re-allocs and (un)boxing. This patch is basically taking us back to pre-PB performance & behavior.
The block report is ~44,000 blocks/MB which at our scale is a few megs per report. HDFS-4879 chunked a list of blocks to delete because of 25M entries consuming ~400MB. I don't foresee blocks/node approaching that level anytime soon. In that light, it's probably premature to introduce chunking?
I've seen some cluster generating block report each containing >1000,000 blocks, i.e., 20~30MB. Considering we're handling block reports from multiple hundred DataNodes I guess this can still cause issue but I'm not sure if this is the main cause of the gc in that cluster. But I agree we can make the current optimization first and do chunking separately if necessary.
jingzhao and daryn, I think chunking the block reports into a set of arrays is a good change to avoid possible GC promotion failures due to not finding contiguous memory in old generation heap. We should consider doing that along with this change to avoid unnecessary layout change.
While I agree it's a questionably nice to have feature - see below - decoding the block report into a contiguous or non-contiguous array is an implementation detail that can be changed in another jira w/o updating the layout version.
This patch simply undoes the damage caused by moving from Writables to PBs. I changed it internally from long[] -> dynamically growing ArrayList<Long> -> PB -> dynamically growing ArrayList<Long> -> long[]; to simply fixed long[] -> PB -> fixed long[]. Further changes to segment the list in ways not previously done are feature creep.
If 20-30MB is going to cause a promotion failure in a namenode servicing a hundred millions of blocks - it's already game over. 2.x easily generates over 1GB garbage/sec at a mere ~20k ops/sec. That's an average of 54MB/request. This is why I'm submitting GC friendly changes.
While I agree it's a questionably nice to have feature
Not sure why you think it is questionably nice to have feature...
If 20-30MB is going to cause a promotion failure in a namenode servicing a hundred millions of blocks - it's already game over. 2.x easily generates over 1GB garbage/sec at a mere ~20k ops/sec.
Just so that we are on the same page, java arrays requires contiguous space in memory. In many installs, when namenode becomes unresponsive and datanodes end up sending block reports, these block reports get promoted to older generation (because namenode is processing them slowly). Since old generation may be fragmented, promotions can fail when large arrays need to be promoted.
That said, I am fine doing it in a subsequent jira. It will end up touching the same parts or replacing the code that you are adding.
We're definitely on the same page. I meant questionably in the sense that the garbage generation rate is so high that CMS (you mention fragmentation) will be slow, bloated with free lists, and not keeping up. Now granted, a 20M allocation is likely to be prematurely tenured - along with the IPC protobuf containing the large report. One solution/workaround to both is reducing the size of the individual BR via multiple storages. The storages don't have to be individual drives but just subdirs.
Segmenting shouldn't be an outright replacement. The decode will emit a long[][] which requires updating StorageBlockReport and BlockListAsLongs. Similar changes to the datanode, although not required for the namenode changes, will be more complex. I already considered segmenting. I found the complexity vs time vs benefit to be much lower than improvements to other subsystems.
Uploading a demo patch (based on Daryn's patch) for chunking. Still need to do more testing.
Hi guys, I think this is a good improvement.
I also find a similar issue related to this when I'm doing Hadoop RPC related optimization in my local branch.
As we all known that the blocks report from DNs may become very large in big cluster, and there is chance to cause full GC if there is no enough contiguous space in old generation.
We will try to reuse available connections for RPC calls, but when we process each rpc in the same connection, we will allocate a fresh heap byte buffer to store the rpc bytes data. The rpc message may be very large. So it will cause the same issue.
My thought is to reuse the data buffer in the connection, I will open a new JIRA to track it.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12683743/HDFS-7435.000.patch
against trunk revision a655973.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 10 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:
org.apache.hadoop.hdfs.server.namenode.TestStartup
org.apache.hadoop.hdfs.server.namenode.TestFileLimit
org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics
org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus
org.apache.hadoop.hdfs.TestDecommission
org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA
org.apache.hadoop.fs.TestSymlinkHdfsFileContext
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
org.apache.hadoop.hdfs.server.namenode.TestFSEditLogLoader
org.apache.hadoop.hdfs.TestSetrepDecreasing
org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyBlockManagement
org.apache.hadoop.hdfs.server.namenode.TestProcessCorruptBlocks
org.apache.hadoop.hdfs.TestReplication
org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot
org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyIsHot
org.apache.hadoop.hdfs.server.namenode.TestListCorruptFileBlocks
org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer
org.apache.hadoop.hdfs.server.namenode.TestHostsFiles
org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives
org.apache.hadoop.hdfs.server.namenode.TestINodeFile
org.apache.hadoop.hdfs.TestEncryptedTransfer
The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs:
org.apache.hadoop.hdfs.server.namenode.ha.TestPendingCorruptDnMessages
org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing
org.apache.hadoop.hdfs.server.namenode.ha.TestHAAppend
org.apache.hadoop.hdfs.server.namenode.TestFsck
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8842//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8842//console
This message is automatically generated.
I open HADOOP-11339 to reuse buffer for Hadoop RPC as stated in my above comment.
Upon quick scan, it appears the the chunk size is encoded in the buffer? If yes, I think each side should be able chose an appropriate chunk size - as an implementation detail - which is not leaked into the PB encoding. If we can agree on that point, can we move the segmentation to a separate jira?
In the current demo patch the max chunk size is not encoded in the buffer. It is currently simply determined by the DataNode based on configuration. The segmentation is done by the DataNode and NameNode does not need to know the max chunk size. For simplicity each chunk still follows the same format with the original long[] in BlockListAsLongs (i.e., it still encodes the number of finalized blocks and number of uc replicas in the first two elements).
I guess to let DataNode be the only side doing segmentation can avoid NameNode still allocating a big contiguous array before chunking. Thus I have to change the optional bytes blocksBuffer into repeated bytes blocksBuffers. Maybe we can use repeated bytes blocksBuffers here but assume the number of buffer is always 1, then move the real segmentation change into a separate jira?
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12683918/HDFS-7435.001.patch
against trunk revision 8f1454c.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 10 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:
org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS
org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer
The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs:
org.apache.hadoop.hdfs.TestDFSClientRetries
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8851//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8851//console
This message is automatically generated.
Apologies if I wasn't clear. My original patch added a blocksBuffer PB field with encoding of size : long[]. The demo patch for chunking changes this new field into a repeating field. The resulting encoding is chunk-size : long[], chunk-size long[], etc.
I'm fine with chunking, but I don't think the tail should wag the dog. The in-memory representation of the blocks list is merely an implementation detail that should not influence the PB encoding. Ie. Leave blocksBuffer as a non-repeating field, and let the NN chunk the block list however it sees fit.
The in-memory representation of the blocks list is merely an implementation detail that should not influence the PB encoding. Ie. Leave blocksBuffer as a non-repeating field, and let the NN chunk the block list however it sees fit.
I think the chunked buffers should be sent all the way from DataNode to the NameNode. NameNode can store it in a different way that fits it needs (though if it is chunked, I do not see NameNode storing it differently). This is important because the problems of large contiguous array is also a problem for DataNode, since there are deployments that use 60 disks in a single node with more than 10 million blocks in a single DataNode.
This is important because the problems of large contiguous array is also a problem for DataNode, since there are deployments that use 60 disks in a single node with more than 10 million blocks in a single DataNode.
Just to be clear, the datanode and namenode require the same size of contiguous memory for the backing arrays today for the ArrayLists used internally by protobuf. So, this patch is not making the situation any worse. But I agree that chunking is a good idea and is better to be done along with this feature.
I agree that it makes sense for both the DN and NN to chunk the block list for their internal data structures. All I'm saying is that the wire protocol may not need to be built around that implementation detail - but I think a middle ground is
{ block-count, chunk-count, blocks[], chunk-count, blocks[], ... }. However I think it can be done in a simpler and less invasive fashion. I'll toss another patch up this afternoon.
On a side note: we're having some DN OOM issues too during block reports + heavy load. In your use case, I'm presuming each disk is a storage. If yes, each unencoded storage report will be ~4MB, ~1.5MB encoded. The DN heap must be >64GB, easily more for HA, so a handful of MB appears to pales in comparison to how much memory the DN wastes building the reports. Again, there are default capacity ArrayLists to build the finalized & UC lists! I'm going to make minimal changes to reduce the gross inefficiency here too.
kihwal, agree with you. We are on the same page.
daryn, some comments:
{ block-count, chunk-count, blocks[], chunk-count, blocks[], ... }All I'm saying is that the wire protocol may not need to be built around that implementation detail - but I think a middle ground is
Have you looked at Jing's patch? Can you post comments on why it cannot be enhanced to add chunk-count? I believe its close to being ready with additional changes? Is there a need to do a branch new patch?
. In your use case, I'm presuming each disk is a storage. If yes, each unencoded storage report will be ~4MB, ~1.5MB encoded. The DN heap must be >64GB, easily more for HA, so a handful of MB appears to pales in comparison to how much memory the DN wastes building the reports.
I do not follow these numbers well. BTW for very large nodes (< 200 TB), Datanodes can be run with just 8G heap space. Not sure about the > 64GB number you are mentioning.
Apologies for delay, I've been dealing with production issues, including block reports which are becoming a very big issue.
I've studied the way PB is implemented and I'm not sure fragmenting the buffer will add value. Proper encoding (mine is buggy) will not allocate a full buffer but uses a tree to hold fragments of the byte string. Decoding the PB will use the full byte array of the PB as the backing store for slices (ref to full array + offset/length). We've already paid the price for a large allocation of the full PB, carried it around in the Call object, etc, so extracting the field is essentially "free". Whether it's one or more is irrelevant.
I'm trying to performance test a patch that internally segments the BlockListAsLongs and correctly outputs the byte buffer.
Rebase the patch.
I'm trying to performance test a patch that internally segments the BlockListAsLongs and correctly outputs the byte buffer.
daryn, do you still plan to post your updated patch and performance test results? I do not know the details of your new proposal here. But sounds like you will also segment the BlockListAsLongs?
jingzhao, if you have a patch, can you please post it? In clusters with lot of small files and clusters with high density nodes, we have already seen issues related to large block reports. We can improve upon your patch in subsequent jiras.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12699754/HDFS-7435.002.patch
against trunk revision d49ae72.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 11 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:
org.apache.hadoop.hdfs.TestMiniDFSCluster
org.apache.hadoop.hdfs.tools.TestDFSHAAdminMiniCluster
org.apache.hadoop.hdfs.TestBalancerBandwidth
org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9622//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9622//console
This message is automatically generated.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12699754/HDFS-7435.002.patch
against trunk revision c0d9b93.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 11 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:
org.apache.hadoop.hdfs.web.TestWebHDFS
org.apache.hadoop.hdfs.tools.TestDFSHAAdminMiniCluster
org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9625//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9625//console
This message is automatically generated.
Let me get back to this patch this afternoon. I'll post perf numbers too.
daryn, can you please comment on jingzhao's patch and why it does not address the issue? It would be great to get this done early.
Almost about to post a zero-copy block report impl via iterators so we don't need to worry about segmenting at all.
Do I have the luxury of bumping the DN's minimum NN version? That would greatly simplify the implementation. It's easy for the NN to use the presence of the protobuf fields to determine if old/new. However the prior patches illustrate it's not so easy for the DN to auto-detect.
I believe the standard upgrade procedure is upgrade the NN, then rolling upgrade the DNs. Per above, upgraded NN supports old/new reports from DNs. The only scenario in which a "problem" can occur is the cluster is fully or partially upgraded, and the NN is downgraded. The new DNs won't be able to communicate with the old NN, hence why I'd like to bump the minimum version so the DN doesn't continue to send block reports that appear to be empty to the old NN. I'd argue that if the NN is downgraded, there's going to be downtime, so you might as well rollback the DNs too.
Thoughts?
Thanks for sharing the thoughts, daryn. Why not posting your current patch first so that we can also have a better understanding about why bumping the DN's min NN version is necessary?
I don't need to bump the DN's min NN version. I leveraged the auto-detection I did before.
So here's the deal. As wasteful as PBs can be in some cases, they are smart in other cases to avoid unnecessary copying. I've leveraged those apis to their maximum. I’ve also optimized BlockListAsLongs to remove intermediate encodings.
Instead of a pre-allocated array, a ByteString.Output stream can be used by a custom specified buffer size. Internally the ByteString is building up a balanced tree of rope buffers of the given size. Extracting the buffer via Output#toByteString doesn't really copy anything. So on the DN we have cheaply built up a blocks buffer of ByteString that is internally segmented.
I relented to sending the blocks buffer in a repeating field. ByteString provides no direct API for internally accessing its roped strings. However, we can slice it up with ByteString#substring. It prefers to create bounded instances (offset+len) much like String did in earlier JDKs. Alas, a wasted instantiation. However a substring that aligns with an entire rope buffer will get the actual buffer reference – not wrapped with a bounded instance. So I slice it up that way.
On the NN, if the new repeating field is not present, it decodes as old-style format. Otherwise it uses ByteString#concat to reassemble the blocks buffer fragments. Concat doesn't copy, but again builds a balanced tree for later decoding.
The best part is the encoding/decoding. BlockListAsLongs directly encodes the Replicas into a ByteString, not into a wasted intermediate long[]. To support the unlikely and surely shorted lived case of a new DN reporting to a old NN, the PB layer will decode the ByteString and re-encode into old-style longs. Not efficient, but it’s the least ugly way to handle something that will probably never happen.
BlockListAsLongs doesn’t decode the buffer into a wasted and intermediate long[]. Instead, the iterator decodes on-demand.
I also changed the format of the encoded buffer. Might as well while the patient is on the table. It’s documented in the code, but now finalized and uc blocks aren’t treated specially. The finalized and uc counts, followed by 3-long finalized, a 3-long delimiter, and 4-long uc blocks is now just total block count, 4-long blocks. Every block encodes its ReplicaState. One compelling reason is the capability to encode additional bits into the state field. One anticipated use is encoding the pinned block status.
Phew. I need to write some new tests to verify old/new compatibility. I’ll also manually test on a perf cluster.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12700906/HDFS-7435.patch
against trunk revision caa42ad.
-1 patch. The patch command could not apply the patch.
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9668//console
This message is automatically generated.
@daryn,
This looks really good. I like the new approach and your current patch does a pre-emptive strike on several of the comments I was going to make on the .002 patch.
I really only have nits.
The patch needs to be rebased. There was one .rej when I applied it (obviously I worked past that for my review).
BlockListAsLongs.java:
BlockListAsLongs(Collection) needs an @param for the javadoc.
In #BlockListAsLongs(Collection), the ReplicaState is being written as a varint64. I realize it's a varint, but since it's really only a single byte in the implementation, it seems a little heavy handed to write it to the cos as a varint64. I also realize that it will need to be a long on the way back out for the uc long[]. If you don't want to change it from being a varint64 in the cos, then perhaps just add a comment saying that you know it's a byte (actually int) in the impl but for consistency you're using a varint64?
Since you throw UnsupportedOperationException from multiple #remove methods, you might want to add the class name to the message. e.g. "Sorry. remove not implemented for BlockReportListIterator". Along a similar vein, would it be appropriate to add a message to BlockReportReplica.getVisibleLength, getStorageUuid, and isOnTransientStorage's UnsupportedOperationException?
BlockReportTestBase.java:
getBlockReports has one line that exceeds the 80 char width.
DatanodeProtocolClientSideTranslatorPB.java:
the import of NameNodeLayoutVersion is unused.
the decl of useBlocksBuffer busts the 80 char width.
DatanodeProtocolServerSideTranslatorPB.java:
import ByteString is unused.
FsDatasetImpl.java:
in #getBlockReports, the line under case RUR busts the 80 char limit.
NameNodeLayoutVersion.java:
Perhaps s/Protobuf optimized/Optimized protobuf/
NNThroughputBenchmark.java:
Thanks for fixing the formatting in here.
TestBlockHasMultipleReplicasOnSameDN.java:
blocks.add(... busts the 80 char limit.
Thanks for the review, Charles. I think I've addressed your comments. I mildly disagree will overly verbose message for UnsupportedOperationExceptions since the JDK rarely uses message and the class & method is in the stack trace.
The ReplicaState, although a varint64, is actually just a byte on the wire. Using a varint64 just allows for future use of the bits.
I’ll note that I must backpedal a bit on some of my claims. The ByteString.Output creates new overflow buffers with larger capacity. That means substring returns bounded instances, not that big a deal, unless we assume the internal implementation of PBs. Even then, it returns bounded instances when it shouldn’t. Multiple buffers are required before it might return a non-bounded buffer. Therefor I optimized the single small buffer case to not substring.
I think the whole fragmented over-the-write buffer implementation is over engineered. The average encoded block is ~15 bytes. The majority of cluster aren’t likely to exceed one buffer (I reduced from 1MB to 64K) per storage, and if they do, multiple little allocations is more work for GC. Removing the intermediate long[] conversions is the real problem. But I digress.
I added comprehensive tests for client auto-detection of NN support; encoding & decoding as buffers and longs.
I mildly disagree will overly verbose message for UnsupportedOperationExceptions since the JDK rarely uses message and the class & method is in the stack trace.
Yeah, I figured there was a reason for not having a message in the UOE, so it won't cause me any heartburn if you don't put them in.
I think the whole fragmented over-the-write buffer implementation is over engineered.
Oh, I kind of liked it.
I'll try to take a look at the new patch soon. Thanks for the reply.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12701196/HDFS-7435.patch
against trunk revision bfbf076.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 10 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
-1 findbugs. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:
org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer
org.apache.hadoop.hdfs.server.namenode.TestFSImage
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9673//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/9673//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9673//console
This message is automatically generated.
Fixed findbugs by adding equals and reverting back the switch. Fixed fsimage test that used a checkedin image by starting minicluster with upgrade. Fixed oev test by compensating for converted output having latest version so we don't have to keep updating the checkedin binary.
sureshms, jingzhao, could you please review this morning so we may hopefully incorporate this into our builds?
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12701376/HDFS-7435.patch
against trunk revision a979f3b.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 12 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
-1 findbugs. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9683//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/9683//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9683//console
This message is automatically generated.
Would have been nice if findbugs told me I needed equals and hashCode at the same time. :/
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12701436/HDFS-7435.patch
against trunk revision 01a1621.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 12 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9686//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9686//console
This message is automatically generated.
Hi @daryn,
The new patch looks pretty good to me. Just a few nits.
FsDatasetImpl still has one line that exceeds the 80 chars, and there are a couple of unused imports in the new test TestBlockListAsLongs. Also in that test, IWBNI you could use the specific Mockito static imports that are needed rather than the * import.
Thanks for working on this, Daryn. The patch looks good to me in general. Some comments below:
- Please add javadoc for BlockListAsLongs.
- I think we'd better not include the replica state for finalized replicas. Though it is encoded using a varint, considering most of the blocks are finalized in the report, this can still increase the total size of the report. How about not including the finalized state and write an extra number in the header to indicate the total number of finalized replicas?
- For BlockReportLongIterator, it can be more clean to only keep either iter or currentBlockIndex.
- Now the iterator becomes an internal field of a BlockListAsLongs. I guess the main motivation here is to support both new and old encoding but avoid having both a ByteString and a long[] inside of BlockListsAsLongs? But in the meanwhile, this means there can be only one user of iterator() in the whole life time of BlockListAsLongs? This usage pattern is true in the current code, but is it good for BlockListAsLongs as a pure data structure?
- Also in getBlockReportIterator, if iter is null, we're actually creating a new BlockListAsLongs instance and returning its iterator. Though this may not be hit by normal code path, this may not be very clean.
- To address the above two comments, maybe what we can do is:
1) declare both the ByteString and the long[] in BlockListAsLongs and build iterator on top
2) or keep the old BlockListAsLongs class (but annotate it as deprecated) and use it for handling the old style encoding. In this way we can have a more clean new BlockListAsLongs, and we can delete the old one in the future.
- Added javadoc.
- There's compelling reasons I must include the state. See below.
- iter.hasNext is insufficient to replace currentBlockIndex. ie. the iter running dry doesn't mean all the expected blocks were read
- Yes, the iterator is/was one-shot for decoding. I added multi-iterator support although I think it's over-engineering for a currently unneeded use case.
- Gone by virtue of #5
- Better yet: use an abstract factory pattern
Kihwal expressed offline concern at bumping the layout version. I added a capabilities long to the version request. It's currently expected to act as a bitmask but we can change that later. If we find that compression of block reports, per Colin, is viable then we can add another bit here so the DN can conditionally compress.
Regarding always including the 1-byte state. The format, among other reasons, was intended as a precusor to improved block report creation on the DN. Currently, the DN does 3 passes over the blocks: one to filter finalized vs uc; second for BlockListAsLongs to encode into List; third to transfer into the PB. I initially reduced it to 2 passes by removing the filtering, but intended to reduce it to 1 in the future, which I did last night for this patch. 1-pass precludes the ability to build a buffer of blocks with varying fields.
The extra bits in the 1-byte can be used for other purposes. Kihwal wants to pass back preferred block "stickyness" via this field. I thought about robbing the upper bits of the block size. However that increases the size of that varint by many bytes.
As performance conscience as we are with our clusters, which is why I'm doing this, a few tens of KB per storage report isn't even a concern. The size of the reports is not a problem, so much as the many hundreds of ms processing time and garbage generation. This patch will greatly reduce the garbage on both the NN and DN, expense of encoding/decoding, and add flexibility for the cost of 1 byte. If compression is viable, then it becomes a moot point.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12702905/HDFS-7435.patch
against trunk revision 95bfd08.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 12 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:
org.apache.hadoop.hdfs.server.datanode.TestReadOnlySharedStorage
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
org.apache.hadoop.hdfs.server.balancer.TestBalancerWithEncryptedTransfer
org.apache.hadoop.hdfs.TestSetrepIncreasing
org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA
org.apache.hadoop.hdfs.TestDecommission
org.apache.hadoop.hdfs.server.balancer.TestBalancer
org.apache.hadoop.hdfs.server.datanode.TestSimulatedFSDataset
The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs:
org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes
org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager
org.apache.hadoop.hdfs.TestInjectionForSimulatedStorage
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9776//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9776//console
This message is automatically generated.
Test failed because I stubbed the simulated dataset to not return reports... Fixed.
jingzhao, please review. We'd like to add this to our internal builds to help alleviate BR processing issues. We also want to leverage this change to speed up rolling upgrades by dumping/reading the encoded BR to disk which this make trivial to do.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12703930/HDFS-7435.patch
against trunk revision 30c428a.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 12 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs:
org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager
org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9838//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9838//console
This message is automatically generated.
Tests are passing for me. The grumbling about edit log corruption should have nothing to do with this patch.
Thanks for updating the patch, Daryn. The latest patch looks good to me, and I think the new capabilities field in NamespaceInfo is a good idea. Some minors:
- Need to fix javadoc for decodeBuffer, encode, and getBlocksBuffer in BlockListAsLongs.
- Here "older" I guess you mean the current version compared with a future version. Maybe we should make it more clear in the comment.
// reserve upper bits for future use. decoding masks off these bits to // allow compatibility for older namenodes private static long NUM_BYTES_MASK = (-1L) >>> (64 - 48); private static long REPLICA_STATE_MASK = (-1L) >>> (64 - 4);
- Looks like the following code can be simplified and we actually do not need "isSupported"?
+ Capability(boolean isSupported) { + int bits = ordinal() - 1; + mask = (bits < 0) ? 0 : (1L << bits); + if (isSupported) { + CAPABILITIES_SUPPORTED |= mask; + } + }
Yes, "older" is with relation to the future NN. I tried to clarify it in this patch. Updated the javadocs. The isSupported is intended for being able to flip off unsupported/deprecated capabilities so they aren't advertised to the DN.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12704033/HDFS-7435.patch
against trunk revision 7a346bc.
-1 patch. Trunk compilation may be broken.
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9843//console
This message is automatically generated.
nit-pick:
- s/blocksBuffer/blocksBuf/ in the javadoc
/** * Prepare an instance to in-place decode the given ByteString buffer * @param numBlocks - blocks in the buffer * @param blocksBuffer - ByteString encoded varints * @return BlockListAsLongs */ public static BlockListAsLongs decodeBuffer(final int numBlocks, final ByteString blocksBuf) { return new BufferDecoder(numBlocks, blocksBuf); }
+1 after fixing it.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12704194/HDFS-7435.patch
against trunk revision 06ce1d9.
-1 patch. Trunk compilation may be broken.
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9853//console
This message is automatically generated.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12704194/HDFS-7435.patch
against trunk revision 387f271.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 12 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:
org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA
The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs:
org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9874//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9874//console
This message is automatically generated.
The retry cache test failure is HDFS-7524. I cannot reproduce the DN manager timeout, and it's impossible to tell from pre-commit why it failed (no logs). The test normally runs in ~10s, but takes 3min during a prior successful run so maybe the build machine is overwhelmed and on the edge of timing out.
+1 I also ran TestDatanodeManager multiple times on my machine with this patch without any failure.
FAILURE: Integrated in Hadoop-trunk-Commit #7318 (See https://builds.apache.org/job/Hadoop-trunk-Commit/7318/)
HDFS-7435. PB encoding of block reports is very inefficient. Contributed by Daryn Sharp. (kihwal: rev d324164a51a43d72c02567248bd9f0f12b244a40)
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeRegistration.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/NamespaceInfo.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolServerSideTranslatorPB.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/BlockListAsLongs.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockHasMultipleReplicasOnSameDN.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDnRespectsBlockReportSplitThreshold.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/extdataset/ExternalDatasetImpl.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/BlockReportTestBase.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/StorageBlockReport.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailure.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSImage.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/TestBlockListAsLongs.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
- hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/TestOfflineEditsViewer.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeadDatanode.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/proto/hdfs.proto
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #132 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/132/)
HDFS-7435. PB encoding of block reports is very inefficient. Contributed by Daryn Sharp. (kihwal: rev d324164a51a43d72c02567248bd9f0f12b244a40)
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/BlockReportTestBase.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDnRespectsBlockReportSplitThreshold.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/StorageBlockReport.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolServerSideTranslatorPB.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/NamespaceInfo.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/BlockListAsLongs.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeadDatanode.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockHasMultipleReplicasOnSameDN.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/TestBlockListAsLongs.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSImage.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/TestOfflineEditsViewer.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
- hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailure.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/extdataset/ExternalDatasetImpl.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeRegistration.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/proto/hdfs.proto
SUCCESS: Integrated in Hadoop-Yarn-trunk #866 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/866/)
HDFS-7435. PB encoding of block reports is very inefficient. Contributed by Daryn Sharp. (kihwal: rev d324164a51a43d72c02567248bd9f0f12b244a40)
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/TestBlockListAsLongs.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/StorageBlockReport.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSImage.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/BlockReportTestBase.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolServerSideTranslatorPB.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailure.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDnRespectsBlockReportSplitThreshold.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/extdataset/ExternalDatasetImpl.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/BlockListAsLongs.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeadDatanode.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/proto/hdfs.proto
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeRegistration.java
- hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/NamespaceInfo.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockHasMultipleReplicasOnSameDN.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/TestOfflineEditsViewer.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
FAILURE: Integrated in Hadoop-Hdfs-trunk #2064 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2064/)
HDFS-7435. PB encoding of block reports is very inefficient. Contributed by Daryn Sharp. (kihwal: rev d324164a51a43d72c02567248bd9f0f12b244a40)
- hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeRegistration.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/NamespaceInfo.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockHasMultipleReplicasOnSameDN.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolServerSideTranslatorPB.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDnRespectsBlockReportSplitThreshold.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailure.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/StorageBlockReport.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/TestBlockListAsLongs.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/BlockReportTestBase.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/TestOfflineEditsViewer.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeadDatanode.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/BlockListAsLongs.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSImage.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/proto/hdfs.proto
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/extdataset/ExternalDatasetImpl.java
FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #123 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/123/)
HDFS-7435. PB encoding of block reports is very inefficient. Contributed by Daryn Sharp. (kihwal: rev d324164a51a43d72c02567248bd9f0f12b244a40)
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSImage.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolServerSideTranslatorPB.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDnRespectsBlockReportSplitThreshold.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/extdataset/ExternalDatasetImpl.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailure.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/proto/hdfs.proto
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/TestBlockListAsLongs.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/NamespaceInfo.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/BlockReportTestBase.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeadDatanode.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto
- hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/TestOfflineEditsViewer.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/BlockListAsLongs.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/StorageBlockReport.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockHasMultipleReplicasOnSameDN.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeRegistration.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #132 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/132/)
HDFS-7435. PB encoding of block reports is very inefficient. Contributed by Daryn Sharp. (kihwal: rev d324164a51a43d72c02567248bd9f0f12b244a40)
- hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSImage.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeadDatanode.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/BlockReportTestBase.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/TestBlockListAsLongs.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/StorageBlockReport.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/proto/hdfs.proto
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockHasMultipleReplicasOnSameDN.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/extdataset/ExternalDatasetImpl.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/BlockListAsLongs.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/TestOfflineEditsViewer.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeRegistration.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailure.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDnRespectsBlockReportSplitThreshold.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/NamespaceInfo.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolServerSideTranslatorPB.java
SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2082 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2082/)
HDFS-7435. PB encoding of block reports is very inefficient. Contributed by Daryn Sharp. (kihwal: rev d324164a51a43d72c02567248bd9f0f12b244a40)
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/NamespaceInfo.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/TestBlockListAsLongs.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/BlockReportTestBase.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/BlockListAsLongs.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockHasMultipleReplicasOnSameDN.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/proto/hdfs.proto
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeRegistration.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSImage.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDnRespectsBlockReportSplitThreshold.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/extdataset/ExternalDatasetImpl.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeadDatanode.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java
- hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/TestOfflineEditsViewer.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailure.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/StorageBlockReport.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolServerSideTranslatorPB.java
- hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java
In NNThroughputBenchmark class, the change
void formBlockReport() { // fill remaining slots with blocks that do not exist - for(int idx = blocks.size()-1; idx >= nrBlocks; idx--) - blocks.set(idx, new Block(blocks.size() - idx, 0, 0)); - blockReportList = new BlockListAsLongs(blocks).getBlockListAsLongs(); + for (int idx = blocks.size()-1; idx >= nrBlocks; idx--) { + Block block = new Block(blocks.size() - idx, 0, 0); + blocks.set(idx, new BlockReportReplica(block)); + } + blockReportList = BlockListAsLongs.EMPTY;
Why the last line is not building the blockReportList using blocks field as following?
blockReportList = BlockListAsLongs.encode(blocks);
I may need more context to understand this patch, while my goal is to work on NNThroughputBenchmark. Thanks.
I agree this broke {NNThroughputBenchmark.BlockReportStats}} operation as it makes DNs always send an empty block report no matter what the parameters of the benchmark are.
Good catch Mingliang. Let's file a jira to fix this. daryn are you available to fix this?
liuml07, good catch! Not sure how I accidentally did that and I don't know why the test still passes. I think your proposed fix is correct. I know extremely little of the benchmark tool so I'd be unable to add a test (for the test!) in a timely manner, so if you'd like the credit for the bug then I'm fine with you doing the jira.
Thanks for your comment, daryn and shv. I started to work on NNThroughputBenchmark just recently and knew nothing about it before that. If the empty block report list in this patch was not intensional, I think I don't need more context of this patch. Sure I'd like to make the change as we discussed above. The jira is HDFS-9484. Let's continue further discussion on the fix there.
The reason why the unit tests could pass may be that the TestNNThroughputBenchmark is rather a driver to run the benchmark with default parameters than a real unit test that asserts expected behavior for different scenarios. If we need a sophisticated unit test, perhaps we can address it separately.
Adds a backwards-compatible that is more efficient for the primitive long array. Both array re-allocs and the unnecessary boxing/unboxing of the longs is eliminated.
Datanode detects namenode version to determine if it should use the current repeating field or a new more efficient buffer. CodedOutputStream is used to explicitly encode the longs into a buffer with # of longs followed by varint64 encoded longs.
Namenode detects if datanode has sent old or new format. If new format, it can immediately allocate a long[] of the correct size and decode straight into it which eliminates both re-allocs & unboxing overhead.