Details
- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Fixed
Attachments
- HDFS-6561.2.patch (29 kB, James Thomas)
- HDFS-6561.3.patch (29 kB, James Thomas)
- HDFS-6561.4.patch (25 kB, James Thomas)
- HDFS-6561.5.patch (25 kB, James Thomas)
- HDFS-6561.patch (28 kB, James Thomas)
- hdfs-6561-just-hadoop-changes.txt (25 kB, Todd Lipcon)
Issue Links
- breaks: HADOOP-11064 UnsatisifedLinkError with hadoop 2.4 JARs on hadoop-2.6 due to NativeCRC32 method changes (Closed)
- depends upon: HADOOP-10838 Byte array native checksumming (Closed)
- is depended upon by: HDFS-6865 Byte array native checksumming on client side (HDFS changes) (Closed)
Activity
+1 sounds good to me, pretty sure that flushes are normally bigger than 100B. Not really a use case we're optimizing for anyway.
Patch attached.
Was finally able to get some perf stat output comparing a 1 GB put operation with this change + HDFS-6560 and without. I ran the put against a MiniDFSCluster so that everything was in a single process and we could easily measure cycle savings across the cluster. We see about a 10% reduction in CPU cycles and a 4% reduction in wall clock time. These measurements were taken on my personal computer (SSD, quad core i7).
***WITHOUT CHANGES***
james@james-ThinkPad-T530:~/Desktop/hadoop-common$ perf stat -r 20 java org.junit.runner.JUnitCore org.apache.hadoop.hdfs.TestWriteCycles
Performance counter stats for 'java org.junit.runner.JUnitCore org.apache.hadoop.hdfs.TestWriteCycles' (20 runs):
14848.579075 task-clock (msec) # 1.637 CPUs utilized ( +- 0.37% )
75,748 context-switches # 0.005 M/sec ( +- 0.29% )
3,855 cpu-migrations # 0.260 K/sec ( +- 1.66% )
75,120 page-faults # 0.005 M/sec ( +- 2.15% )
50,376,887,490 cycles # 3.393 GHz ( +- 0.40% )
31,427,207,136 stalled-cycles-frontend # 62.38% frontend cycles idle ( +- 0.62% )
<not supported> stalled-cycles-backend
46,918,968,840 instructions # 0.93 insns per cycle
# 0.67 stalled cycles per insn ( +- 0.26% )
6,523,666,030 branches # 439.346 M/sec ( +- 0.39% )
304,740,080 branch-misses # 4.67% of all branches ( +- 0.19% )
9.068159220 seconds time elapsed ( +- 0.33% )
***WITH HDFS-6560 and HDFS-6561 (client-side checksum calculation and DN checksum verification both nativized)***
james@james-ThinkPad-T530:~/Desktop/hadoop-common$ perf stat -r 20 java org.junit.runner.JUnitCore org.apache.hadoop.hdfs.TestWriteCycles
Performance counter stats for 'java org.junit.runner.JUnitCore org.apache.hadoop.hdfs.TestWriteCycles' (20 runs):
13703.187333 task-clock (msec) # 1.575 CPUs utilized ( +- 0.74% )
79,256 context-switches # 0.006 M/sec ( +- 0.27% )
3,161 cpu-migrations # 0.231 K/sec ( +- 2.45% )
72,138 page-faults # 0.005 M/sec ( +- 2.35% )
46,553,899,994 cycles # 3.397 GHz ( +- 0.83% )
30,670,949,579 stalled-cycles-frontend # 65.88% frontend cycles idle ( +- 1.08% )
<not supported> stalled-cycles-backend
35,616,207,543 instructions # 0.77 insns per cycle
# 0.86 stalled cycles per insn ( +- 0.73% )
6,614,919,487 branches # 482.729 M/sec ( +- 0.73% )
313,683,166 branch-misses # 4.74% of all branches ( +- 0.56% )
8.699152006 seconds time elapsed ( +- 0.44% )
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12655130/HDFS-6561.patch
against trunk revision .
-1 patch. The patch command could not apply the patch.
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7328//console
This message is automatically generated.
Need to consolidate this patch with the one in HDFS-6560 for it to apply ... will wait for review first.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12656413/HDFS-6561.2.patch
against trunk revision .
-1 patch. The patch command could not apply the patch.
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7377//console
This message is automatically generated.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12656547/HDFS-6561.3.patch
against trunk revision .
-1 patch. The patch command could not apply the patch.
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7387//console
This message is automatically generated.
Hey James. I looked over this patch and wrote/ran some performance tests.
At first, I was a little concerned that the way you consolidated the code for bulk_crc32 would slow things down. In particular, there's now a new branch per chunk to determine whether to store or verify the CRC. I was worried that, if this branch were mispredicted, we'd pay an extra 15-20 cycles for every 512-byte chunk (which at 0.13 cycles/byte only takes ~66 cycles). That would represent close to a 20% performance regression.
So, I wrote a simple test which approximates exactly the HDFS usage of these APIs, i.e. 512-byte chunks and a reasonable amount of data. In this test, I found that the above concern was unwarranted, probably because the branch prediction unit does a very good job with the simple branch pattern here. I'll attach a version of your patch which includes the benchmark that I wrote in case anyone else wants to run it.
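For anyone who wants a rough idea of the shape of that benchmark without applying the patch, here is a minimal standalone sketch of the same measurement pattern: checksum 512 MB in 512-byte chunks, once storing the CRCs and once verifying against the stored values. It uses zlib's crc32 as a stand-in for the Hadoop native CRC32C code, so the absolute numbers will not match the figures below; the real benchmark is the one attached with the patch.

/* bench_chunked_crc.c: standalone approximation of a chunked-CRC benchmark.
 * Not the benchmark attached to this JIRA; zlib's crc32 stands in for the
 * Hadoop native CRC32C code. Build with: gcc -O2 bench_chunked_crc.c -lz */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <zlib.h>

#define DATA_LEN   (512 * 1024 * 1024)   /* 512 MB of data */
#define CHUNK_SIZE 512                   /* one CRC per 512-byte chunk */
#define NUM_CHUNKS (DATA_LEN / CHUNK_SIZE)

static double now_us(void) {
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return ts.tv_sec * 1e6 + ts.tv_nsec / 1e3;
}

int main(void) {
  unsigned char *data = malloc(DATA_LEN);
  uLong *sums = malloc(NUM_CHUNKS * sizeof(uLong));
  if (data == NULL || sums == NULL) return 1;
  memset(data, 0xab, DATA_LEN);

  /* Calculate pass: compute and store one CRC per chunk. */
  double start = now_us();
  for (long i = 0; i < NUM_CHUNKS; i++) {
    sums[i] = crc32(crc32(0L, Z_NULL, 0), data + i * CHUNK_SIZE, CHUNK_SIZE);
  }
  printf("calculate: %.0f us\n", now_us() - start);

  /* Verify pass: recompute each chunk's CRC and compare with the stored sum. */
  start = now_us();
  for (long i = 0; i < NUM_CHUNKS; i++) {
    uLong crc = crc32(crc32(0L, Z_NULL, 0), data + i * CHUNK_SIZE, CHUNK_SIZE);
    if (crc != sums[i]) {
      fprintf(stderr, "checksum mismatch at chunk %ld\n", i);
      return 1;
    }
  }
  printf("verify:    %.0f us\n", now_us() - start);

  free(sums);
  free(data);
  return 0;
}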
Here are my average timings for 512MB of 512-byte-chunked checksums (on my Intel(R) Core(TM) i7-4700MQ CPU @ 2.40GHz)
Before:
Calculate: 401275us (1.28GB/sec)
Verify: 41184us (12.43GB/sec)
After:
Calculate: 41808us (12.25GB/sec)
Verify: 41604us (12.31GB/sec)
These seem to match earlier results you've posted elsewhere - just wanted to confirm on my machine and make sure that the existing verify code path didn't regress due to the new functionality.
For ease of review, I also think it makes sense to split this patch up a little further, and make this JIRA only do the changes to the checksumming code to allow for native calculation. The changes to FSOutputSummer, DFSOutputStream, etc, are a bit more complex and probably should be reviewed separately. I took the liberty of removing those chunks from the patch as I was testing it, so I'll upload that here and you can take a look.
Given the above, I only reviewed the portion related to checksumming and didn't yet look in detail at the FSOutputSummer, etc., changes.
Here's James's patch with the following changes:
- only the Hadoop-side changes, not the HDFS changes yet (which we can break out elsewhere since they need more careful review)
- adds a perf test for our common use case
- fixes one trivial whitespace error in James's patch (bad indentation)
I'm +1 on this, but since I made a couple changes/additions to the patch, we should probably have another committer take a quick look as well.
Great idea.
The hdfs-6561-just-hadoop-changes.txt patch needs to be rebased... it didn't apply cleanly for me against trunk.
JNIEXPORT void JNICALL Java_org_apache_hadoop_util_NativeCrc32_nativeComputeChunkedSums
    (JNIEnv *env, jclass clazz, jint bytes_per_checksum, jint j_crc_type,
     jobject j_sums, jint sums_offset, jobject j_data, jint data_offset,
     jint data_len, jstring j_filename, jlong base_pos, jboolean verify)
Later, you use an if(likely) on the verify boolean. Rather than do this, why not just have a utility function that both nativeComputeChunkedSumsByteArray and nativeVerifyChunkedSums call?
-#include <stdint.h>
+#include <stdbool.h>
Please, no. There are a lot of older C compilers floating around out there that will choke on this. Plus we still need stdint.h, since we're using uint32_t, etc. I don't think the C99 _Bool stuff adds a lot of type safety anyway, since any non-struct type can implicitly be converted to a bool, and a bool can be used as an int in many contexts.
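Purely as an illustration of the structure Colin is describing (the parameter list is copied from the signature quoted above; the helper name and its stubbed-out body are hypothetical, not code from the patch), the two entry points would stay separate and delegate to one shared worker:

#include <jni.h>

/* Hypothetical shared worker: would look up the sums/data buffers, run the
 * bulk CRC loop, and throw ChecksumException on a verification failure. */
static void compute_chunked_sums_impl(JNIEnv *env, jint bytes_per_checksum,
    jint j_crc_type, jobject j_sums, jint sums_offset, jobject j_data,
    jint data_offset, jint data_len, jstring j_filename, jlong base_pos,
    int is_verify)
{
  /* ... body elided in this sketch ... */
}

/* Thin JNI entry points: identical parameters, differing only in the mode. */
JNIEXPORT void JNICALL Java_org_apache_hadoop_util_NativeCrc32_nativeVerifyChunkedSums
  (JNIEnv *env, jclass clazz, jint bytes_per_checksum, jint j_crc_type,
   jobject j_sums, jint sums_offset, jobject j_data, jint data_offset,
   jint data_len, jstring j_filename, jlong base_pos)
{
  compute_chunked_sums_impl(env, bytes_per_checksum, j_crc_type, j_sums,
      sums_offset, j_data, data_offset, data_len, j_filename, base_pos,
      1 /* verify */);
}

JNIEXPORT void JNICALL Java_org_apache_hadoop_util_NativeCrc32_nativeComputeChunkedSums
  (JNIEnv *env, jclass clazz, jint bytes_per_checksum, jint j_crc_type,
   jobject j_sums, jint sums_offset, jobject j_data, jint data_offset,
   jint data_len, jstring j_filename, jlong base_pos)
{
  compute_chunked_sums_impl(env, bytes_per_checksum, j_crc_type, j_sums,
      sums_offset, j_data, data_offset, data_len, j_filename, base_pos,
      0 /* calculate */);
}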
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12661574/hdfs-6561-just-hadoop-changes.txt
against trunk revision .
-1 patch. The patch command could not apply the patch.
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7631//console
This message is automatically generated.
This patch doesn't apply because it's dependent on HADOOP-10838. James, want to revise the patch based on Colin's comments above?
Todd's patch with the second change suggested by Colin. cmccabe, as we discussed, I think it will be difficult to split up nativeComputeChunkedSums* into separate functions for calculate and verify and still share code between them. And as Todd noted above, the branch does not cause much performance overhead.
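For readers following along, the combined code path amounts to something like the small helper below (a sketch in the spirit of the store_or_verify call quoted in Colin's review further down, not the verbatim patch code): in verify mode it compares the freshly computed CRC against the stored sum, and in calculate mode it stores the CRC and always reports success, so both modes can share the same per-chunk loop.

#include <stdint.h>

/* Sketch: one helper lets the calculate and verify modes share the loop. */
static inline int store_or_verify(uint32_t *sums, uint32_t crc, int is_verify)
{
  if (is_verify) {
    return crc == *sums;  /* verify mode: does the computed CRC match? */
  } else {
    *sums = crc;          /* calculate mode: store the CRC */
    return 1;             /* storing never "fails" */
  }
}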
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12661927/HDFS-6561.4.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 2 new or modified test files.
-1 javac. The patch appears to cause the build to fail.
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7638//console
This message is automatically generated.
So, I think that we could refactor this to be nicer. I think we really don't need the boolean everywhere, but we could refactor things into utility functions and two separate code paths. This would also gain back that tiny bit of performance.
But, that might involve a lot more work, so I'm OK with combining the code paths for now. In view of Todd's results, there shouldn't be any significant performance regression. We can always split them later.
-  if (unlikely(crc != *sums)) {
+  if (unlikely(!store_or_verify(sums, crc, is_verify)))
     goto return_crc_error;
-  }
Let's not get rid of the braces here.
+1 once that's addressed. Thanks, James.
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12661935/HDFS-6561.5.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 2 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7640//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7640//console
This message is automatically generated.
Hmm. I just committed this under the old HDFS name. I also tried to change it to HADOOP, but found it couldn't be done without making it no longer a subtask.
yea, Colin and I had a race condition. I was renaming the JIRA while he was committing it under the old name. I updated CHANGES.txt to refer to the new name.
Anyway, thanks for the contribution, James.
FAILURE: Integrated in Hadoop-trunk-Commit #6086 (See https://builds.apache.org/job/Hadoop-trunk-Commit/6086/)
Update CHANGES.txt for HDFS-6561 which was renamed to HADOOP-10975. (todd: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1618686)
- /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
FAILURE: Integrated in Hadoop-Yarn-trunk #651 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/651/)
Update CHANGES.txt for HDFS-6561 which was renamed to HADOOP-10975. (todd: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1618686)
- /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
FAILURE: Integrated in Hadoop-Hdfs-trunk #1842 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1842/)
Update CHANGES.txt for HDFS-6561 which was renamed to HADOOP-10975. (todd: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1618686)
- /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
FAILURE: Integrated in Hadoop-Mapreduce-trunk #1868 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1868/)
Update CHANGES.txt for HDFS-6561 which was renamed to HADOOP-10975. (todd: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1618686)
- /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
It seems like the best approach here would be to add some buffering to FSOutputSummer. I did some testing, and going to native code is strictly faster than using the incremental summer as long as flushes happen no more often than every 100 bytes. More specifically, the native code has no option for incremental checksumming (it could be added, but I don't think it's necessary), so if we compute a checksum of a partial chunk, we need to recompute the entire checksum (rather than just the checksum of the newly added bytes) when the chunk either 1) fills out or 2) grows a bit more and then a flush occurs. But the native checksum on 512 bytes (the chunk size, and hence the maximum possible size of a partial chunk) takes at most 7 microseconds on my machine, while the incremental Java checksum requires around 7 microseconds for 100 bytes. So as long as it's reasonable to assume flushes won't happen more often than every 100 bytes, I propose that we eliminate the incremental summer from FSOutputSummer and always use the native code for byte arrays.
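Spelling out the breakeven arithmetic behind that claim (my restatement of the figures above, not part of the original comment):

\[
\text{incremental Java cost} \approx \frac{7\,\mu\text{s}}{100\,\text{bytes}} = 0.07\,\mu\text{s/byte},
\qquad
0.07\,\mu\text{s/byte} \times n \;\ge\; 7\,\mu\text{s} \iff n \ge 100\,\text{bytes},
\]

so once roughly 100 bytes or more have been written since the last flush, a full native recompute of the partial chunk (at most about 7 microseconds) costs no more than having checksummed those bytes incrementally in Java.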
Buffering FSOutputSummer will mean that writes on the wire are more bursty and data is kept in client memory longer, but Todd mentioned that many clients already wrap DFSOutputStream in a BufferedWriter anyway. To achieve maximum performance in the native code, we need the buffer to be at least a few thousand bytes. The larger the buffer, the better, since we can amortize away the cost of crossing the JNI boundary (which seems to be quite small actually, on the order of a microsecond at most).
What do people think about this?