Hadoop Common / HADOOP-10975

org.apache.hadoop.util.DataChecksum should support native checksum calculation

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.6.0
    • Component/s: performance
    • Labels: None

      Attachments
    1. HDFS-6561.patch
      28 kB
      James Thomas
    2. HDFS-6561.2.patch
      29 kB
      James Thomas
    3. HDFS-6561.3.patch
      29 kB
      James Thomas
    4. hdfs-6561-just-hadoop-changes.txt
      25 kB
      Todd Lipcon
    5. HDFS-6561.4.patch
      25 kB
      James Thomas
    6. HDFS-6561.5.patch
      25 kB
      James Thomas

      Issue Links

        Activity

        James Thomas created issue -
        James Thomas made changes -
        Field Original Value New Value
        Component/s datanode [ 12312927 ]
        Component/s hdfs-client [ 12312928 ]
        Component/s performance [ 12316501 ]
        James Thomas made changes -
        Assignee James Thomas [ james.thomas ]
        James Thomas added a comment -

        It seems like the best approach here would be to add some buffering to FSOutputSummer. I did some testing, and going to native code is strictly faster than using the incremental summer as long as flushes happen no more often than every 100 bytes. More specifically, the native code has no option for incremental checksumming (it could be added, but I don't think it's necessary), so if we compute a checksum of a partial chunk, we will need to recompute the entire checksum (rather than just the checksum of the newly added bytes) when the chunk either 1) fills out or 2) grows a bit and then a flush occurs. But the native checksum on 512 bytes (the chunk size, and the maximum possible size of a partial chunk) takes at most 7 microseconds on my machine, while the incremental Java checksum takes around 7 microseconds for just 100 bytes. So as long as it's reasonable to assume flushes won't happen more often than every 100 bytes, I propose that we eliminate the incremental summer from FSOutputSummer and always use the native code for byte arrays.

        Buffering FSOutputSummer will mean that writes on the wire are more bursty and data is kept in client memory longer, but Todd mentioned that many clients already wrap DFSOutputStream in a BufferedWriter anyway. To achieve maximum performance in the native code, we need the buffer to be at least a few thousand bytes. The larger the buffer, the better, since we can amortize away the cost of crossing the JNI boundary (which seems to be quite small actually, on the order of a microsecond at most).

        What do people think about this?
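        The correctness half of this argument can be checked with a minimal sketch (class and method names here are illustrative, not Hadoop's actual API): an incremental CRC32 over a chunk yields exactly the same value as a one-shot computation over the whole chunk, so dropping the incremental summer and recomputing full chunks costs only time, never correctness.

        ```java
        import java.util.zip.CRC32;

        public class ChunkCrcSketch {
            static final int CHUNK_SIZE = 512; // HDFS default bytes-per-checksum

            // One-shot checksum of a full chunk, as the native path would do.
            static long fullChunkCrc(byte[] chunk) {
                CRC32 crc = new CRC32();
                crc.update(chunk, 0, chunk.length);
                return crc.getValue();
            }

            // Incremental checksum: feed the chunk in two pieces, as the old
            // incremental summer does when a partial chunk is later filled out.
            static long incrementalCrc(byte[] chunk, int split) {
                CRC32 crc = new CRC32();
                crc.update(chunk, 0, split);
                crc.update(chunk, split, chunk.length - split);
                return crc.getValue();
            }

            public static void main(String[] args) {
                byte[] chunk = new byte[CHUNK_SIZE];
                for (int i = 0; i < chunk.length; i++) chunk[i] = (byte) i;
                // Recomputing from scratch yields the same checksum as the
                // incremental path; the two strategies differ only in cost.
                if (fullChunkCrc(chunk) != incrementalCrc(chunk, 100)) {
                    throw new AssertionError("CRCs should match");
                }
                System.out.println("match");
            }
        }
        ```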

        Andrew Wang added a comment -

        +1, sounds good to me; pretty sure that flushes are normally bigger than 100B. Not really a use case we're optimizing for anyway.

        James Thomas made changes -
        Link This issue depends upon HDFS-6560 [ HDFS-6560 ]
        James Thomas added a comment -

        Patch attached.

        Was finally able to get some perf stat output comparing a 1 GB put operation with this change + HDFS-6560 and without. I ran the put against a MiniDFSCluster so that everything was in a single process and we could easily measure cycle savings across the cluster. We see about a 10% reduction in CPU cycles and a 4% reduction in wall clock time. These measurements were taken on my personal computer (SSD, quad core i7).

        ***WITHOUT CHANGES***
        james@james-ThinkPad-T530:~/Desktop/hadoop-common$ perf stat -r 20 java org.junit.runner.JUnitCore org.apache.hadoop.hdfs.TestRWCycles

        Performance counter stats for 'java org.junit.runner.JUnitCore org.apache.hadoop.hdfs.TestWriteCycles' (20 runs):

        14848.579075 task-clock (msec) # 1.637 CPUs utilized ( +- 0.37% )
        75,748 context-switches # 0.005 M/sec ( +- 0.29% )
        3,855 cpu-migrations # 0.260 K/sec ( +- 1.66% )
        75,120 page-faults # 0.005 M/sec ( +- 2.15% )
        50,376,887,490 cycles # 3.393 GHz ( +- 0.40% )
        31,427,207,136 stalled-cycles-frontend # 62.38% frontend cycles idle ( +- 0.62% )
        <not supported> stalled-cycles-backend
        46,918,968,840 instructions # 0.93 insns per cycle
                                    # 0.67 stalled cycles per insn ( +- 0.26% )
        6,523,666,030 branches # 439.346 M/sec ( +- 0.39% )
        304,740,080 branch-misses # 4.67% of all branches ( +- 0.19% )

        9.068159220 seconds time elapsed ( +- 0.33% )

        ***WITH HDFS-6560 and HDFS-6561 (client-side checksum calculation and DN checksum verification both nativized)***
        james@james-ThinkPad-T530:~/Desktop/hadoop-common$ perf stat -r 20 java org.junit.runner.JUnitCore org.apache.hadoop.hdfs.TestWriteCycles

        Performance counter stats for 'java org.junit.runner.JUnitCore org.apache.hadoop.hdfs.TestRWCycles' (20 runs):

        13703.187333 task-clock (msec) # 1.575 CPUs utilized ( +- 0.74% )
        79,256 context-switches # 0.006 M/sec ( +- 0.27% )
        3,161 cpu-migrations # 0.231 K/sec ( +- 2.45% )
        72,138 page-faults # 0.005 M/sec ( +- 2.35% )
        46,553,899,994 cycles # 3.397 GHz ( +- 0.83% )
        30,670,949,579 stalled-cycles-frontend # 65.88% frontend cycles idle ( +- 1.08% )
        <not supported> stalled-cycles-backend
        35,616,207,543 instructions # 0.77 insns per cycle
                                    # 0.86 stalled cycles per insn ( +- 0.73% )
        6,614,919,487 branches # 482.729 M/sec ( +- 0.73% )
        313,683,166 branch-misses # 4.74% of all branches ( +- 0.56% )

        8.699152006 seconds time elapsed ( +- 0.44% )

        James Thomas made changes -
        Attachment HDFS-6561.patch [ 12655130 ]
        James Thomas made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12655130/HDFS-6561.patch
        against trunk revision .

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7328//console

        This message is automatically generated.

        James Thomas added a comment -

        Need to consolidate this patch with the one in HDFS-6560 for it to apply ... will wait for review first.

        James Thomas added a comment -

        Updated patch based on changes to HADOOP-10838.

        James Thomas made changes -
        Attachment HDFS-6561.2.patch [ 12656413 ]
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12656413/HDFS-6561.2.patch
        against trunk revision .

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7377//console

        This message is automatically generated.

        James Thomas added a comment -

        Updated due to changes in HADOOP-10838.

        James Thomas made changes -
        Attachment HDFS-6561.3.patch [ 12656547 ]
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12656547/HDFS-6561.3.patch
        against trunk revision .

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7387//console

        This message is automatically generated.

        Todd Lipcon added a comment -

        Hey James. I looked over this patch and wrote/ran some performance tests.

        At first, I was a little concerned that the way you consolidated the code for bulk_crc32 would slow things down. In particular, there's now a new branch per chunk to determine whether to store or verify the CRC. I was worried that, if this branch were mispredicted, we'd pay an extra 15-20 cycles for every 512-byte chunk (which at 0.13 cycles/byte only takes ~66 cycles). That would represent a close to 20% performance regression.

        So, I wrote a simple test which approximates exactly the HDFS usage of these APIs – ie 512 byte chunks and a reasonable amount of data. In this test, I found that the above concern was unwarranted - probably because the branch prediction unit does a very good job with the simple branch pattern here. I'll attach a version of your patch which includes the benchmark that I wrote in case anyone else wants to run it.

        Here are my average timings for 512MB of 512-byte-chunked checksums (on my Intel(R) Core(TM) i7-4700MQ CPU @ 2.40GHz)

        Before:
        Calculate: 401275us (1.28GB/sec)
        Verify: 41184us (12.43GB/sec)

        After:
        Calculate: 41808us (12.25GB/sec)
        Verify: 41604us (12.31GB/sec)

        These seem to match earlier results you've posted elsewhere - just wanted to confirm on my machine and make sure that the existing verify code path didn't regress due to the new functionality.

        For ease of review, I also think it makes sense to split this patch up a little further, and make this JIRA only do the changes to the checksumming code to allow for native calculation. The changes to FSOutputSummer, DFSOutputStream, etc, are a bit more complex and probably should be reviewed separately. I took the liberty of removing those chunks from the patch as I was testing it, so I'll upload that here and you can take a look.

        Given the above, I only reviewed the portion related to checksumming and didn't yet look in detail at the outputsummer, etc, changes.
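        The kind of microbenchmark Todd describes can be sketched in pure Java (this is an illustrative stand-in, not the benchmark attached to the patch, and the class and method names are invented): compute one CRC32 per 512-byte chunk across a large buffer, then verify by recomputing and comparing.

        ```java
        import java.util.Random;
        import java.util.zip.CRC32;

        public class ChunkedChecksumBench {
            static final int CHUNK = 512; // matches HDFS bytes-per-checksum

            // Calculate: one CRC per 512-byte chunk, stored into sums[].
            static void calculate(byte[] data, long[] sums) {
                CRC32 crc = new CRC32();
                for (int i = 0; i < sums.length; i++) {
                    crc.reset();
                    crc.update(data, i * CHUNK, CHUNK);
                    sums[i] = crc.getValue();
                }
            }

            // Verify: recompute each chunk's CRC and compare against sums[].
            static boolean verify(byte[] data, long[] sums) {
                CRC32 crc = new CRC32();
                for (int i = 0; i < sums.length; i++) {
                    crc.reset();
                    crc.update(data, i * CHUNK, CHUNK);
                    if (crc.getValue() != sums[i]) return false;
                }
                return true;
            }

            public static void main(String[] args) {
                byte[] data = new byte[16 * 1024 * 1024]; // smaller than Todd's 512MB
                new Random(42).nextBytes(data);
                long[] sums = new long[data.length / CHUNK];
                long t0 = System.nanoTime();
                calculate(data, sums);
                long t1 = System.nanoTime();
                boolean ok = verify(data, sums);
                long t2 = System.nanoTime();
                System.out.printf("calculate: %d us, verify: %d us, ok=%b%n",
                    (t1 - t0) / 1000, (t2 - t1) / 1000, ok);
            }
        }
        ```

        Note this exercises only the Java CRC32 path; the native SSE-accelerated path being discussed is what produces the ~12GB/sec calculate numbers above.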

        Todd Lipcon added a comment -

        Here's James's patch with the following changes:

        • only the hadoop side changes, not the HDFS changes yet (which we can break out elsewhere since they need more careful review)
        • adds a perf test for our common use case
        • fixed one trivial whitespace error in james's patch (bad indentation)

        I'm +1 on this, but since I made a couple changes/additions to the patch, we should probably have another committer take a quick look as well.

        Todd Lipcon made changes -
        Attachment hdfs-6561-just-hadoop-changes.txt [ 12661574 ]
        Colin Patrick McCabe added a comment -

        Great idea.

        The hdfs-6561-just-hadoop-changes.txt patch needs to be rebased... it didn't apply cleanly for me against trunk.

        JNIEXPORT void JNICALL Java_org_apache_hadoop_util_NativeCrc32_nativeComputeChunkedSums
          (JNIEnv *env, jclass clazz,
            jint bytes_per_checksum, jint j_crc_type,
            jobject j_sums, jint sums_offset,
            jobject j_data, jint data_offset, jint data_len,
            jstring j_filename, jlong base_pos, jboolean verify)
        

        Later, you use an if(likely) on the verify boolean. Rather than do this, why not just have a utility function that both nativeComputeChunkedSumsByteArray and nativeVerifyChunkedSums call?

        -#include <stdint.h>
        +#include <stdbool.h>
        

        Please, no. There are a lot of older C compilers floating around out there that will choke on this. Plus we still need stdint.h, since we're using uint32_t, etc. I don't think the C99 _Bool stuff adds a lot of type safety anyway, since any non-struct type can implicitly be converted to a bool, and a bool can be used as an int in many contexts.
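        The single-code-path shape being debated, a per-chunk store-or-verify step gated by a flag, can be sketched in Java as a hypothetical analogue of the C helper (names here are illustrative, not the patch's actual code):

        ```java
        import java.util.zip.CRC32;

        public class StoreOrVerifySketch {
            // Analogue of a C store_or_verify helper: in verify mode, compare
            // the computed CRC against the stored one; in calculate mode,
            // store it and report success.
            static boolean storeOrVerify(long[] sums, int i, long crc, boolean verify) {
                if (verify) {
                    return sums[i] == crc;
                }
                sums[i] = crc;
                return true;
            }

            // One code path handles both operations, at the cost of one extra
            // branch per chunk -- the branch Todd measured as effectively free.
            static boolean process(byte[] data, int chunkSize, long[] sums, boolean verify) {
                CRC32 crc = new CRC32();
                for (int i = 0; i * chunkSize < data.length; i++) {
                    int off = i * chunkSize;
                    crc.reset();
                    crc.update(data, off, Math.min(chunkSize, data.length - off));
                    if (!storeOrVerify(sums, i, crc.getValue(), verify)) {
                        return false; // checksum mismatch on chunk i
                    }
                }
                return true;
            }
        }
        ```

        Splitting this into separate calculate and verify functions, as Colin suggests, would remove the per-chunk branch but duplicate the chunk-walking loop unless it were shared through another helper.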

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12661574/hdfs-6561-just-hadoop-changes.txt
        against trunk revision .

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7631//console

        This message is automatically generated.

        Todd Lipcon added a comment -

        This patch doesn't apply because it's dependent on HADOOP-10838. James, want to revise the patch based on Colin's comments above?

        James Thomas added a comment -

        Todd's patch with the second change suggested by Colin. Colin Patrick McCabe, as we discussed, I think it will be difficult to split up nativeComputeChunkedSums* into separate functions for calculate and verify and still share code between them. And as Todd noted above, the branch does not cause much performance overhead.

        James Thomas made changes -
        Attachment HDFS-6561.4.patch [ 12661927 ]
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12661927/HDFS-6561.4.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 2 new or modified test files.

        -1 javac. The patch appears to cause the build to fail.

        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7638//console

        This message is automatically generated.

        Colin Patrick McCabe added a comment -

        So, I think that we could refactor this to be nicer. I think we really don't need the boolean everywhere, but we could refactor things into utility functions and two separate code paths. This would also gain back that tiny bit of performance.

        But, that might involve a lot more work, so I'm ok with combining the code paths for now. In view of Todd's results, there shouldn't be any significant performance regression. We can always split them later.

        -    if (unlikely(crc != *sums)) {
        +    if (unlikely(!store_or_verify(sums, crc, is_verify)))
               goto return_crc_error;
        -    }
        

        Let's not get rid of the braces here.

        +1 once that's addressed. Thanks, James.

        James Thomas added a comment -

        Thanks, Colin. Posted updated patch.

        James Thomas made changes -
        Attachment HDFS-6561.5.patch [ 12661935 ]
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12661935/HDFS-6561.5.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 2 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7640//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7640//console

        This message is automatically generated.

        James Thomas made changes -
        Summary Byte array native checksumming on client side Byte array native checksumming on client side (Hadoop changes)
        Colin Patrick McCabe made changes -
        Summary Byte array native checksumming on client side (Hadoop changes) org.apache.hadoop.util.DataChecksum should support native checksumming
        Todd Lipcon added a comment -

        Cool, I'll commit this momentarily. Thanks for the extra +1 Colin.

        Todd Lipcon made changes -
        Parent HDFS-3528 [ 12560405 ]
        Issue Type Sub-task [ 7 ] Improvement [ 4 ]
        Todd Lipcon made changes -
        Project Hadoop HDFS [ 12310942 ] Hadoop Common [ 12310240 ]
        Key HDFS-6561 HADOOP-10975
        Component/s performance [ 12316502 ]
        Component/s datanode [ 12312927 ]
        Component/s hdfs-client [ 12312928 ]
        Component/s performance [ 12316501 ]
        Todd Lipcon added a comment -

        (moved to HADOOP project since this is only the Hadoop side changes)

        Colin Patrick McCabe made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Fix Version/s 2.6.0 [ 12327179 ]
        Resolution Fixed [ 1 ]
        Colin Patrick McCabe added a comment -

        Hmm. I just committed this under the old HDFS name. I also tried to change it to HADOOP, but found it couldn't be done without making it no longer a subtask.

        Todd Lipcon added a comment -

        yea, Colin and I had a race condition. I was renaming the JIRA while he was committing it under the old name. I updated CHANGES.txt to refer to the new name.

        Anyway, thanks for the contribution, James

        Todd Lipcon made changes -
        Summary org.apache.hadoop.util.DataChecksum should support native checksumming org.apache.hadoop.util.DataChecksum should support native checksum calculation
        Todd Lipcon made changes -
        Link This issue is depended upon by HDFS-6865 [ HDFS-6865 ]
        Colin Patrick McCabe added a comment -

        thanks, Todd and James.

        Hudson added a comment -

        FAILURE: Integrated in Hadoop-trunk-Commit #6086 (See https://builds.apache.org/job/Hadoop-trunk-Commit/6086/)
        Update CHANGES.txt for HDFS-6561 which was renamed to HADOOP-10975. (todd: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1618686)

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
        Hudson added a comment -

        FAILURE: Integrated in Hadoop-Yarn-trunk #651 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/651/)
        Update CHANGES.txt for HDFS-6561 which was renamed to HADOOP-10975. (todd: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1618686)

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
        Hudson added a comment -

        FAILURE: Integrated in Hadoop-Hdfs-trunk #1842 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1842/)
        Update CHANGES.txt for HDFS-6561 which was renamed to HADOOP-10975. (todd: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1618686)

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
        Hudson added a comment -

        FAILURE: Integrated in Hadoop-Mapreduce-trunk #1868 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1868/)
        Update CHANGES.txt for HDFS-6561 which was renamed to HADOOP-10975. (todd: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1618686)

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
        Steve Loughran made changes -
        Link This issue breaks HADOOP-11064 [ HADOOP-11064 ]
        Arun C Murthy made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Transition | Time In Source Status | Execution Times | Last Executer | Last Execution Date
        Open → Patch Available | 22d 4h 35m | 1 | James Thomas | 11/Jul/14 02:39
        Patch Available → Resolved | 38d 16h 25m | 1 | Colin Patrick McCabe | 18/Aug/14 19:04
        Resolved → Closed | 104d 9h 4m | 1 | Arun C Murthy | 01/Dec/14 03:09

          People

          • Assignee: James Thomas
          • Reporter: James Thomas
          • Votes: 0
          • Watchers: 11

            Dates

            • Created:
              Updated:
              Resolved:

              Development