Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.21.0
    • Fix Version/s: 0.21.0
    • Component/s: performance, util
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Got some ideas to improve CRC32 calculation.

      1. c6166_20090722_benchmark_32VM.txt
        21 kB
        Tsz Wo Nicholas Sze
      2. c6166_20090722_benchmark_64VM.txt
        21 kB
        Tsz Wo Nicholas Sze
      3. c6166_20090722.patch
        251 kB
        Tsz Wo Nicholas Sze
      4. c6166_20090727.patch
        306 kB
        Tsz Wo Nicholas Sze
      5. c6166_20090728.patch
        308 kB
        Tsz Wo Nicholas Sze
      6. c6166_20090810.patch
        321 kB
        Tsz Wo Nicholas Sze
      7. c6166_20090811.patch
        310 kB
        Tsz Wo Nicholas Sze
      8. c6166_20090819.patch
        320 kB
        Tsz Wo Nicholas Sze
      9. c6166_20090819review.patch
        53 kB
        Tsz Wo Nicholas Sze
      10. graph.r
        1.0 kB
        Todd Lipcon
      11. graph.r
        1.0 kB
        Todd Lipcon
      12. Rplots.pdf
        153 kB
        Todd Lipcon
      13. Rplots.pdf
        171 kB
        Todd Lipcon
      14. Rplots.pdf
        171 kB
        Todd Lipcon
      15. Rplots-laptop.pdf
        263 kB
        Todd Lipcon
      16. Rplots-nehalem32.pdf
        425 kB
        Todd Lipcon
      17. Rplots-nehalem64.pdf
        667 kB
        Todd Lipcon

        Issue Links

          Activity

          Tsz Wo Nicholas Sze created issue -
          Tsz Wo Nicholas Sze added a comment -

          c6166_20090722.patch: Tried a few CRC32 implementations.

          Tsz Wo Nicholas Sze made changes -
          Field Original Value New Value
          Attachment c6166_20090722.patch [ 12414293 ]
          Tsz Wo Nicholas Sze added a comment -

          c6166_20090722_benchmark_64VM.txt, c6166_20090722_benchmark_32VM.txt: Tested the implementations on both 32-bit and 64-bit VMs. It seems that Crc32_4_3 is faster than the current PureJavaCrc32 in both cases.

          (Sorry that I still have not tested with TestPureJavaCrc32.PerformanceTest.)

          Tsz Wo Nicholas Sze made changes -
          Attachment c6166_20090722_benchmark_64VM.txt [ 12414296 ]
          Attachment c6166_20090722_benchmark_32VM.txt [ 12414297 ]
          Tsz Wo Nicholas Sze made changes -
          Link This issue is related to HADOOP-6148 [ HADOOP-6148 ]
          Scott Carey added a comment -

          One could take Intel's idea and go 8 (or 6 or 12?) bytes at a time instead of 4. This would line up with the 12-bit lookup tables a bit better. 6 bytes might be the easier boundary, and would require four 12-bit lookup tables, which is 32K; for comparison, the tables in the inner loop of your "4_3" are 17K, and those of the current version are 4K.

          Going 12 bytes at a time would require 8 tables and 64K of space, and then we're randomly jumping around lookup tables that don't fit in an L1 D-cache on some processors.

          The 8 bytes at a time approach is in C code (BSD license) here:
          http://sourceforge.net/projects/slicing-by-8/files/

          The trick with going over 4 bytes in a loop has to do with how the cycle works on the CRC. I think, after the first four bytes of lookups, it changes a bit.
          But I didn't read that much into that code to be sure of what it is doing. They do it with eight 1K lookup tables, each indexed by one byte. They also get to XOR a single int directly out of the byte array, rather than grabbing one byte at a time and shifting. We can do that with ByteBuffer.getInt(), but when I tried that in the previous case the ByteBuffer creation overhead was too large, and the ByteBuffer access seemed very inefficient for some reason. Oh, how nice it would be if you could grab the next 4 bytes in a byte[] as an int (or the next 8 as a long) without wrapping it in a ByteBuffer, and let the compiler figure out the optimal processor load instruction.
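To make the two options in the paragraph above concrete, here is a minimal sketch of reading 4 bytes of a byte[] as one little-endian int, once with shifts and once through a ByteBuffer view. The class and method names are hypothetical and are not taken from any of the attached patches; the byte order is assumed to be little-endian to match the byte-at-a-time CRC loops in this thread.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class LittleEndianLoad {
  /** Assembles b[off..off+3] into one int, least significant byte first. */
  static int loadIntShift(byte[] b, int off) {
    return (b[off] & 0xff)
        | ((b[off + 1] & 0xff) << 8)
        | ((b[off + 2] & 0xff) << 16)
        | ((b[off + 3] & 0xff) << 24);
  }

  /** Same value through a ByteBuffer view; note the wrapper object is created on every call. */
  static int loadIntBuffer(byte[] b, int off) {
    return ByteBuffer.wrap(b).order(ByteOrder.LITTLE_ENDIAN).getInt(off);
  }

  public static void main(String[] args) {
    byte[] data = {0x01, 0x02, 0x03, 0x04, (byte) 0xff};
    // Both calls read the same 4 bytes, so the two printed values should be equal.
    System.out.println(Integer.toHexString(loadIntShift(data, 1)));
    System.out.println(Integer.toHexString(loadIntBuffer(data, 1)));
  }
}
```

Whether the ByteBuffer variant is slower, as reported above, depends on the JVM; this sketch only shows that the two forms compute the same value.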

          I think a version that processes 6 bytes at a time, with four lookups into 12-bit LUTs, would look like this:

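          public class Crc32_6_4 extends Crc32Base {
            /** {@inheritDoc} */
            public void update(byte[] b, int off, int len) {
              // Each iteration consumes 6 bytes, so at least 6 must remain.
              while (len > 5) {
                // XOR the first 4 bytes into the running crc, least significant byte first.
                crc ^= b[off++] & 0xff;
                crc ^= (b[off++] & 0xff) << 8;
                crc ^= (b[off++] & 0xff) << 16;
                crc ^= (b[off++] & 0xff) << 24;
                // Bytes 5 and 6 lie past the 32-bit crc, so they are not XORed with it.
                int c0 = b[off++] & 0xff;
                c0 ^= (b[off++] & 0xff) << 8;

                // Split the 48 bits into four 12-bit indexes. The third index joins
                // the top 8 bits of crc with the low 4 bits of c0.
                crc = T12_3[crc & 0xfff] ^ T12_2[(crc >>> 12) & 0xfff] ^ T12_1[((crc >>> 24) | (c0 << 8)) & 0xfff] ^ T12_0[c0 >> 4];
                len -= 6;
              }

              // Handle the remaining bytes one at a time.
              for (; len > 0; len--) {
                crc = (crc >>> 8) ^ Table8_0[(crc ^ b[off++]) & 0xff];
              }
            }
          }
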

          This assumes the tables were built right and that the "wrap past 4 bytes means don't xor with crc" idea is correct. I haven't tried this at all.

          All of these with larger lookup tables run the risk of performing worse under concurrency, even if faster single threaded. Cache pressure is greater under concurrency. We might want to use the benchmark in HADOOP-5318, which is heavily CRC reliant, as a check to make sure we aren't regressing under higher concurrency due to cache pressure.
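As a rough illustration of the concurrency concern above, the following sketch measures aggregate checksum throughput as the number of worker threads grows. It is not the HADOOP-5318 benchmark; the class name, the `run` method, and the buffer sizes are all made up for illustration, and java.util.zip.CRC32 stands in for whichever implementation is under test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

class ConcurrentCrcBench {
  /** Returns aggregate MB/sec when each of `threads` workers checksums its own buffer `rounds` times. */
  static double run(int threads, int bufSize, int rounds) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Callable<Long>> tasks = new ArrayList<>();
      for (int t = 0; t < threads; t++) {
        final byte[] buf = new byte[bufSize];
        new Random(t).nextBytes(buf);
        tasks.add(() -> {
          Checksum crc = new CRC32(); // stand-in: swap in the implementation under test
          long sink = 0;              // keeps the result live so the loop is not optimized away
          for (int r = 0; r < rounds; r++) {
            crc.reset();
            crc.update(buf, 0, buf.length);
            sink ^= crc.getValue();
          }
          return sink;
        });
      }
      long start = System.nanoTime();
      for (Future<Long> f : pool.invokeAll(tasks)) {
        f.get(); // invokeAll has already waited; get() surfaces any worker failure
      }
      double secs = (System.nanoTime() - start) / 1e9;
      return (double) threads * bufSize * rounds / (1024.0 * 1024.0) / secs;
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    for (int threads : new int[] {1, 2, 4, 8}) {
      System.out.printf("%d threads: %.1f MB/sec%n", threads, run(threads, 64 * 1024, 2000));
    }
  }
}
```

If per-thread throughput drops faster for a large-table variant than for a small-table one as threads are added, that would be consistent with the cache-pressure concern. The timing here includes thread start-up and has no warm-up phase, so it is only a starting point.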

          For reference, Intel's C code (referenced above, snippet below) with 8 tables looks like this in the inner loop:

          /*++
           *
           * Copyright (c) 2004-2006 Intel Corporation - All Rights Reserved
           *
           * This software program is licensed subject to the BSD License, 
           * available at http://www.opensource.org/licenses/bsd-license.html
           *
           * Abstract: The main routine
           * 
           --*/
          crc32c_sb8_64_bit(
          	uint32_t* p_running_crc,
              const uint8_t*	p_buf,
              const uint32_t length,
          	const uint32_t init_bytes,
          	uint8_t			mode)
          {
          	uint32_t li;
          	uint32_t crc, term1, term2;
          	uint32_t running_length;
          	uint32_t end_bytes;
          	if(mode ==  MODE_CONT)
          		crc = *p_running_crc;
          	else	
          		crc = CRC32C_INIT_REFLECTED;
          	running_length = ((length - init_bytes)/8)*8;
          	end_bytes = length - init_bytes - running_length; 
          
          	for(li=0; li < init_bytes; li++) 
          		crc = crc_tableil8_o32[(crc ^ *p_buf++) & 0x000000FF] ^ (crc >> 8);	
          	for(li=0; li < running_length/8; li++) 
          	{
          		crc ^= *(uint32_t *)p_buf;
          		p_buf += 4;
          		term1 = crc_tableil8_o88[crc & 0x000000FF] ^
          				crc_tableil8_o80[(crc >> 8) & 0x000000FF];
          		term2 = crc >> 16;
          		crc = term1 ^
          			  crc_tableil8_o72[term2 & 0x000000FF] ^ 
          			  crc_tableil8_o64[(term2 >> 8) & 0x000000FF];
          		term1 = crc_tableil8_o56[(*(uint32_t *)p_buf) & 0x000000FF] ^
          				crc_tableil8_o48[((*(uint32_t *)p_buf) >> 8) & 0x000000FF];
          		
          		term2 = (*(uint32_t *)p_buf) >> 16;
          		crc =	crc ^ 
          				term1 ^		
          				crc_tableil8_o40[term2  & 0x000000FF] ^	
          				crc_tableil8_o32[(term2 >> 8) & 0x000000FF];	
          		p_buf += 4;
          	}
          	for(li=0; li < end_bytes; li++) 
          		crc = crc_tableil8_o32[(crc ^ *p_buf++) & 0x000000FF] ^ (crc >> 8);
          	if((mode == MODE_BEGIN) || (mode ==  MODE_CONT))
          		return crc;		
              return crc ^ XOROT;	
          }
          

          That is pretty straightforward to implement if the next 4 tables (T5, T6, T7, T8 in the terminology of the current PureJavaCrc32) are generated the same way as the previous 4.
          Intel makes sure the inner loop is on an 8 byte boundary, because the C compiler can make the load needed for the

          crc ^= *(uint32_t *)p_buf

          part faster if that is the case. They also tend to favor shifting by 16 and 8 and avoiding shifting by 24 for some reason.

          I may try out the eight bytes at once with 8 lookup tables version next week.

          Tsz Wo Nicholas Sze made changes -
          Link This issue is related to HADOOP-5318 [ HADOOP-5318 ]
          Tsz Wo Nicholas Sze added a comment -

          Yeah, the speed depends on many platform-dependent details like cache size, 32-bit/64-bit, etc. So reducing the number of operations in the CRC algorithm may not lead to better performance.

          I tried more varieties like Crc32_6_4, but my implementation did not perform well. We should run the benchmark on Scott's version.

          Tsz Wo Nicholas Sze added a comment -

          Unfortunately, in TestPureJavaCrc32.PerformanceTest, Crc32_4_3 only beats PureJavaCrc32 on a 32-bit VM, not on a 64-bit VM.

          • 32-bit vm
            num bytes | CRC32 | Crc32_4_2 | Crc32_4_3 | Crc32_3_2 | PureJavaCrc32 (all throughput columns in MB/sec)
            1 4.504 52.409 55.779 46.603 59.146
            2 8.825 86.590 87.938 77.267 89.942
            4 17.254 119.824 151.929 120.808 146.983
            8 32.037 147.222 202.527 161.984 174.844
            16 59.078 161.879 231.467 195.635 228.018
            32 100.267 176.767 276.844 241.295 244.502
            64 148.985 178.250 283.511 269.368 263.209
            128 199.763 185.639 294.116 271.943 259.021
            256 232.751 179.525 290.357 259.453 256.891
            512 255.430 178.217 296.907 280.763 257.362
            1024 262.274 172.033 289.806 277.863 261.793
            2048 273.744 187.468 299.271 286.387 272.611
            4096 289.373 186.306 293.845 276.021 266.067
            8192 290.282 184.625 296.723 285.097 271.503
            16384 298.959 180.863 291.081 250.583 199.536
            32768 277.718 184.078 293.156 285.722 270.377
            65536 300.016 186.439 298.946 283.990 271.268
            131072 298.971 186.754 298.417 283.949 268.240
            262144 299.688 184.124 296.014 281.633 265.799
            524288 282.488 176.217 288.120 284.030 267.917
            1048576 294.852 185.167 291.499 279.362 267.438
            2097152 296.117 174.667 281.145 272.180 260.837
            4194304 283.934 173.777 279.931 271.393 259.805
            8388608 289.455 177.829 291.535 269.850 259.513
            16777216 284.204 177.449 290.489 276.586 265.657
          • 64-bit vm
            num bytes | CRC32 | Crc32_4_2 | Crc32_4_3 | Crc32_3_2 | PureJavaCrc32 (all throughput columns in MB/sec)
            1 7.636 80.107 99.658 77.283 34.446
            2 14.598 116.202 110.091 94.056 106.498
            4 27.786 152.932 197.294 147.532 175.766
            8 50.159 153.598 194.617 163.596 197.350
            16 85.036 177.761 258.683 237.917 278.764
            32 130.278 180.486 310.024 281.343 342.374
            64 177.501 181.663 343.592 320.385 384.938
            128 217.128 181.836 366.965 338.893 411.724
            256 245.690 182.637 379.003 348.981 425.874
            512 262.085 181.103 381.961 355.506 428.103
            1024 271.307 179.753 381.658 356.488 433.800
            2048 276.640 180.451 378.667 351.067 437.275
            4096 278.435 179.881 372.762 347.728 437.209
            8192 279.883 180.776 377.241 351.178 439.571
            16384 281.385 180.775 377.493 353.361 439.606
            32768 281.434 180.703 378.047 353.656 438.703
            65536 281.354 180.914 377.805 353.130 437.152
            131072 280.941 180.288 377.164 353.340 438.806
            262144 282.056 180.910 378.514 354.320 438.208
            524288 281.066 180.177 377.148 352.832 437.183
            1048576 281.668 180.790 378.412 354.059 438.755
            2097152 282.162 180.545 377.918 353.841 438.497
            4194304 281.379 179.018 376.287 352.240 436.963
            8388608 279.929 178.058 371.618 349.405 430.993
            16777216 278.974 177.577 371.729 347.971 429.326
          Tsz Wo Nicholas Sze added a comment -

          Not yet able to improve PureJavaCrc32 on my 64-bit machine, but I had a lot of fun last weekend.

          c6166_20090727.patch: moved the code to common (finally). Please try it when you have time.

          • 64-bit
            java.version = 1.6.0_10
            java.runtime.name = Java(TM) SE Runtime Environment
            java.runtime.version = 1.6.0_10-b33
            java.vm.version = 11.0-b15
            java.vm.vendor = Sun Microsystems Inc.
            java.vm.name = Java HotSpot(TM) 64-Bit Server VM
            java.vm.specification.version = 1.0
            java.specification.version = 1.6
            os.arch = amd64
            os.name = Linux
            os.version = 2.6.9-55.ELsmp
            num bytes | PureJavaCrc32 | PureJavaCrc32New | Crc32_3_2 | Crc32_4_3 | Crc32_5_5 | Crc32_6_6 | Crc32_8_8 | Crc32_12_12 (all throughput columns in MB/sec)
            8 157.986 102.628 135.926 204.207 218.584 239.862 253.886 213.072
            16 245.381 214.363 238.798 202.284 261.246 207.342 219.935 245.648
            32 331.296 290.766 283.689 218.582 310.499 329.800 283.757 273.074
            64 405.822 345.573 325.067 224.623 344.732 345.010 311.538 346.970
            128 451.240 378.875 343.498 226.853 391.824 392.462 323.504 391.573
            256 479.728 410.574 352.432 226.939 416.448 396.344 331.537 415.233
            512 488.917 425.214 355.640 227.120 424.109 409.781 335.965 427.655
            1024 499.820 433.135 358.441 225.953 430.212 414.286 337.652 431.440
            2048 504.199 438.373 352.913 223.754 435.921 417.888 339.190 438.454
            4096 509.100 441.553 351.305 222.355 438.667 420.657 341.057 441.063
            8192 511.632 439.242 352.058 222.469 439.568 422.009 341.427 447.972
            16384 510.829 444.631 354.097 222.488 439.707 419.661 341.200 451.286
            32768 507.353 437.758 354.601 222.503 436.775 416.266 339.704 449.830
            65536 507.335 434.042 354.837 222.682 436.742 417.825 339.761 449.868
            131072 507.473 431.449 355.014 222.748 436.477 417.910 339.835 449.958
            262144 507.548 429.451 354.932 222.632 436.698 417.783 339.852 449.936
            524288 507.322 428.618 355.142 222.491 436.584 417.715 339.826 450.146
            1048576 507.148 428.778 354.769 222.534 436.506 417.819 339.830 450.032
            2097152 506.610 432.981 354.596 222.261 436.080 417.573 339.933 449.623
            4194304 504.503 432.501 352.918 221.669 432.956 414.626 337.668 445.489
            8388608 498.208 428.943 348.488 219.868 429.455 411.497 335.448 440.899
            16777216 497.184 423.245 348.105 219.657 427.288 410.788 334.992 440.603
          • 32-bit
            java.version = 1.6.0_14
            java.runtime.name = Java(TM) SE Runtime Environment
            java.runtime.version = 1.6.0_14-b08
            java.vm.version = 14.0-b16
            java.vm.vendor = Sun Microsystems Inc.
            java.vm.name = Java HotSpot(TM) Client VM
            java.vm.specification.version = 1.0
            java.specification.version = 1.6
            os.arch = x86
            os.name = Windows XP
            os.version = 5.1
            num bytes | PureJavaCrc32 | PureJavaCrc32New | Crc32_3_2 | Crc32_4_3 | Crc32_5_5 | Crc32_6_6 | Crc32_8_8 | Crc32_12_12 (all throughput columns in MB/sec)
            8 192.776 167.821 174.386 207.658 184.504 196.480 222.191 155.052
            16 227.810 212.454 224.370 250.812 236.371 229.370 267.719 248.392
            32 250.230 239.881 251.688 280.112 257.619 270.357 298.951 268.026
            64 263.695 257.785 275.683 296.554 269.474 284.615 316.642 319.204
            128 270.873 266.500 286.942 305.744 282.172 298.166 325.899 325.806
            256 282.205 270.751 294.155 306.224 288.460 303.445 330.728 342.959
            512 282.529 270.063 295.103 309.134 290.043 300.008 332.325 343.466
            1024 279.710 273.680 298.489 308.905 289.993 305.850 331.706 347.761
            2048 279.753 274.073 293.911 304.972 285.538 308.030 334.459 350.296
            4096 278.830 275.688 290.520 302.634 293.201 308.455 334.498 351.761
            8192 279.518 274.829 289.088 299.036 292.383 305.940 333.512 352.584
            16384 278.609 251.000 287.964 303.862 293.782 308.534 333.397 347.889
            32768 276.124 272.805 290.125 300.458 289.993 306.447 334.985 350.833
            65536 274.212 273.606 288.872 303.673 286.457 307.196 332.591 349.273
            131072 275.371 272.257 289.985 303.490 289.126 303.128 330.630 349.575
            262144 275.607 273.878 288.080 302.439 285.862 304.965 330.775 347.044
            524288 274.578 270.549 286.745 299.832 287.063 304.299 332.160 346.131
            1048576 270.002 272.333 285.866 299.702 284.005 304.845 329.455 343.377
            2097152 268.254 265.650 285.905 297.428 286.168 302.749 329.962 344.515
            4194304 272.093 268.692 285.552 299.619 285.262 299.311 327.311 338.847
            8388608 268.156 265.971 283.967 291.225 282.974 301.162 323.698 343.482
            16777216 271.428 270.893 285.939 299.171 284.946 302.520 328.465 343.694
          Tsz Wo Nicholas Sze made changes -
          Attachment c6166_20090727.patch [ 12414652 ]
          Tsz Wo Nicholas Sze added a comment -

          Got a different story after updating to the latest JDK.

          java.version = 1.6.0_14
          java.runtime.name = Java(TM) SE Runtime Environment
          java.runtime.version = 1.6.0_14-b08
          java.vm.version = 14.0-b16
          java.vm.vendor = Sun Microsystems Inc.
          java.vm.name = Java HotSpot(TM) 64-Bit Server VM
          java.vm.specification.version = 1.0
          java.specification.version = 1.6
          os.arch = amd64
          os.name = Linux
          os.version = 2.6.9-55.ELsmp

          num bytes | PureJavaCrc32 | PureJavaCrc32New | Crc32_3_2 | Crc32_4_3 | Crc32_5_5 | Crc32_6_6 | Crc32_8_8 | Crc32_12_12 (all throughput columns in MB/sec)
          8 190.202 170.445 170.560 228.872 215.769 206.454 231.685 214.802
          16 257.234 209.434 225.336 267.450 263.165 187.785 253.917 232.935
          32 309.992 271.358 243.099 309.840 309.997 304.166 319.013 270.716
          64 348.461 326.343 265.435 338.049 330.947 333.372 358.959 334.240
          128 369.745 362.989 271.531 354.615 382.880 371.822 383.919 370.870
          256 382.773 385.201 279.028 364.379 402.521 378.506 400.110 399.904
          512 384.597 397.015 279.898 364.408 406.135 389.814 407.351 405.674
          1024 390.181 405.577 281.035 364.413 412.159 395.506 408.046 416.599
          2048 392.820 408.548 275.382 360.259 412.941 395.982 409.498 422.795
          4096 392.362 408.593 273.375 355.012 414.489 397.885 410.857 422.115
          8192 393.152 409.355 273.973 355.846 415.358 398.532 411.701 423.370
          16384 393.094 409.406 274.759 355.500 415.657 398.542 411.813 422.864
          32768 392.515 408.989 276.169 357.135 415.965 400.295 411.622 422.887
          65536 393.323 408.997 276.594 357.448 416.075 400.896 411.966 422.850
          131072 393.531 408.982 276.566 357.490 416.059 400.959 412.037 422.953
          262144 393.638 409.070 276.585 357.407 416.030 401.040 412.046 423.034
          524288 393.629 408.982 276.511 357.462 416.123 400.994 411.924 423.010
          1048576 393.652 408.943 276.397 357.050 415.844 400.785 411.927 422.808
          2097152 393.408 408.558 276.024 356.452 415.633 400.296 411.594 422.426
          4194304 391.575 405.148 275.157 354.772 413.680 397.834 409.809 420.100
          8388608 389.204 404.179 273.648 351.896 411.007 395.309 407.030 417.661
          16777216 388.753 403.422 273.343 351.298 410.380 394.783 406.396 416.995

          The above table makes more sense, since it is easy to tell from the code that Crc32_N_N for N > 4 is more efficient than PureJavaCrc32 (i.e. Crc32_4_4). Note that N cannot be increased arbitrarily; otherwise, the tables may not fit into the CPU cache, as explained previously by Scott. (Tried Crc32_16_16, but it got worse.)
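To make the cache argument concrete, here is a rough footprint calculation. It is only a sketch: it assumes that each Crc32_N_N variant uses N lookup tables of 256 four-byte int entries, and the class and method names are made up for illustration.

```java
// Rough footprint of the lookup tables for a slicing-by-N CRC32.
// Assumption (not confirmed by the patch): N tables, 256 int entries each.
final class CrcTableFootprint {
    static int tableBytes(int n) {
        return n * 256 * Integer.BYTES;
    }

    public static void main(String[] args) {
        for (int n : new int[] {4, 8, 12, 16}) {
            System.out.println("N=" + n + ": " + tableBytes(n) / 1024 + " KB of tables");
        }
    }
}
```

Under that assumption the tables grow by 1 KB per unit of N, so N=16 would need 16 KB. If the L1 data cache is, say, 32 KB, that is a large share of it before the input buffer is even counted.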

          As shown above, Crc32_12_12 has 7% and 26% improvement on my 64-bit and 32-bit machines with jdk 1.6.0_14-b08, respectively. I cannot explain why the numbers were generally better in 1.6.0_10-b33, 64-bit vm. Specific jdk feature/bug?

          Scott Carey added a comment -

          For the 32 bit results, try passing -server on the command line. It behaves quite differently with loop unrolling and certain low level optimizations in the JIT versus -client (which is only default on 32 bit windows, and anyone who would run Hadoop there and wanted better performance would pass -server to speed it up).

          Are you specifying a -Xmx memory value? What about -Xms? On Windows with -client, the VM has unusual default memory and GC values. I've found that setting its NewRatio more like the other platforms helps a lot: -XX:NewRatio=4 or something like that may make your results more consistent across the platforms (and faster on 32-bit Windows).

          On my environment, on the previous set of tests, changing from _10 to _12 to _14 on JDK6 did not seem to do much. But I was manually setting -Xmx512m for all of my tests. I can try again later, but there is something odd about the results slowing down so much on the 1.6.0_14 version.
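Since default JVM settings differ between platforms and VM modes, it can help to print the effective settings next to each benchmark run. The following is a minimal sketch using the standard java.lang.management API; the class name is hypothetical and this is not part of the attached patches.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

final class JvmSettingsDump {
    static String describe() {
        // Flags passed to the JVM itself, e.g. -server or -Xmx512m.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        // Upper bound on the heap the JVM will attempt to use.
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        return "java.vm.name = " + System.getProperty("java.vm.name") + "\n"
             + "java.version = " + System.getProperty("java.version") + "\n"
             + "jvm args = " + jvmArgs + "\n"
             + "max heap (MB) = " + maxHeapMb;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it as, for example, `java -server -Xmx512m JvmSettingsDump` shows in the java.vm.name line whether the Server or Client VM was actually selected.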

          It is also curious that PureJavaCrc32New, which only changes the loop style, also slows down, but not as much as the older PureJavaCrc32, and so goes from always about 15% slower to a little bit faster. My guess is that something configuration-related has changed with respect to some default JVM settings.

          I think there may be some improvement possible in the 8_8 case in how the 9 XORs at the end are done. Perhaps all in one line? or in 3 sets of 3? Or more likely the compiler is smart enough to do the register optimization itself? Perhaps not, Intel's C code even avoids a single line with more than 4 XORs at once for some reason.

          Tsz Wo Nicholas Sze added a comment -

          c6166_20090728.patch: included Crc32_16_16

          > For the 32 bit results, try passing -server on the command line. ...
          Here is the result:

          java.version = 1.6.0_14
          java.runtime.name = Java(TM) SE Runtime Environment
          java.runtime.version = 1.6.0_14-b08
          java.vm.version = 14.0-b16
          java.vm.vendor = Sun Microsystems Inc.
          java.vm.name = Java HotSpot(TM) Server VM
          java.vm.specification.version = 1.0
          java.specification.version = 1.6
          os.arch = x86
          os.name = Windows XP
          os.version = 5.1

          num bytes PureJavaCrc32 MB/sec PureJavaCrc32New MB/sec Crc32_3_2 MB/sec Crc32_4_3 MB/sec Crc32_5_5 MB/sec Crc32_6_6 MB/sec Crc32_8_8 MB/sec Crc32_12_12 MB/sec Crc32_16_16 MB/sec
          8 138.935 148.510 133.888 174.420 142.309 148.559 202.270 125.889 117.607
          16 195.238 179.688 194.082 196.024 202.448 174.408 231.516 181.476 249.847
          32 239.042 212.647 218.873 214.975 238.313 234.569 285.546 222.713 282.443
          64 267.240 236.977 248.711 224.998 272.373 259.976 314.990 268.683 306.000
          128 282.564 261.874 258.325 195.558 183.524 290.901 339.453 307.891 285.557
          256 286.647 271.146 270.484 224.961 288.691 307.519 352.148 337.360 312.192
          512 298.539 276.192 274.773 236.895 336.279 315.217 361.232 346.809 319.615
          1024 303.658 279.882 276.542 236.183 340.919 325.135 364.909 352.689 319.080
          2048 309.358 285.787 273.328 236.416 345.868 327.777 368.106 357.019 321.033
          4096 306.306 285.192 272.680 237.541 343.045 327.025 368.837 358.270 322.088
          8192 307.772 288.171 272.977 237.316 348.833 328.908 373.525 361.827 322.454
          16384 307.900 286.654 273.482 236.011 332.936 328.303 370.397 359.706 320.460
          32768 302.599 285.929 273.000 237.496 343.129 328.161 368.144 360.141 320.854
          65536 305.564 285.796 273.027 236.645 342.567 329.054 369.318 360.611 322.333
          131072 306.763 285.466 274.336 237.648 344.286 329.910 373.027 360.100 320.236
          262144 302.322 286.444 273.267 236.971 345.512 327.882 370.549 358.936 320.964
          524288 304.555 284.659 272.150 235.174 342.026 327.074 369.213 359.436 316.547
          1048576 301.722 279.686 271.529 235.130 338.665 324.743 365.818 352.513 315.451
          2097152 301.360 282.853 270.846 232.843 336.175 322.065 362.790 356.372 317.965
          4194304 298.921 283.021 269.376 233.498 336.376 321.957 365.402 354.546 299.699
          8388608 250.164 281.916 269.071 234.353 338.636 325.124 365.995 353.549 312.460
          16777216 290.762 264.850 270.366 235.145 338.756 321.101 364.583 353.767 316.974

          > Are you specifying a -Xmx memory value? What about -Xms?
          I have -Xmx512m but no -Xms. Any suggestion?

          > It is also curious that the PureJavaCrc32New - which only changes the loop style ...
          This trick does not always work: PureJavaCrc32New was slower in the results shown above.

          > I think there may be some improvement possible in the 8_8 case in how the 9 XORs at the end are done. ...
          Yeah, we should try.

          Thanks, Scott.
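For readers who want to reproduce MB/sec figures like those in the tables, a throughput loop could look roughly like the sketch below. It is not the benchmark from the attached patches: the class name, the warm-up strategy, and the choice of 1 MB = 2^20 bytes are all assumptions, and it measures java.util.zip.CRC32 only because that class ships with the JDK.

```java
import java.util.Random;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

final class CrcThroughputSketch {
    // Written after each run so the JIT cannot discard the checksum work.
    static volatile long sink;

    /** Returns MB/sec for repeatedly checksumming a buffer of `size` bytes. */
    static double measure(Checksum sum, int size, long totalBytes) {
        byte[] data = new byte[size];
        new Random(0).nextBytes(data);
        long iterations = Math.max(1, totalBytes / size);

        // Warm-up pass so the JIT has a chance to compile the hot loop.
        for (long i = 0; i < iterations; i++) {
            sum.update(data, 0, size);
        }
        sink = sum.getValue();
        sum.reset();

        long start = System.nanoTime();
        for (long i = 0; i < iterations; i++) {
            sum.update(data, 0, size);
        }
        long elapsedNanos = Math.max(1, System.nanoTime() - start);
        sink = sum.getValue();

        double megabytes = (double) iterations * size / (1024.0 * 1024.0);
        return megabytes / (elapsedNanos / 1e9);
    }

    public static void main(String[] args) {
        // Buffer sizes double from 8 bytes to 1 MB; 64 MB is processed per size.
        for (int size = 8; size <= 1 << 20; size <<= 1) {
            System.out.printf("%d %.3f%n", size, measure(new CRC32(), size, 64L << 20));
        }
    }
}
```

Any other `Checksum` implementation can be passed to `measure` in place of `new CRC32()` to compare variants under identical conditions.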

          Tsz Wo Nicholas Sze made changes -
          Attachment c6166_20090728.patch [ 12414783 ]
          Tsz Wo Nicholas Sze added a comment -

          >> I think there may be some improvement possible in the 8_8 case in how the 9 XORs at the end are done. ...
          >Yeah, we should try.

          c6166_20090810.patch: tried various xor schemes for 8_8 and 16_16.

          Crc32_8_8d seems to be the best choice. I will generate a patch to replace the current PureJavaCrc32 unless someone would like to run some benchmarks.

          • Linux
            java.version = 1.6.0_15
            java.runtime.name = Java(TM) SE Runtime Environment
            java.runtime.version = 1.6.0_15-b03
            java.vm.version = 14.1-b02
            java.vm.vendor = Sun Microsystems Inc.
            java.vm.name = Java HotSpot(TM) 64-Bit Server VM
            java.vm.specification.version = 1.0
            java.specification.version = 1.6
            os.arch = amd64
            os.name = Linux
            os.version = 2.6.9-55.ELsmp

          Performance Table (The unit is MB/sec)

          Num Bytes CRC32 PureJavaCrc32 Crc32_8_8 Crc32_8_8b Crc32_8_8c Crc32_8_8d Crc32_16_16 Crc32_16_16b Crc32_16_16c Crc32_16_16d
          8 49.617 186.277 245.369 258.905 229.094 246.095 217.684 209.102 225.104 216.537
          16 83.794 260.552 254.068 253.335 240.414 269.863 268.293 280.991 248.725 272.588
          32 129.851 311.725 319.791 317.253 300.914 325.145 294.224 306.928 294.640 294.838
          64 175.566 348.701 357.292 359.353 345.997 370.976 325.410 349.629 343.592 346.502
          128 217.394 369.562 386.217 384.014 371.929 392.099 339.749 382.753 378.737 381.048
          256 246.060 381.159 403.724 401.239 389.683 403.455 346.794 394.901 397.287 398.093
          512 261.941 385.598 412.253 405.512 395.790 412.212 350.763 409.029 408.213 407.630
          1024 271.043 390.318 408.391 408.592 398.981 413.389 352.213 414.423 414.213 413.736
          2048 275.942 391.870 412.933 411.371 401.784 417.020 351.803 418.063 416.982 415.269
          4096 280.018 393.816 412.645 411.781 402.632 418.353 353.348 420.783 416.263 416.360
          8192 281.432 394.699 414.998 410.618 401.145 417.090 351.428 420.008 415.680 415.281
          16384 279.658 391.788 413.657 411.491 402.747 418.249 349.888 419.431 414.399 414.312
          32768 279.788 391.665 408.767 410.097 403.396 417.542 349.473 418.467 409.388 412.525
          65536 280.168 391.557 411.674 410.447 404.971 419.271 350.824 419.866 412.099 413.847
          131072 281.196 393.222 411.687 411.539 404.931 418.101 350.692 418.924 412.873 413.763
          262144 281.874 392.158 411.668 411.660 405.089 419.301 350.236 418.776 412.855 413.918
          524288 281.905 392.967 411.713 410.524 404.851 418.481 350.890 419.835 412.881 412.677
          1048576 281.864 393.155 411.589 410.697 405.352 417.905 350.737 419.861 412.734 413.808
          2097152 281.269 392.916 410.495 411.322 405.298 419.084 349.765 419.540 412.517 413.538
          4194304 280.904 388.513 408.313 410.383 404.432 418.168 349.982 418.628 411.404 412.493
          8388608 279.946 389.306 407.485 407.420 401.088 415.597 347.824 415.479 408.459 409.290
          16777216 279.517 388.903 407.068 406.760 400.404 415.171 347.468 415.047 408.089 408.576
          • Windows
            java.version = 1.6.0_14
            java.runtime.name = Java(TM) SE Runtime Environment
            java.runtime.version = 1.6.0_14-b08
            java.vm.version = 14.0-b16
            java.vm.vendor = Sun Microsystems Inc.
            java.vm.name = Java HotSpot(TM) Server VM
            java.vm.specification.version = 1.0
            java.specification.version = 1.6
            os.arch = x86
            os.name = Windows XP
            os.version = 5.1

          Performance Table (The unit is MB/sec)

          Num Bytes CRC32 PureJavaCrc32 Crc32_8_8 Crc32_8_8b Crc32_8_8c Crc32_8_8d Crc32_16_16 Crc32_16_16b Crc32_16_16c Crc32_16_16d
          8 30.582 165.684 222.712 239.031 192.301 192.855 121.520 118.156 121.212 119.522
          16 54.720 215.567 237.607 232.193 230.172 215.288 269.855 279.729 254.891 251.049
          32 93.370 250.110 290.161 300.234 272.414 266.170 288.042 293.170 273.936 278.459
          64 142.510 271.193 317.292 331.031 293.492 298.169 314.142 310.036 309.833 315.505
          128 193.336 285.876 340.127 355.663 302.081 319.978 330.662 318.949 326.466 333.227
          256 237.137 292.450 349.322 366.198 308.711 328.690 335.694 323.855 328.766 342.473
          512 266.200 293.008 352.708 370.587 307.753 330.876 335.443 320.305 338.866 342.883
          1024 275.926 292.009 347.310 367.795 305.459 332.800 328.240 291.745 270.544 338.249
          2048 285.404 282.665 338.805 356.377 300.794 333.204 334.669 317.972 327.810 347.768
          4096 295.626 296.351 353.792 374.937 310.589 337.225 338.851 323.519 335.347 349.135
          8192 299.106 297.335 352.159 371.486 310.073 334.584 336.462 320.744 340.072 343.119
          16384 298.202 295.121 353.141 371.399 311.208 347.209 340.819 321.550 341.870 349.779
          32768 301.017 294.772 355.933 374.572 311.367 354.163 335.798 320.324 344.780 351.859
          65536 305.349 297.090 355.641 377.599 310.232 354.429 337.837 324.306 345.244 350.304
          131072 305.814 297.094 356.518 380.588 310.183 354.515 339.324 319.657 344.429 347.836
          262144 303.674 294.812 352.616 377.256 306.132 349.959 330.275 319.694 339.153 347.561
          524288 306.440 296.660 352.876 371.992 308.378 353.454 339.735 322.835 344.267 350.561
          1048576 303.807 297.499 351.199 375.630 307.785 350.496 338.899 321.673 342.487 350.763
          2097152 302.701 295.785 352.618 376.966 306.529 351.045 335.405 318.857 340.192 349.454
          4194304 300.318 293.945 345.822 373.705 306.829 348.913 334.297 318.580 339.454 346.533
          8388608 299.847 293.259 348.667 373.015 305.414 338.622 333.332 317.337 339.085 346.144
          16777216 298.157 293.100 349.116 372.502 306.273 348.192 334.259 316.192 336.934 347.545
          Tsz Wo Nicholas Sze made changes -
          Attachment c6166_20090810.patch [ 12416134 ]
          Scott Carey added a comment -

          We probably want to have Todd's concurrency test from HADOOP-5318 run to make sure the larger lookup table doesn't slow things down under concurrency.

          We might also want to try the old four at a time code for 4 <= len < 8.
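One possible reading of this suggestion, sketched under assumptions: after the eight-at-a-time loop, a remainder of 4 to 7 bytes gets a single four-at-a-time step before falling back to the byte loop. The class name, the `T[k]` table layout (standing in for tables named T8_k), and the standard CRC-32 initial value and final inversion are illustrative choices, not taken from the patch.

```java
final class FourByteTailSketch {
    static final int[][] T = new int[8][256];

    static {
        // T[0]: byte-at-a-time table for the reflected polynomial 0xEDB88320.
        for (int i = 0; i < 256; i++) {
            int c = i;
            for (int bit = 0; bit < 8; bit++) {
                c = (c & 1) != 0 ? (c >>> 1) ^ 0xEDB88320 : c >>> 1;
            }
            T[0][i] = c;
        }
        // T[k]: T[k-1] advanced by one more zero byte.
        for (int k = 1; k < 8; k++) {
            for (int i = 0; i < 256; i++) {
                T[k][i] = (T[k - 1][i] >>> 8) ^ T[0][T[k - 1][i] & 0xff];
            }
        }
    }

    static long crc32(byte[] b) {
        int crc = 0xffffffff;
        int off = 0;
        int len = b.length;
        while (len > 7) {                 // eight bytes per iteration
            crc = T[7][(crc ^ b[off]) & 0xff]
                ^ T[6][((crc >>> 8) ^ b[off + 1]) & 0xff]
                ^ T[5][((crc >>> 16) ^ b[off + 2]) & 0xff]
                ^ T[4][((crc >>> 24) ^ b[off + 3]) & 0xff]
                ^ T[3][b[off + 4] & 0xff]
                ^ T[2][b[off + 5] & 0xff]
                ^ T[1][b[off + 6] & 0xff]
                ^ T[0][b[off + 7] & 0xff];
            off += 8;
            len -= 8;
        }
        if (len > 3) {                    // 4 <= len < 8: one four-byte step
            crc = T[3][(crc ^ b[off]) & 0xff]
                ^ T[2][((crc >>> 8) ^ b[off + 1]) & 0xff]
                ^ T[1][((crc >>> 16) ^ b[off + 2]) & 0xff]
                ^ T[0][((crc >>> 24) ^ b[off + 3]) & 0xff];
            off += 4;
            len -= 4;
        }
        while (len > 0) {                 // at most three bytes remain
            crc = (crc >>> 8) ^ T[0][(crc ^ b[off++]) & 0xff];
            len--;
        }
        return (~crc) & 0xffffffffL;
    }
}
```

With this shape the byte-at-a-time loop never runs more than three times, which mainly matters for short inputs such as the 4-byte rows in the tables.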

          We should also confirm the results on one of the other systems we tested in the past. I won't be able to do that for a couple days, but it should be easy then.
          How many other variants did you try? Intel's C code does some strange things to group CRCs by 3. Maybe something like the below is worth testing if it hasn't been tried already:

          public class Crc32_8_8e extends Crc32Base {
            public void update(byte[] b, int off, int len) {
              // Main loop: consume eight input bytes per iteration.
              while(len > 7) {
                // Fold the running CRC into the first four bytes, one CRC byte each.
                int c0 = b[off++] ^ crc;
                int c1 = b[off++] ^ (crc >>>= 8);
                int c2 = b[off++] ^ (crc >>>= 8);
                int c3 = b[off++] ^ (crc >>>= 8);

                crc = T8_7[c0 & 0xff] ^ T8_6[c1 & 0xff] ^ T8_5[c2 & 0xff] ^ T8_4[c3 & 0xff]; // three xors

                crc ^= T8_3[b[off++] & 0xff] ^ T8_2[b[off++] & 0xff]; // two xors

                crc ^= T8_1[b[off++] & 0xff] ^ T8_0[b[off++] & 0xff]; // two xors

                len -= 8;
              }
              // Tail: process the remaining 0..7 bytes one at a time.
              while(len > 0) {
                crc = (crc >>> 8) ^ T8_0[(crc ^ b[off++]) & 0xff];
                len--;
              }
            }
          }
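The class above relies on tables T8_0 through T8_7 that are not shown in this comment. As a sketch of how such tables can be generated and the loop checked, the following self-contained version builds them from the reflected CRC-32 polynomial and compares the result with java.util.zip.CRC32. The class name, the `T[k]` array standing in for T8_k, and the handling of the initial value and final inversion are assumptions; Crc32Base may organise these differently.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

final class SlicingBy8Sketch {
    // T[k][i] plays the role of T8_k[i] in the class above.
    static final int[][] T = new int[8][256];

    static {
        // T[0] is the ordinary byte-at-a-time table for polynomial 0xEDB88320.
        for (int i = 0; i < 256; i++) {
            int c = i;
            for (int bit = 0; bit < 8; bit++) {
                c = (c & 1) != 0 ? (c >>> 1) ^ 0xEDB88320 : c >>> 1;
            }
            T[0][i] = c;
        }
        // T[k][i] is T[k-1][i] advanced by one additional zero byte.
        for (int k = 1; k < 8; k++) {
            for (int i = 0; i < 256; i++) {
                int prev = T[k - 1][i];
                T[k][i] = (prev >>> 8) ^ T[0][prev & 0xff];
            }
        }
    }

    static long crc32(byte[] b) {
        int crc = 0xffffffff;              // standard CRC-32 initial value
        int off = 0;
        int len = b.length;
        while (len > 7) {
            int c0 = b[off++] ^ crc;
            int c1 = b[off++] ^ (crc >>>= 8);
            int c2 = b[off++] ^ (crc >>>= 8);
            int c3 = b[off++] ^ (crc >>>= 8);
            crc = T[7][c0 & 0xff] ^ T[6][c1 & 0xff] ^ T[5][c2 & 0xff] ^ T[4][c3 & 0xff];
            crc ^= T[3][b[off++] & 0xff] ^ T[2][b[off++] & 0xff];
            crc ^= T[1][b[off++] & 0xff] ^ T[0][b[off++] & 0xff];
            len -= 8;
        }
        while (len > 0) {
            crc = (crc >>> 8) ^ T[0][(crc ^ b[off++]) & 0xff];
            len--;
        }
        return (~crc) & 0xffffffffL;       // standard final inversion
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        CRC32 reference = new CRC32();
        reference.update(data, 0, data.length);
        System.out.println(Long.toHexString(crc32(data)) + " "
            + Long.toHexString(reference.getValue()));
    }
}
```

Because XOR is associative, splitting the table lookups across one, two, or three statements does not change the result; the grouping can only affect how the JIT schedules the loads, which is what the 8_8 variants are comparing.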
          Tsz Wo Nicholas Sze added a comment -

          c6166_20090811.patch: added Crc32_8_8e and deleted some old classes.

          > We probably want to have Todd's concurrency test from HADOOP-5318 run to make sure the larger lookup table doesn't slow things down under concurrency.
          Todd, could you help run the test?

          > We might also want to try the old four at a time code for 4 <= len < 8.
          What do you mean exactly?

          > We should also confirm the results on one of the other systems we tested in the past. I won't be able to do that for a couple days, but it should be easy then.
          Hope you can find some time to do it soon.

          > How many other variants did you try? Intel's C code does some strange things to group CRC's by 3. ...
          Many others. You know, there are many combinations. I also tried different table sizes as shown before.

          Included Crc32_8_8e below. Crc32_8_8d still seems the best choice.

          • java.version = 1.6.0_15
            java.runtime.name = Java(TM) SE Runtime Environment
            java.runtime.version = 1.6.0_15-b03
            java.vm.version = 14.1-b02
            java.vm.vendor = Sun Microsystems Inc.
            java.vm.name = Java HotSpot(TM) 64-Bit Server VM
            java.vm.specification.version = 1.0
            java.specification.version = 1.6
            os.arch = amd64
            os.name = Linux
            os.version = 2.6.9-55.ELsmp

          Performance Table (The unit is MB/sec)

          Num Bytes CRC32 PureJavaCrc32 Crc32_8_8 Crc32_8_8b Crc32_8_8c Crc32_8_8d Crc32_8_8e Crc32_16_16 Crc32_16_16b Crc32_16_16c Crc32_16_16d
          1 7.554 71.591 79.205 103.936 80.965 79.185 80.812 84.215 80.733 80.732 84.860
          2 14.768 104.753 110.837 110.771 110.798 115.222 110.820 119.278 110.844 110.836 119.207
          4 27.150 177.142 119.587 114.780 120.051 128.626 117.005 125.412 115.439 120.409 125.486
          8 49.921 193.631 239.096 238.506 228.868 248.667 244.446 217.270 209.170 209.149 215.505
          16 83.886 259.453 261.683 254.016 240.835 258.250 247.261 267.441 279.439 276.864 271.520
          32 128.960 312.202 323.233 319.924 301.175 322.962 320.275 295.428 302.402 294.960 295.872
          64 177.529 349.882 362.037 364.918 347.768 365.013 358.567 326.303 352.837 345.517 347.880
          128 217.577 370.920 387.254 386.534 372.472 391.180 386.159 339.850 383.044 373.716 381.058
          256 245.685 382.041 403.693 401.793 390.270 406.037 402.777 347.134 400.956 388.941 399.331
          512 263.143 385.666 411.380 407.443 396.506 413.556 411.803 350.844 411.293 397.335 409.011
          1024 271.941 390.055 406.949 406.830 399.163 415.963 424.095 352.875 417.516 401.168 414.229
          2048 276.881 392.684 412.759 411.123 402.143 418.730 424.565 353.111 418.855 402.870 415.696
          4096 279.541 393.738 418.844 413.271 403.645 419.106 424.975 353.582 421.574 403.834 417.058
          8192 280.308 392.859 417.096 412.115 403.015 417.989 422.254 352.011 419.664 402.378 415.385
          16384 280.420 393.006 415.629 409.566 403.498 418.107 420.676 350.423 418.062 401.090 412.468
          32768 280.488 392.172 410.737 411.129 403.873 417.434 413.839 349.526 418.633 401.736 412.341
          65536 281.809 393.565 411.490 412.331 405.347 419.190 414.866 350.781 418.473 401.637 412.297
          131072 281.994 393.634 411.533 412.281 405.387 419.620 414.802 350.869 418.350 401.634 412.346
          262144 282.106 393.732 411.339 411.562 404.172 416.299 413.072 349.416 416.190 400.395 410.680
          524288 281.194 392.195 409.839 410.759 403.954 417.950 413.178 350.104 418.596 401.175 412.137
          1048576 282.160 393.700 411.316 412.339 405.165 419.694 414.608 351.029 416.836 401.927 412.122
          2097152 281.584 393.264 410.322 405.048 402.028 414.893 406.574 328.270 410.558 393.239 403.796
          4194304 274.062 385.751 409.093 408.604 402.332 419.176 412.265 350.039 415.385 400.332 410.738
          8388608 279.870 385.793 406.788 407.685 400.830 415.537 409.612 344.396 412.684 397.534 406.201
          16777216 279.585 389.025 405.902 407.222 400.414 414.680 409.184 346.930 413.039 396.057 406.953
          • java.version = 1.6.0_14
            java.runtime.name = Java(TM) SE Runtime Environment
            java.runtime.version = 1.6.0_14-b08
            java.vm.version = 14.0-b16
            java.vm.vendor = Sun Microsystems Inc.
            java.vm.name = Java HotSpot(TM) Server VM
            java.vm.specification.version = 1.0
            java.specification.version = 1.6
            os.arch = x86
            os.name = Windows XP
            os.version = 5.1

          Performance Table (The unit is MB/sec)

          Num Bytes CRC32 PureJavaCrc32 Crc32_8_8 Crc32_8_8b Crc32_8_8c Crc32_8_8d Crc32_8_8e Crc32_16_16 Crc32_16_16b Crc32_16_16c Crc32_16_16d
          1 4.669 62.483 67.995 64.360 64.700 64.430 51.386 61.391 62.175 68.944 62.271
          2 9.959 79.342 88.566 79.881 91.326 88.695 81.976 89.846 83.984 91.166 84.918
          4 19.448 136.996 119.941 129.939 120.992 119.898 132.349 118.682 116.076 122.505 114.090
          8 36.518 168.433 240.284 219.896 213.865 212.471 209.605 115.325 135.040 145.009 139.316
          16 66.157 233.146 272.227 258.220 250.948 245.771 253.216 313.174 321.156 289.904 285.301
          32 111.982 283.081 327.881 326.083 311.988 300.416 299.799 331.685 338.723 316.987 319.500
          64 169.448 315.802 364.829 368.059 348.219 340.445 323.045 360.273 357.086 352.497 359.832
          128 227.468 335.743 384.770 394.813 373.775 368.234 338.247 379.686 363.607 372.900 382.183
          256 274.901 348.319 395.674 410.081 386.566 388.908 345.215 385.142 368.920 383.446 392.874
          512 307.317 353.626 401.379 422.212 392.543 399.756 348.747 391.548 372.486 389.674 400.035
          1024 325.610 358.707 408.000 424.190 390.858 386.952 349.863 392.606 373.485 393.192 403.376
          2048 337.261 361.642 410.563 429.318 395.794 388.887 351.070 392.865 373.458 394.564 404.084
          4096 346.026 363.077 408.908 432.169 402.087 394.633 351.254 393.764 372.779 395.751 405.531
          8192 348.807 363.550 407.264 431.891 409.774 400.138 350.242 393.582 373.939 394.498 405.981
          16384 350.030 361.773 407.574 433.393 413.786 399.135 352.191 391.875 371.887 396.563 404.991
          32768 350.552 360.131 408.495 428.463 413.372 406.904 352.049 390.723 371.160 393.841 403.526
          65536 350.624 359.343 407.846 427.477 411.844 407.847 351.866 388.578 371.861 394.314 403.752
          131072 351.383 358.687 407.854 426.513 411.453 407.655 352.480 388.961 372.343 394.181 401.394
          262144 351.508 359.392 406.755 427.817 413.534 407.467 352.457 389.811 370.843 396.034 404.051
          524288 350.264 358.659 407.698 427.750 411.986 408.091 351.678 390.126 371.458 395.884 404.374
          1048576 351.092 358.452 406.796 426.634 413.797 408.162 351.430 389.716 372.157 394.826 403.300
          2097152 349.731 357.048 407.095 426.922 411.427 405.970 351.830 388.308 371.514 394.656 401.170
          4194304 345.948 354.513 401.117 423.269 408.442 401.112 348.512 384.834 365.403 390.539 399.911
          8388608 343.162 350.602 400.009 415.757 403.662 399.569 344.909 382.301 365.704 386.523 397.156
          16777216 343.989 351.502 400.036 417.893 403.717 400.511 345.505 381.932 363.491 385.044 393.422
          Tsz Wo Nicholas Sze made changes -
          Attachment c6166_20090811.patch [ 12416288 ]
          Todd Lipcon added a comment -

          > Todd, could you help running the test?

          Yep, we're doing sprint planning tomorrow so I'll make sure to block off a couple hours to run benchmarks for this.

          Todd Lipcon added a comment -

          Attaching PDF of the concurrency benchmark from HADOOP-5318 run on all of these implementations. The first page also includes java.util.zip.CRC32 to verify that the benchmark script really is testing something and that we've fixed the problem originally reported in that ticket. The second page of the PDF is the same graph after dropping the super-slow one.

          I've normalized all of the results as percentages of the fastest. They all appear to be within 5-10% of each other for this benchmark, and given the error bars it's pretty obvious that there is no clear winner for the 1-byte-write case at any concurrency level. If anything, Crc32_6_6 seems to come out a little ahead, but the difference is barely discernible.

          I'd upload the script I used to run the benchmarks, but I had to add a silly little patch to Hadoop to be able to switch between implementations easily, so it's a hassle to run.

          For reference, this system is a Nehalem box running 64-bit JDK 1.6.0u14, 8 cores w/ hyperthreading (16 logical).

          Todd Lipcon made changes -
          Attachment Rplots.pdf [ 12416523 ]
          Attachment graph.r [ 12416524 ]
          Todd Lipcon added a comment -

          Oops, excuse me - there was an off-by-one error in my R script for matching up colors to datasets. Same results, except that the one that appears to do the best is 8_8b.

          Todd Lipcon made changes -
          Attachment graph.r [ 12416525 ]
          Attachment Rplots.pdf [ 12416526 ]
          Tsz Wo Nicholas Sze added a comment -

          Thanks, Todd.

          8_8b is also the best in my benchmark run on Windows.

          Question: Crc32Base and Crc32Table are not CRC implementations. What do they mean in the graph?

          Todd Lipcon added a comment -

          Oops - I had some old data files lying around in my "out" directory from a previous run where I had a mistake in my code. I've removed them and regenerated the graphs here.

          I'm currently running CRC32 benchmarks without concurrency and will upload those when they finish.

          Todd Lipcon made changes -
          Attachment Rplots.pdf [ 12416609 ]
          Todd Lipcon added a comment -

          Looks like the benchmark has run long enough to get good data. Here are the benchmarks from TestPureJavaCrc32 run on three different test systems. nehalem32 is the same Nehalem box (3MB L2 cache) running a 32-bit JVM. nehalem64 is that box with a 64-bit JVM. "laptop" is my MacBook Pro (Core 2 Duo) running a 64-bit JVM.

          Each PDF has several pages:

          • The first graph shows performance over the whole byte range tested. You'll definitely have to zoom in to be able to see anything here, and even then it's not that useful.
          • The remaining graphs show the different algorithms' performance on different sizes (same as the tables people have been pasting into JIRA)

          I ran the whole benchmark suite 50+ times to generate the error bars. Hopefully they'll serve as a good visual indicator for where the differences are actually statistically significant.
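One plausible way to turn repeated runs into error bars is the mean plus or minus one sample standard deviation per (implementation, size) cell. The sketch below only illustrates that idea; how graph.r actually computes its error bars is not stated in this thread, and the class and method names are hypothetical.

```java
// Hypothetical helper: summarizes repeated throughput measurements (MB/sec)
// for one implementation at one buffer size.
final class RunStats {
    // Arithmetic mean of the samples.
    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    // Sample standard deviation (divides by n - 1); needs at least 2 samples.
    static double sampleStdDev(double[] xs) {
        double m = mean(xs);
        double squares = 0.0;
        for (double x : xs) {
            squares += (x - m) * (x - m);
        }
        return Math.sqrt(squares / (xs.length - 1));
    }
}
```

Two implementations whose mean ± deviation ranges overlap heavily should not be ranked against each other on that size.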

          In summary, here's how I interpret the data:

          • For the 4-byte case, PureJavaCrc32 wins out on my laptop and the 32-bit JVM by a strong margin. On the 64-bit JVM it's within 5-10% of the rest (very little difference).
          • The 8-byte case is interesting - all of the 16_16* CRCs perform worse than the 8_8* CRCs. On the 32-bit JDK it's especially obvious (nearly a factor of two).
          • The 512-byte case (probably most common for DFS) - everyone is pretty much neck and neck. The 8_8d implementation wins significantly on nehalem64, and the 8_8b wins significantly on nehalem32. On my laptop they're all within the error bars except for 16_16, which is significantly worse.
          • The 16MB case is the same as the 512-byte case, just more pronounced. 8_8d wins on 64-bit, 8_8b on 32-bit, both by about 10%.

          So, I think the next step here is to profile a couple of MR applications to see what sizes are most common.

          My personal opinion is that we should target the 64-bit Nehalem architecture and the 128-byte size range. This would point to the 8_8d implementation as the winner.

          Todd Lipcon made changes -
          Attachment Rplots-laptop.pdf [ 12416612 ]
          Attachment Rplots-nehalem64.pdf [ 12416613 ]
          Attachment Rplots-nehalem32.pdf [ 12416611 ]
          Tsz Wo Nicholas Sze added a comment -

          > ... This would point to the 8_8d implementation as the winner.
          I agree since Crc32_8_8d is the best on 64-bit platforms.

          Todd Lipcon added a comment -

          > I agree since Crc32_8_8d is the best on 64-bit platforms.

          +1, though I just realized I didn't run my 32-bit JVM tests with -server. I should probably run that this afternoon before we make a final decision.

          Todd Lipcon added a comment -

          Ran the same benchmarks with java -d32 -server and the results are indistinguishable (I guess since the benchmarks run a long time the client VM does just as much JIT as the server VM).

          So, pending anyone coming up with evidence that we should fine tune for the really-small-checksum case on processors like my laptop, 8_8d wins by enough of a margin it seems worth doing.

          Scott Carey added a comment -

          Great test results!

          The conclusions I draw from these:

          • All of these perform well under concurrency, regardless of whether it's 4K, 8K, or 16K of lookup tables.
          • General performance trends in a more sophisticated test like yours line up with the simpler Perf test we have, so we can be confident in those results if they are consistent enough run to run.

          The 32-bit JVM likes 8_8b; the 64-bit JVM likes 8_8d. I agree that d is the winner here.

          Because the '8' variants shift to one byte at a time if the input is less than 8 bytes, they perform worse than the old PureJavaCrc32 at the 4 byte to 7 byte level. Is this important? It would be useful to know how often the crc code is called on small byte chunks. We can get this to near PureJavaCrc32 speeds for 4 byte sizes if we add a four byte at a time block to 8_8d.

          That is, we can go to something of the form:

          while (len > 7) {
            < 8 at a time code here >
          }
          while (len > 3) {
            < 4 at a time code here >
          }
          while (len > 0) {
            < 1 at a time code here >
          }
          

          Whether that is important depends on the use cases and the frequency of requests in the 4-7 and 12-15 byte ranges. We could even add a block for a two-at-a-time optimization.
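To make the tiered shape concrete, here is a minimal, self-contained sketch using the standard table-driven "slicing" technique for the reflected CRC-32 polynomial 0xEDB88320 (the same checksum java.util.zip.CRC32 computes). It is not the Crc32_8_8d code from the patch: the class name, the table layout `T`, and the one-shot method signature are all assumptions made for illustration. The middle tier is written as an `if`, because at most 7 bytes remain once the 8-at-a-time loop exits.

```java
// Hypothetical illustration of an 8 / 4 / 1 bytes-at-a-time CRC-32 loop.
final class TieredCrc32Sketch {
    // T[0] is the classic byte-wise table; T[k] advances a byte k positions further.
    private static final int[][] T = new int[8][256];

    static {
        for (int i = 0; i < 256; i++) {
            int c = i;
            for (int k = 0; k < 8; k++) {
                c = (c & 1) != 0 ? (c >>> 1) ^ 0xEDB88320 : c >>> 1;
            }
            T[0][i] = c;
        }
        for (int i = 0; i < 256; i++) {
            for (int t = 1; t < 8; t++) {
                int prev = T[t - 1][i];
                T[t][i] = (prev >>> 8) ^ T[0][prev & 0xff];
            }
        }
    }

    // Reads four bytes starting at off as a little-endian int.
    private static int word(byte[] b, int off) {
        return (b[off] & 0xff) | ((b[off + 1] & 0xff) << 8)
            | ((b[off + 2] & 0xff) << 16) | ((b[off + 3] & 0xff) << 24);
    }

    // One-shot CRC-32 of b[off .. off+len), returned as an unsigned 32-bit value.
    static long crc32(byte[] b, int off, int len) {
        int crc = 0xFFFFFFFF;
        while (len > 7) {                       // 8 bytes at a time
            int lo = crc ^ word(b, off);
            int hi = word(b, off + 4);
            crc = T[7][lo & 0xff] ^ T[6][(lo >>> 8) & 0xff]
                ^ T[5][(lo >>> 16) & 0xff] ^ T[4][lo >>> 24]
                ^ T[3][hi & 0xff] ^ T[2][(hi >>> 8) & 0xff]
                ^ T[1][(hi >>> 16) & 0xff] ^ T[0][hi >>> 24];
            off += 8;
            len -= 8;
        }
        if (len > 3) {                          // at most one 4-byte step remains
            int lo = crc ^ word(b, off);
            crc = T[3][lo & 0xff] ^ T[2][(lo >>> 8) & 0xff]
                ^ T[1][(lo >>> 16) & 0xff] ^ T[0][lo >>> 24];
            off += 4;
            len -= 4;
        }
        while (len > 0) {                       // 1 byte at a time for the tail
            crc = (crc >>> 8) ^ T[0][(crc ^ b[off]) & 0xff];
            off++;
            len--;
        }
        return (~crc) & 0xFFFFFFFFL;
    }
}
```

The extra branch costs something on every call, so whether it pays off depends on how often inputs of 4-7 bytes (or tails of that length) actually occur.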

          Todd Lipcon added a comment -

          Personally, I'm satisfied at this point. But if others feel like squeezing that last ounce out, I'm happy to rerun the tests when you have some new candidates. Just let me know.

          Regarding the importance of the different sizes, I think the next step is to instrument the CRC32 to log every invocation to a file and run some single-node wordcounts or something. With a histogram of the results it should be easy to see whether the small-buffer case matters at all in real scenarios.
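As a rough illustration of that instrumentation, the sketch below wraps any java.util.zip.Checksum and counts update calls by power-of-two length bucket (bucket 4 covers 4-7 bytes, bucket 8 covers 8-15, and so on). It keeps the histogram in memory rather than logging each invocation to a file, and the class name is hypothetical; nothing like it is part of the attached patches. It is not thread-safe, so each instance should wrap one checksum used by one thread.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.Checksum;

// Hypothetical wrapper that records how large each checksum update is.
final class LengthHistogramChecksum implements Checksum {
    private final Checksum delegate;
    // Key: largest power of two <= length (0 for empty updates). Value: call count.
    private final Map<Integer, Long> histogram = new TreeMap<>();

    LengthHistogramChecksum(Checksum delegate) {
        this.delegate = delegate;
    }

    @Override
    public void update(int b) {
        record(1);
        delegate.update(b);
    }

    @Override
    public void update(byte[] b, int off, int len) {
        record(len);
        delegate.update(b, off, len);
    }

    @Override
    public long getValue() {
        return delegate.getValue();
    }

    @Override
    public void reset() {
        delegate.reset();
    }

    private void record(int len) {
        histogram.merge(Integer.highestOneBit(len), 1L, Long::sum);
    }

    // Live view of the counts collected so far, ordered by bucket.
    Map<Integer, Long> histogram() {
        return histogram;
    }
}
```

Dumping `histogram()` at the end of a job would show at a glance whether buckets 1, 2 and 4 receive enough calls to justify tuning for them.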

          Tsz Wo Nicholas Sze added a comment -

          > Because the '8' variants shift to one byte at a time if the input is less than 8 bytes, they perform worse than the old PureJavaCrc32 at the 4 byte to 7 byte level. Is this important? It would be useful to know how often the crc code is called on small byte chunks. We can get this to near PureJavaCrc32 speeds for 4 byte sizes if we add a four byte at a time block to 8_8d.

          I tried this before but it did not help. BTW, the while-loop in the middle should be an if-statement since we have len <= 7.

          Tsz Wo Nicholas Sze added a comment -

          > I tried this before but it did not help. ...
          I should be clearer: it did help for the 4-byte case, but it hurt the performance in some other cases.

          c6166_20090819.patch: added Crc32_8_8b2 and Crc32_8_8d2. See whether anyone wants to play with them.

          Here are my results.

          • java.version = 1.6.0_15
            java.runtime.name = Java(TM) SE Runtime Environment
            java.runtime.version = 1.6.0_15-b03
            java.vm.version = 14.1-b02
            java.vm.vendor = Sun Microsystems Inc.
            java.vm.name = Java HotSpot(TM) 64-Bit Server VM
            java.vm.specification.version = 1.0
            java.specification.version = 1.6
            os.arch = amd64
            os.name = Linux
            os.version = 2.6.9-55.ELsmp

          Performance Table (The unit is MB/sec)

          Num Bytes CRC32 PureJavaCrc32 Crc32_8_8b Crc32_8_8b2 Crc32_8_8d Crc32_8_8d2
          1 7.236 106.903 58.534 61.106 109.742 67.673
          2 14.605 104.665 104.573 104.952 115.005 102.944
          4 27.566 177.502 223.599 228.648 131.890 218.530
          8 49.708 193.198 196.794 175.006 251.469 204.796
          16 83.804 259.008 234.906 221.473 258.878 236.937
          32 128.918 313.157 299.622 273.306 322.913 300.376
          64 176.797 349.738 339.404 320.543 365.522 348.328
          128 217.087 370.589 362.759 353.613 390.209 384.627
          256 244.869 381.630 376.096 369.157 405.767 401.649
          512 261.282 385.554 382.974 378.841 413.286 411.211
          1024 271.205 390.170 384.064 384.586 415.817 413.505
          2048 276.269 392.448 386.215 387.585 418.559 417.466
          4096 279.280 393.668 387.571 388.814 418.777 419.686
          8192 280.938 394.176 388.399 389.576 420.204 420.571
          16384 281.385 393.164 389.518 388.856 420.236 420.392
          32768 281.442 391.935 389.115 387.306 419.440 417.362
          65536 281.605 391.986 389.124 387.297 419.405 417.123
          131072 281.334 393.221 389.093 384.810 417.836 415.334
          262144 280.363 391.701 386.141 385.937 417.907 415.452
          524288 280.373 391.727 387.646 385.911 417.851 415.510
          1048576 280.295 391.585 387.535 385.745 417.587 414.222
          2097152 280.866 391.217 387.413 385.603 417.605 414.120
          4194304 280.273 390.247 387.650 385.971 418.284 414.302
          8388608 279.799 388.960 385.093 383.170 414.179 412.095
          16777216 279.473 388.359 384.629 382.737 413.717 411.523
          • java.version = 1.6.0_14
            java.runtime.name = Java(TM) SE Runtime Environment
            java.runtime.version = 1.6.0_14-b08
            java.vm.version = 14.0-b16
            java.vm.vendor = Sun Microsystems Inc.
            java.vm.name = Java HotSpot(TM) Server VM
            java.vm.specification.version = 1.0
            java.specification.version = 1.6
            os.arch = x86
            os.name = Windows XP
            os.version = 5.1

          Performance Table (The unit is MB/sec)

          Num Bytes CRC32 PureJavaCrc32 Crc32_8_8b Crc32_8_8b2 Crc32_8_8d Crc32_8_8d2
          1 3.744 49.901 47.314 49.920 51.063 45.306
          2 7.397 78.714 64.757 63.904 69.411 69.742
          4 14.129 118.892 116.559 122.589 100.686 120.545
          8 27.169 141.788 178.545 179.537 177.252 176.913
          16 48.864 197.479 199.851 206.187 205.112 197.922
          32 87.972 242.980 252.241 243.120 248.911 230.985
          64 128.459 262.461 272.747 284.252 276.611 264.473
          128 178.257 265.661 305.895 303.167 293.562 286.069
          256 213.740 280.286 307.318 324.065 296.782 297.760
          512 247.829 287.490 334.125 332.208 317.107 310.138
          1024 266.095 289.453 343.593 340.496 322.506 330.703
          2048 284.693 286.926 328.546 338.577 323.831 324.033
          4096 292.326 304.553 354.299 351.903 332.555 334.845
          8192 276.416 286.101 343.856 336.716 327.936 317.006
          16384 284.944 292.414 338.566 333.038 323.622 314.212
          32768 275.270 292.939 339.562 332.072 332.398 294.426
          65536 271.285 294.269 336.092 339.905 327.122 306.583
          131072 288.692 290.441 337.711 312.378 240.516 299.113
          262144 288.789 290.830 342.317 331.144 322.446 312.372
          524288 277.527 288.056 337.654 329.802 326.141 301.740
          1048576 278.305 288.149 332.875 328.402 320.754 300.196
          2097152 278.992 291.338 324.239 325.872 324.888 294.238
          4194304 282.476 281.645 331.034 332.346 318.677 296.442
          8388608 283.086 287.335 328.317 331.580 325.669 306.567
          16777216 290.031 296.875 339.540 332.101 323.918 307.348
          Tsz Wo Nicholas Sze made changes -
          Attachment c6166_20090819.patch [ 12417050 ]
          Tsz Wo Nicholas Sze added a comment -

          c6166_20090819review.patch: patch for reviewing.

          • replaced PureJavaCrc32 implementation with 8_8d
          • updated TestPureJavaCrc32 to junit 4
          • added TestPureJavaCrc32.Table
          • changed TestPureJavaCrc32.PerformanceTest to print system information
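
          For readers new to the topic, the general idea of a table-driven pure-Java CRC32 can be sketched as below. This is deliberately the simplest one-table, byte-at-a-time variant and is not the 8_8d code from the patch; the class name TableCrc32Sketch is hypothetical. The sketch checks its result against java.util.zip.CRC32, which is one straightforward way to validate a replacement implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical sketch of a table-driven CRC32; not the implementation in the patch.
public class TableCrc32Sketch {

    // 256-entry lookup table for the reflected CRC-32 polynomial 0xEDB88320.
    private static final int[] TABLE = new int[256];
    static {
        for (int i = 0; i < 256; i++) {
            int c = i;
            for (int k = 0; k < 8; k++) {
                c = (c & 1) != 0 ? (c >>> 1) ^ 0xEDB88320 : c >>> 1;
            }
            TABLE[i] = c;
        }
    }

    // Computes the CRC32 of data[off .. off+len) one byte at a time and
    // returns it as an unsigned 32-bit value held in a long.
    static long crc32(byte[] data, int off, int len) {
        int crc = 0xFFFFFFFF;
        for (int i = off; i < off + len; i++) {
            crc = (crc >>> 8) ^ TABLE[(crc ^ data[i]) & 0xFF];
        }
        return (~crc) & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);

        // Reference value from the JDK's own CRC32.
        CRC32 reference = new CRC32();
        reference.update(data, 0, data.length);

        long actual = crc32(data, 0, data.length);
        System.out.println(Long.toHexString(actual)
            + " matches=" + (actual == reference.getValue()));
    }
}
```

          Faster variants typically process several bytes per loop iteration using additional tables, trading memory for fewer dependent lookups.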
          Tsz Wo Nicholas Sze made changes -
          Attachment c6166_20090819review.patch [ 12417054 ]
          Tsz Wo Nicholas Sze made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Affects Version/s 0.21.0 [ 12313563 ]
          Fix Version/s 0.21.0 [ 12313563 ]
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12417054/c6166_20090819review.patch
          against trunk revision 804918.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/615/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/615/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/615/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/615/console

          This message is automatically generated.

          Chris Douglas added a comment -

          +1

          Very impressive work. Thanks Nicholas, Todd, and Scott!

          Chris Douglas made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Hadoop Flags [Reviewed]
          Resolution Fixed [ 1 ]
          Hudson added a comment -

          Integrated in TestBuilds #6 (See http://hudson.zones.apache.org/hudson/job/TestBuilds/6/).
          Further improve the performance of the pure-Java CRC32 implementation. Contributed by Tsz Wo (Nicholas), SZE

          Hudson added a comment -

          Integrated in Hadoop-Common-trunk #70 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk/70/).
          Further improve the performance of the pure-Java CRC32 implementation. Contributed by Tsz Wo (Nicholas), SZE

          Tom White made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Suresh Srinivas made changes -
          Component/s performance [ 12316502 ]
          Brandon Li made changes -
          Link This issue is related to HADOOP-8617 [ HADOOP-8617 ]

            People

            • Assignee:
              Tsz Wo Nicholas Sze
              Reporter:
              Tsz Wo Nicholas Sze
            • Votes:
              0
              Watchers:
              8
