Avro / AVRO-27

Consistent Overhead Byte Stuffing (COBS) encoded block format for Object Container Files

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Later
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: spec
    • Labels: None

      Description

      Object Container Files could use a 1 byte sync marker (set to zero) using zig-zag and COBS encoding within blocks to efficiently escape zeros from the record data.

      Zig-Zag encoding

      With zig-zag encoding, only the value 0 (zero) is encoded as a single zero byte. This property means that we can write any non-zero zig-zag long inside a block without concern for creating an unintentional sync byte.
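      For reference, a minimal sketch of the zig-zag mapping for longs (the method names are illustrative, not part of any existing API):

        // Standard zig-zag mapping: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, 2 -> 4, ...
        // Only the value 0 maps to 0, so a non-zero long never produces a lone zero byte
        // once the mapped value is written as a base-128 varint.
        static long zigZagEncode(long n) {
          return (n << 1) ^ (n >> 63);
        }

        static long zigZagDecode(long z) {
          return (z >>> 1) ^ -(z & 1);
        }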

      COBS encoding

      We'll use COBS encoding to ensure that all zeros are escaped inside the block payload. You can read http://www.sigcomm.org/sigcomm97/papers/p062.pdf for the details about COBS encoding.

      Block Format

      All blocks start and end with a sync byte (set to zero) with a type-length-value format internally as follows:

      name      | format       | length in bytes               | value                     | meaning
      sync      | byte         | 1                             | always 0 (zero)           | The sync byte serves as a clear marker for the start of a block
      type      | zig-zag long | variable                      | must be non-zero          | The type field expresses whether the block is for metadata or normal data.
      length    | zig-zag long | variable                      | must be non-zero          | The length field expresses the number of bytes until the next record (including the cobs code and sync byte). Useful for skipping ahead to the next block.
      cobs_code | byte         | 1                             | see COBS code table below | Used in escaping zeros from the block payload
      payload   | cobs-encoded | greater than or equal to zero | all non-zero bytes        | The payload of the block
      sync      | byte         | 1                             | always 0 (zero)           | The sync byte serves as a clear marker for the end of the block
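      A minimal sketch of how a writer could lay out one such block under these assumptions (class and method names are illustrative; cobsEncode is the routine sketched under "Encoding" below):

        import java.io.IOException;
        import java.io.OutputStream;

        class BlockWriterSketch {
          // Writes one block per the table above: sync, type, length, cobs_code + payload, sync.
          static void writeBlock(OutputStream out, long type, byte[] payload) throws IOException {
            byte[] encoded = cobsEncode(payload);        // COBS code byte(s) plus zero-free payload
            out.write(0);                                // opening sync byte
            writeZigZagLong(out, type);                  // non-zero block type
            writeZigZagLong(out, encoded.length + 1L);   // bytes until the next block: encoded data + closing sync
            out.write(encoded);
            out.write(0);                                // closing sync byte
          }

          // Zig-zag the value, then emit it as a base-128 varint (7 bits per byte).
          static void writeZigZagLong(OutputStream out, long n) throws IOException {
            long z = (n << 1) ^ (n >> 63);
            while ((z & ~0x7FL) != 0) {
              out.write((int) ((z & 0x7F) | 0x80));
              z >>>= 7;
            }
            out.write((int) z);
          }

          static byte[] cobsEncode(byte[] payload) {
            throw new UnsupportedOperationException("placeholder; see the COBS encoder sketch below");
          }
        }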

      COBS code table

      Code | Followed by      | Meaning
      0x00 | (not applicable) | (not allowed)
      0x01 | nothing          | Empty payload followed by the closing sync byte
      0x02 | one data byte    | The single data byte, followed by the closing sync byte
      0x03 | two data bytes   | The pair of data bytes, followed by the closing sync byte
      0x04 | three data bytes | The three data bytes, followed by the closing sync byte
      n    | (n-1) data bytes | The (n-1) data bytes, followed by the closing sync byte
      0xFD | 252 data bytes   | The 252 data bytes, followed by the closing sync byte
      0xFE | 253 data bytes   | The 253 data bytes, followed by the closing sync byte
      0xFF | 254 data bytes   | The 254 data bytes, not followed by a zero

      (taken from http://www.sigcomm.org/sigcomm97/papers/p062.pdf)

      Encoding

      Only the block writer needs to perform byte-by-byte processing to encode the block. The overhead for COBS encoding is very small in terms of the in-memory state required.
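      A minimal sketch of such a byte-by-byte encoder, a straightforward transcription of the standard COBS algorithm (names are illustrative):

        import java.util.Arrays;

        class CobsEncoderSketch {
          // Encodes src so that the output contains no zero bytes. Each code byte c
          // (1..0xFF) is followed by c-1 literal bytes; a code below 0xFF implies a
          // trailing zero in the source, except for the final code in the output.
          static byte[] cobsEncode(byte[] src) {
            byte[] dst = new byte[src.length + src.length / 254 + 2];  // worst-case expansion
            int codePos = 0;   // index where the current group's code byte will be written
            int out = 1;       // next write position for literal bytes
            int code = 1;      // 1 + number of literals in the current group
            for (int i = 0; i < src.length; i++) {
              byte b = src[i];
              if (b != 0) {
                dst[out++] = b;
                code++;
              }
              if (b == 0 || code == 0xFF) {      // close the group on a zero or at 254 literals
                dst[codePos] = (byte) code;
                code = 1;
                codePos = out;
                if (b == 0 || i + 1 < src.length) {
                  out++;                         // reserve a slot for the next code byte
                }
              }
            }
            dst[codePos] = (byte) code;          // final group: no implied trailing zero
            return Arrays.copyOf(dst, out);
          }
        }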

      Decoding

      Block readers are not required to do as much byte-by-byte processing as writers. The reader could (for example) find a metadata block by doing the following (sketched in code after the list):

      1. Search for a zero byte in the file which marks the start of a record
      2. Read and zig-zag decode the type of the block
        • If the block is normal data, read the length, seek ahead to the next block and goto step #2 again
        • If the block is a metadata block, cobs decode the data
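      A rough sketch of that scan-and-skip loop; the METADATA type value, the helper names, and the handling of adjacent sync bytes are assumptions, since the description above leaves them open:

        import java.nio.ByteBuffer;

        class BlockScanSketch {
          static final long METADATA = 1;   // assumed type value; the spec only requires non-zero types

          // Scans forward for the next metadata block and returns its COBS-decoded payload,
          // or null if none is found. Normal data blocks are skipped using their length field.
          static byte[] findMetadata(ByteBuffer buf) {
            while (buf.hasRemaining() && buf.get() != 0) {
              // step 1: scan forward to a sync (zero) byte marking a block boundary
            }
            skipZeros(buf);                                   // tolerate back-to-back sync bytes
            while (buf.hasRemaining()) {
              long type = readZigZagLong(buf);                // step 2: block type
              long length = readZigZagLong(buf);              // cobs code + payload + closing sync
              if (type == METADATA) {
                byte[] encoded = new byte[(int) length - 1];  // everything up to the closing sync
                buf.get(encoded);
                return cobsDecode(encoded);
              }
              buf.position(buf.position() + (int) length);    // skip a normal data block
              skipZeros(buf);
            }
            return null;
          }

          static void skipZeros(ByteBuffer buf) {
            while (buf.hasRemaining() && buf.get(buf.position()) == 0) {
              buf.get();
            }
          }

          // Reads a base-128 varint and undoes the zig-zag mapping.
          static long readZigZagLong(ByteBuffer buf) {
            long z = 0;
            int shift = 0;
            int b;
            do {
              b = buf.get() & 0xFF;
              z |= (long) (b & 0x7F) << shift;
              shift += 7;
            } while ((b & 0x80) != 0);
            return (z >>> 1) ^ -(z & 1);
          }

          static byte[] cobsDecode(byte[] encoded) {
            throw new UnsupportedOperationException("placeholder; the inverse of COBS encoding");
          }
        }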
      1. COBSCodec.java
        8 kB
        Matt Massie
      2. COBSCodec2.java
        10 kB
        Scott Carey
      3. COWSCodec.java
        9 kB
        Scott Carey
      4. COWSCodec2.java
        8 kB
        Scott Carey
      5. COWSCodec3.java
        8 kB
        Scott Carey
      6. COLSCodec.java
        8 kB
        Scott Carey
      7. COBSPerfTest.java
        0.6 kB
        Scott Carey
      8. COLSCodec2.java
        8 kB
        Scott Carey

        Activity

        Scott Carey added a comment - edited

        An outsider here – I've got an idea on how to avoid the performance pitfalls of COBS' byte-by-byte nature and as I thought through it, I spotted many other opportunities for enhancement since larger chunks afford a lot more bits in the Code that can be used for things other than the length of the following literal chunk.

        Proposal – COLS, a modification of COBS

        (for greater performance and extensibility for large data streams)

        Java is particularly bad at byte-by-byte operations. The COBS paper clearly indicates its design intention was stuffing data through embedded systems such as telephone lines and other networks where byte-by-byte processing of the whole payload is already mandatory.

        Doing so here would be a performance bottleneck in Java. Some simple tests can be constructed to prove or disprove this claim.

        I propose that rather than use COBS, one uses COLS or COWS ... that is Constant Overhead Long Stuffing or Constant Overhead Word Stuffing instead.

        This would be inefficient if we expect most payloads to be small (less than 256 bytes), but I suspect most Hadoop-related payloads to be large, and often very large.

        I favor stuffing Longs rather than Ints, since most systems will soon be running 64 bit JVMs. Sun's next JRE release has Object Pointer Compression, which makes the memory overhead of a 64 bit JVM very small compared to a 32 bit JVM, and performance is generally faster than the 32 bit JVM due to native 64 bit operations and more registers (for x86-64 at least).
        http://blog.juma.me.uk/2008/10/14/32-bit-or-64-bit-jvm-how-about-a-hybrid/

        I will describe the proposal below assuming a translation of COBS to COLS, from 1 byte at a time to 8 byte at a time encoding. However, it is clear that a 4 byte variant is very similar and may be preferable.

        Proposed Changes – Simple Block format with COLS

        name                  | format                  | length in bytes | value                      | meaning
        sync                  | byte                    | 8               | 0L                         | The sync long serves as a clear marker for the start of a block
        type                  | 1 byte                  | 1               | non-zero                   | The type field expresses whether the block is for metadata or normal data. Note: if this is only ever going to be a binary flag, it can be packed into the length or sequence number as a sign value. However, it is critical for decoding performance to keep the non-COLS header 8 byte aligned.
        block sequence number | 3 byte unsigned int     | 3               | 0 - 2^24                   | The block sequence number – a client can use this to resume a stream from the last successful block. This may not be needed if the metadata blocks take care of this.
        length                | fixed 4 byte signed int | 4               | >= 0                       | The length field expresses the number of bytes of COLS_payload data. Useful for skipping ahead to the next block.
        COLS_payload          | COLS                    | length as above | see COLS description below | The data in this block, encoded.

        The above would cap the stream length at 2GB * 16M = 32PB. There is room to increase this significantly by taking bits from the type and giving those to the block count. 2GB blocks are rather unlikely for now, however – as are multi-PB streams.

        Discussion

        • The entire stream would need to be 8 byte aligned in order to process it cleanly with something like java.nio.LongBuffer. This would include metadata blocks.
        • The sequence is assumed to be in network-order. Endianness can be handled and is not discussed in detail here.
        • The type can likely be encoded in a single bit in the block sequence number or length field. If more than two types of blocks are expected, more bits can be reserved for future use.
        • The length can be stored as the number of longs rather than bytes (bytes / 8) since the COLS payload is a multiple of 8 bytes.
        • The COLS payload here differs from the original proposal. It will have an entire COBS-like stream, with possibly many COLS code markers (at least one per 0L value in the block data).
        • One may want to have both the encoded length above, and the decoded length (or a checksum) as extra data validation. Perhaps even 4 types: METADATA, METADATA_CSUM, NORMAL, NORMAL_CSUM – where the ordinary variants store the length (fast, but less reliable) and the _CSUM variants store a checksum (slower, but highly reliable).

        Basic COBS to COLS description

        COBS describes a byte-by-byte encoding in which a zero byte cannot appear, and a set of codes is used to encode runs of data that do not contain a zero byte. All codes but one have an implicit trailing zero. The last block is assumed to have no implicit zero regardless of the code.

        COLS is a simple extension of this scheme to 64 bit chunks. In its base form, it does nothing more than work with larger chunks:

        COLS Code (Long, 8 bytes) | Followed by          | Meaning
        0L                        | N/A                  | (not allowed)
        1L                        | nothing              | A single zero Long
        2L                        | one long (8 bytes)   | The single data long, followed by a trailing zero long *
        3L                        | two longs (16 bytes) | The pair of data longs, followed by a trailing zero long *
        nL                        | (n-1) longs          | The (n-1) longs, followed by a trailing zero long *
        MAX **                    | MAX - 1 longs        | MAX - 1 longs, with no trailing zero

        * The last code in the sequence (which can be identified by the length header or a 0L indicating the start of the next block) does NOT have an implicit trailing zero.
        ** MAX needs to be chosen, and can't realistically be very large since encoding requires an arraycopy of size (MAX -1) * 8

        The COLS_payload has multiple COLS Code entries (and literals), up to the length specified in the header (where a 0L should then occur).
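        A minimal sketch of that base scheme operating on long[] arrays (the MAX value, class, and method names are placeholders chosen for illustration, not part of the proposal):

          import java.util.ArrayList;
          import java.util.List;

          class ColsEncoderSketch {
            static final int MAX = 255;   // group limit; the proposal leaves the choice of MAX open

            // Base COLS: encodes src so the output contains no zero longs. A code value n
            // (1 <= n < MAX) means (n-1) literal longs follow and a trailing zero long is
            // implied; code MAX means MAX-1 literals with no implied zero; the final code
            // in the payload never implies a zero.
            static long[] colsEncode(long[] src) {
              List<Long> out = new ArrayList<>();
              int groupStart = 0;                             // index of the first literal in the current group
              int i = 0;
              while (i < src.length) {
                if (i - groupStart == MAX - 1) {              // group is full: emit code MAX, no implied zero
                  emitGroup(out, src, groupStart, i);
                  groupStart = i;
                  continue;                                   // re-examine src[i] as part of the new group
                }
                if (src[i] == 0) {                            // zero long: close the group, the zero is implied
                  emitGroup(out, src, groupStart, i);
                  groupStart = i + 1;
                }
                i++;
              }
              emitGroup(out, src, groupStart, src.length);    // final group: no implied trailing zero
              long[] result = new long[out.size()];
              for (int j = 0; j < result.length; j++) {
                result[j] = out.get(j);
              }
              return result;
            }

            static void emitGroup(List<Long> out, long[] src, int from, int to) {
              out.add((long) (to - from + 1));                // code = literal count + 1
              for (int j = from; j < to; j++) {
                out.add(src[j]);
              }
            }
          }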

        However – there are drawbacks to using such a large chunk without other modifications from COBS:

        1. 64 bits is far too large for a length field. For encoding, a COBS code block must fit in RAM, and for performance, should probably fit in half an L2 cache. However, for decoding COLS code length is irrelevant.
        2. If the size of the data encoded is not a multiple of 8 bytes, we need a mechanism to encode that up to 7 trailing bytes should be truncated (3 bits).
        3. For most blocks, the overhead will be exactly 8 bytes (unless the block has a trailing 0L).
        4. Very long data streams without a zero Long are unlikely, so very large chunk lengths are not very useful.

        There are also benefits however. The above suggests that most of the 8 byte COLS code block space is not needed to encode length. Much can be done with this!
        Some thoughts:

        • The 3 bits needed to define the truncation behavior can be stored in the COLS code.
        • The overhead can be reduced, by encoding short trailing sequences into the upper bits rather than purely truncating – e.g. you can append 2 bytes instead of truncating 6.
        • Rudimentary run-length encoding or other lightweight compression can be done with the extra bits (completely encoder-optional).
        • We can remove the requirement that most codes have an implicit trailing zero, and encode that in one of the extra bits.

        If only the lower 2 bytes of an 8 byte COLS code represent the size, (MAX = 2^16 - 1), then the max literal size is 512KB - 8B. If we remove the implicit trailing zero, an encoder can optionally encode smaller literal sequences (perhaps for performance, or compression).
        What can be done with the remaining 48 bits?
        Some ideas:

        1. The highest 4 bytes can represent data to append to the literal. In this way, half of the size overhead of the encoding is removed. This should generally only apply to the last COLS code in the block (for performance reasons and maintaining 8 byte alignment on all arraycopy operations), but it's encoder-optional.
        2. The next bit represents whether the COLS block has an implicit 0L appended.
        3. A bit can be used to signify endianness (this might be a better fit for the Block header or stream metadata – detecting zeros works without knowing endianness).
        4. The next three bits can represent how much data is truncated or appended to the literal, (before the optional implicit 0L):
        value | meaning
        000   | do not truncate or append
        100   | append all 4 leading bytes in the COLS code after the literal
        111   | append the first 3 leading bytes in the COLS code after the literal
        110   | append the first 2 leading bytes in the COLS code after the literal
        101   | append the leading byte in the COLS code after the literal
        011   | truncate the last 3 bytes of the literal
        010   | truncate the last 2 bytes of the literal
        001   | truncate the last byte of the literal

        This leaves us with 12 bits. I propose that these be used for rudimentary (optional) compression:

        • Option A:
          • Run length only – the 12 bits represent the number of times to repeat the literal. Or 4 bits are the number of COLS chunks backwards (including this one) to repeat, and 8 bits is the number of repeats. Or ... some other form of emitting copies of entire COLS chunks.
        • Option B:
          • Some form of LZ-like compression that copies in 8 byte chunks – 4 bits represent the number of Longs to copy (so, max match size is 15 * 8 bytes), and 8 bits represents the number of Longs backwards (from the end of this COLS chunk) to begin that copy (up to 2KB). Because of the truncation/append feature, this is not constrained to 8-byte aligned copies on the output, but the encoded format is entirely 8 byte aligned and all copies are multiples of 8 bytes. I would not be surprised if this was as fast as LZO or faster, since it is very similar but operates in a more chunky fashion. Compression levels would not be that great, but like most similar algorithms to this the encoder can do more work to search for matches. Decoding uncompressed data should be essentially free (if the 4 bits are 0, do nothing – and most COLS blocks would be fairly large so this check does not occur that frequently).
        • Option C:
          • Reserve those 12 bits for future use / research

        Alternatively, one to 4 extra bytes used for the "append" feature can be reassigned to have more than 12 bits for compression metadata.

        So, with the above modifications, the COLS code looks like this:

        The COLS code is 8 bytes. The low 16 bits encode basic meaning.
        An 8 byte COLS code cannot be 0L.

        Code & 0xFFFF (low 2 bytes) | Followed by          | Meaning
        0x0000                      | N/A                  | (not allowed)
        0x0001                      | nothing              | A single zero Long
        0x0002                      | one long (8 bytes)   | The single data long
        0x0003                      | two longs (16 bytes) | The pair of data longs
        n                           | (n-1) longs          | The (n-1) longs
        0xFFFF                      | 2^16 - 2 longs       | 2^16 - 2 longs

        The next portion determines the state of truncation or appending.
        Two options are listed – only truncation, and truncation/appending. The appending could be up to 5 bytes if we squeeze all the rest of the space. The example below is for up to 4 bytes appended and 3 bytes truncated.

        appendCode = (Code >> 28) & 0xF;
        appendCode & 0x7 | Append or truncate | Appended data | From truncate-only option
        0x0              | 0                  | nothing       | 0
        0x1              | (-)1               | nothing       | (-)1
        0x2              | (-)2               | nothing       | (-)2
        0x3              | (-)3               | nothing       | (-)3
        0x4              | (+)1               | Code >>> 56   | (-)4
        0x5              | (+)2               | Code >>> 48   | (-)5
        0x6              | (+)3               | Code >>> 40   | (-)6
        0x7              | (+)4               | Code >>> 32   | (-)7

        It may be wiser to choose an option between these. If 3 bytes are chosen as the max arbitrary append length, with 4 truncated, 20 bits are left for other purposes, rather than 12. The average COLS chunk would be one byte larger.

        AppendCode & 0x8 | Append 0L
        0                | do not append 0L (8 zero bytes)
        1                | do append 0L (8 zero bytes)
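        As a rough illustration, a decoder might pull these fields apart as follows (purely a sketch of the bit layout proposed above; all names are illustrative):

          // 'buf' is a java.nio.ByteBuffer positioned at an 8-byte COLS code.
          static void readColsCode(java.nio.ByteBuffer buf) {
            long code = buf.getLong();                         // the COLS code; must not be 0L
            int literalLongs   = (int) (code & 0xFFFF) - 1;    // low 2 bytes: code n => (n-1) literal longs follow
            int appendCode     = (int) (code >>> 28) & 0xF;    // 4-bit selector, per the tables above
            boolean appendZero = (appendCode & 0x8) != 0;      // high bit: append an implicit 0L after the literal
            int truncOrAppend  = appendCode & 0x7;             // low 3 bits: how many bytes to truncate or append
            long appendSource  = code >>> 32;                  // up to 4 literal bytes packed in the high word
            // ... copy literalLongs longs, apply truncation or appended bytes, then the optional 0L ...
          }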

        Encoding

        The writer would perform processing in 8 byte chunks until the end of the block where some byte-by-byte processing would occur. Compression options would be entirely the writer's choice.
        The state overhead can be very low or large at the writer's whim. Larger COLS chunk sizes require more state (and larger arraycopys), and any compression option adds state overhead.

        Decoding

        Decoding in all circumstances reads data in 8 byte chunks. Copies occur in 8 byte chunks, 8 byte aligned save for the end of a block if the block does not have a multiple of 8 bytes in its payload. An encoder can cause copy destinations (but not sources) to not be 8 byte aligned if certain special options (compression) or intentionally misaligned encoding is done. Generally, an encoder can choose to make all but the last few bytes of the last block in the stream aligned.

        Doug Cutting added a comment -

        Before we get too far, let's review the motivation for this. From Matt's message:

        It makes more sense to use the same record boundary (0) for all Avro records instead of having them be random. The format would be more resilient to data corruption and easier to parse. It's also possible (although improbable) that the 16-byte UUID might be part of the payload... especially given the size of the data Hadoop processes.

        1. What's the tangible advantage of a single record boundary?
        2. Why would this be more corruption resistant?
        3. How likely is a collision? By my reading of http://en.wikipedia.org/wiki/Birthday_attack, we have a ~1% chance of collision in an exabyte (10^18B) of data, roughly 1000 times today's largest datasets, if we used the same marker for the full exabyte, which we would not, since we'd choose a new marker per output partition. Switching to a 32 byte marker would raise this to 10^37B. So we might consider that if we're worried about collisions.
        Matt Massie added a comment -

        1. What is the tangible advantage of a single record boundary?
        2. Why would this be more corruption resistant?

        I'm imagining a situation where you have part of an Avro Object container file minus the header/footer metablock because of data loss or subscribing to a data stream in "real-time" midstream. In that situation, determining the random 16 byte sync marker would require some work (e.g. finding recurring 16-byte values, searching for the string "schema" and working back, etc). Having a constant sync value (with an escaped payload) makes this recovery easier and the code a little cleaner. To be honest, this point is weakened by the fact that we're not planning on streaming Object container files anyway.

        3. How likely is a collision?

        Seems like this is a non-issue with a 16-byte sync value as it is now, but it's always good to be future-proof.

        I'm curious what other Java experts (since I'm not) out there feel about COBS in Java . It sounds from Scott's comment that byte stuffing in Java is a non-starter.

        There is code at..

        https://bosshog.lbl.gov/repos/java-u3/trunk/sea/src/gov/lbl/dsd/sea/nio/util/COBSCodec.java

        ...from Lawrence Berkeley Labs to do COBS encoding in Java with the following comment

        /* Performance Note: The JDK 1.5 server VM runs <code>decode(encode(src))</code>
         * at about 125 MB/s throughput on a commodity PC (2 GHz Pentium 4). Encoding is
         * the bottleneck, decoding is extremely cheap. Obviously, this is way more
         * efficient than Base64 encoding or similar application level byte stuffing
         * mechanisms.
         */
        
        Scott Carey added a comment -

        I'm curious what other Java experts (since I'm not) out there feel about COBS in Java . It sounds from Scott's comment that byte stuffing in Java is a non-starter.

        That really depends on the performance requirement.

        If the requirement is to be able to encapsulate data and stream at near Gigabit ethernet speed or teamed Gigabit (~100MB/sec to 200MB/sec), it will get in the way.
        If other things already significantly limit streaming capability then it may not be a large incremental overhead.
        For example, if the Avro serialization process is already going byte-by-byte somewhere else, this could 'piggyback' almost for free – but it would have to be embedded in that other code, in the same loop.

        I also want to highlight that the byte-by-byte streaming in Java can be compared to larger chunk sizes with a fairly simple benchmark to validate (or disprove) my claims that it is slow in comparison.

        The data from LBL is useful. It should be fairly easy to change that to a larger chunk size and compare on a new JVM.

        I'll try to characterize this on my own time this weekend.

        Todd Lipcon added a comment -

        If the Java performance of byte-by-byte processing is the major issue, is it worth considering native code to optimize this? I don't generally like using native code, but I feel like it may be worth it if the advantages of COBS are significant enough.

        On a side note, I recently read a paper that added a JVM optimization to really improve element-by-element processing of arrays by automatically eliminating bounds checking. I imagine that would apply here. Unfortunately, basing a system around a JVM that doesn't exist yet isn't so wise. But down the road this performance issue may be ameliorated.

        Matt Massie added a comment -

        The suspense was just killing me so I had to get some benchmarks myself.

        Scott, I'll be interested to see if you have similar results over the weekend.

        I rewrote the LBL code to use ByteBuffers instead of ArrayByteList from the older Apache commons primitives. The new API looks like...

        public static void decode(ByteBuffer src, int from, int to, ByteBuffer dest) throws IOException
        public static void encode(ByteBuffer src, int from, int to, ByteBuffer dest)
        

        I chose ByteBuffers because I didn't want to realloc new byte arrays but instead operate on the same byte array for each test.

        My test results are the average of 10 tests run on a 64 MB ByteBuffer running on my MacBook Pro

          Model Name:	MacBook Pro
          Model Identifier:	MacBookPro5,1
          Processor Name:	Intel Core 2 Duo
          Processor Speed:	2.4 GHz
          Number Of Processors:	1
          Total Number Of Cores:	2
          L2 Cache:	3 MB
          Memory:	4 GB
          Bus Speed:	1.07 GHz
        

        Since my test wasn't multithreaded... only one core was used.

        My tests verified that the byte array wasn't altered by the encoding/decoding process (there were no failures).

        These numbers are meant to be ballpark values since my MacBook was "quiet" during the tests... I was cranking some Radiohead on iTunes.

        One of the factors that can affect the speed of COBS is the number of zeros you need to encode/decode. In the worst case, you are encoding nothing but zeros. In that case, you'll essentially be replacing all zeros with ones.

        The results from this worst case (nothing but zeros) are as follows...

        Encoding at 38.22 MB/sec
        Decoding at 17.85 MB/sec

        If we have one zero every 10 bytes...

        Encoding at 57.26 MB/sec
        Decoding at 151.91 MB/sec

        If you have one zero every 100 bytes...

        Encoding at 74.81 MB/sec
        Decoding at 846.56 MB/sec

        If you have one zero every 1000 bytes...

        Encoding at 73.70 MB/sec
        Decoding at 1128.75 MB/sec

        If you have one zero every 10,000 bytes...

        Encoding at 74.40 MB/sec
        Decoding at 1118.88 MB/sec

        If you have no zeros at all...

        Encoding at 73.98 MB/sec
        Decoding at 1151.08 MB/sec

        So it looks to me like... even with native Java code... we'll be able to push ~100MB/sec - 200MB/sec... (except for the worst case where we have 64MB of zeros).

        I'll post my code to this Jira so others can point and laugh.

        Matt Massie added a comment -

        This is the Java code that I used for my benchmarks of COBS encoding/decoding

        Matt Massie added a comment -

        Sorry for spamming so many comments here.

        I forgot to mention that I used the standard JVM 1.5.0 for MacOS for the tests.

        Todd Lipcon added a comment -

        It turns out the paper I read has been implemented in JDK 7. If someone has this mythical beast installed, it would be very interesting to see the results of Matt's benchmark code.

        Here's a link to someone else's experiences with it:

        http://lingpipe-blog.com/2009/03/30/jdk-7-twice-as-fast-as-jdk-6-for-arrays-and-arithmetic/

        Whether relying on optimizations only available in a not-yet-released JVM is a good idea is certainly up for debate. Given that Avro is still in its infancy, JDK 7 might be common by the time Avro is in production use.

        Scott Carey added a comment -

        Todd: I think that many of the JDK 7 enhancements have been backported to JDK 1.6.0_u14. I'll run some experiments later.

        Matt:
        Great stuff! Your results make sense to me based on previous experience. I went and made some modifications myself to try out doing this 4 bytes at a time.

        Unfortunately, this just made things more confusing for now.

        First, on your results:

        • 75MB/sec is somewhat slow. If anything else is roughly as expensive (say, the Avro serialization itself) then the max rate one client can encode and stream to another will be ~half that. The decode rate is good.
        • As a microbenchmark of sorts, we'll want to make sure the JVM warms up, run an iteration or two of the test, garbage collect, then measure.
        • Apple's JVM is going to be a bit off. I'll run some tests on a Linux server with Sun's JVM later, and try it with the 1.6.0_14 improvements as well.
        • There is a bug – the max interval between 0 byte occurrences is 256 – which is probably why the results behaved like they did.

        I ran the same tests on my machine using Apple's 1.5 JVM with similar results. With Apple's (64 bit) 1.6 JVM, the results are much higher.

        One 0 byte per 1000 (actually less due to the bug).
        Encoding at 224.48262 MB/sec
        Decoding at 1233.1406 MB/sec

        All 0 bytes:
        Encoding at 122.69939 MB/sec
        Decoding at 62.184223 MB/sec

        one in 10 0's:
        Encoding at 143.20877 MB/sec
        Decoding at 405.06326 MB/sec

        So there is quite the potential for the latest Sun JVM to be fast ... or slow.

        I wrote a "COWSCodec" to try this out with 4 byte chunks. The initial encoding results were good ... up to 300MB/sec with all 0 bytes.
        However, that implementation uses ByteBuffer.asIntBuffer(). And those IntBuffer views do not support the .array() method, so I had to use the IntBuffer.put(IntBuffer) signature for bulk copies.
        To do that cleanly, it made most sense to refactor the whole thing to use Java nio.Buffer style method signatures (set position, limit before a copy, use mark(), flip(), etc). After doing so, it turns out that the IntBuffer views created by ByteBuffer.asIntBuffer do not really support bulk get/put operations. The max decode speed is about 420MB/sec.

        So, there is one other way to do larger chunk encodings out of a ByteBuffer source and destination – use the ByteBuffer.getInt() and raw copy stuff rather than an intermediate IntBuffer wrapper.
        I can also test out a 'real' IntBuffer which is backed by an int[] rather than a byte[] which should be the fastest – but not applicable to reading/writing from network or file.
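        For illustration, the ByteBuffer.getInt()-based scan described above looks roughly like this (only the word scan and bulk copy; code-word emission is omitted, and all names are illustrative):

          import java.nio.ByteBuffer;

          class WordScanSketch {
            // Scan 4 bytes at a time for zero words, then copy each non-zero run in bulk
            // rather than byte by byte. (A real COWS encoder would emit a code word per run,
            // and any 1-3 leftover tail bytes would need separate handling.)
            static void scanWords(ByteBuffer src, ByteBuffer dst) {
              int runStart = src.position();
              while (src.remaining() >= 4) {
                if (src.getInt() == 0) {
                  bulkCopy(src, runStart, src.position() - 4, dst);  // copy the run before the zero word
                  runStart = src.position();                         // next run starts after the zero word
                }
              }
              bulkCopy(src, runStart, src.position(), dst);          // trailing run
            }

            static void bulkCopy(ByteBuffer src, int from, int to, ByteBuffer dst) {
              ByteBuffer run = src.duplicate();    // leaves src's position and limit untouched
              run.limit(to);
              run.position(from);
              dst.put(run);                        // one bulk put per run
            }
          }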

        Both of those should be fairly simple – I'll clean up what I have, add that stuff, and put it up here in a day or two.
        Linux tests and variations with the latest/greatest JVM will be informative as well.

        Doug Cutting added a comment -

        I'm imagining a situation where you have part of an Avro Object container file minus the header/footer metablock because of data loss or subscribing to a data stream in "real-time" midstream.

        But metainfo is required to make sense of the stream. You need its schema, codec, etc. Getting the sync marker doesn't seem a huge burden on top of that, unless you're figuring you'd skip to the next metadata flush before you try to make sense of the stream? How critical is this streaming-without-metadata use case? If it becomes an important use case, we might define a streaming-specific container, or use RTSP or somesuch, rather than using the existing container file format at all.

        Not that this isn't an interesting area, but I'd be much more interested in, e.g., gzip and lzf compression codecs for Avro's file format, or Avro InputFormat and OutputFormat's for mapreduce, or perhaps a version of Dumbo that uses the Pipes protocol to more efficiently get complex Avro data in and out of Python programs, etc.

        Scott Carey added a comment -

        Test COBS / COWS / COLS codecs. First batch of files. These three files are described as follows:

        COBSCodec2.java – minor modification of the previous version for an improved testing loop. Also modified to test in batch with the other new additions.

        COWSCodec.java – first, hack-ish version of a COBS-like encoding that works in 4 byte chunks. This version uses ByteBuffer.asIntBuffer(), and does all copies with the default nio 'copy from position() to limit()' behavior. This turns out to be slow. asIntBuffer does not have optimal copy operation as can be seen in the slow decode.

        COWSCodec2.java – re-implemented using ByteBuffer.getInt() and putInt(). Significantly faster.

        Three more files after this and a set of benchmarks on Linux with recent JRE's.

        The point of all this is experimentation and optimization. Although this specific JIRA may not become relevant – the results of this investigation may be useful in other contexts as well.

        Scott Carey added a comment -

        COWSCodec3.java – Slightly more optimized and cleaner version of COWSCodec2.
        COLSCodec.java – A version that encodes with 8 byte chunks using ByteBuffer getLong() and putLong().

        The above two have at least one minor bug left but the performance experiment should still be valid (there is a case where the decoded output can be 1 word too large). Also, these don't yet work with encoding or decoding streams that are not even multiples of 4 and 8 bytes.

        COBSPerfTest.java – a class for executing a test against all the variants in one go, with various ratios of zero words. Used for performance results that I'll post later.

        Scott Carey added a comment -

        Performance results using COBSPerfTest on some JVM / OS / Hardware combinations.

        First, an overview:
        The 64 bit JRE on MacOS X has roughly similar performance characteristics in these tests to the Linux Sun JRE 1.6.0_12. The Mac OSX 32 bit 1.5 JRE is vastly different.
        A 32 bit JVM is slightly faster than a 64 bit JVM on the byte-by-byte work, roughly the same at 4-byte-at-a-time work, and slower at 8-byte-at-a-time work. This is mostly expected.
        Variations in VM from Sun 1.6.0_12 through a few early access versions of 1.6.0_14 have roughly the same performance. That is, the performance improvements in the latest JRE (of which there are many) don't seem to have an impact here.

        Larger byte chunks help decoding only a little unless zero words dominate, and then it helps a lot.
        Larger chunks help encoding significantly across the board. COLS – working with 8 byte chunks – is about 4x faster than COBS.

        The results below could use some formatting work – they are very verbose.
        All results are with CentOS 5.3.
        Xeon 5335: 2.0 GHz, 4 MB cache per pair of cores, 2x quad-core
        Xeon E5440: 2.83 GHz, 6 MB cache per pair of cores, 2x quad-core

        Results have the following headers if you wish to search:

        • 1.6.0_13 Xeon 5335 defaults
        • 1.6.0_14b03 Xeon 5335 defaults
        • 1.6.0_14b03 Xeon 5335 compressed pointers, escape analysis
        • 1.6.0_14b06 32 bit Xeon 5335 defaults
        • 1.6.0_12 Xeon E5440 defaults

        Results:

        • 1.6.0_13 Xeon 5335 defaults
          $ /usr/java/jdk1.6.0_13/bin/java -server -Xmx512m -jar COBSPerfTest.jar
          COBSCodec, one zero word every 1 words
          Encoding at 89.13659 MB/sec
          Decoding at 57.16702 MB/sec
          COBSCodec, one zero word every 10 words
          Encoding at 69.81948 MB/sec
          Decoding at 208.78065 MB/sec
          COBSCodec, one zero word every 100 words
          Encoding at 144.56085 MB/sec
          Decoding at 925.55365 MB/sec
          COBSCodec, one zero word every 1000 words
          Encoding at 155.95493 MB/sec
          Decoding at 1033.0511 MB/sec
          COBSCodec, one zero word every 10000 words
          Encoding at 157.28098 MB/sec
          Decoding at 1038.535 MB/sec
          COWSCodec variant 1, one zero word every 1 words
          Encoding at 248.32587 MB/sec
          Decoding at 272.1762 MB/sec
          COWSCodec variant 1, one zero word every 10 words
          Encoding at 158.03842 MB/sec
          Decoding at 244.39633 MB/sec
          COWSCodec variant 1, one zero word every 100 words
          Encoding at 178.13342 MB/sec
          Decoding at 314.68652 MB/sec
          COWSCodec variant 1, one zero word every 1000 words
          Encoding at 179.34563 MB/sec
          Decoding at 319.3074 MB/sec
          COWSCodec variant 1, one zero word every 10000 words
          Encoding at 179.1614 MB/sec
          Decoding at 317.2071 MB/sec
          COWSCodec variant 1, one zero word every 100000 words
          Encoding at 179.04832 MB/sec
          Decoding at 318.77673 MB/sec
          COWSCodec variant 2, one zero word every 1 words
          Encoding at 212.54866 MB/sec
          Decoding at 239.11534 MB/sec
          COWSCodec variant 2, one zero word every 10 words
          Encoding at 225.34329 MB/sec
          Decoding at 466.0846 MB/sec
          COWSCodec variant 2, one zero word every 100 words
          Encoding at 323.6535 MB/sec
          Decoding at 1133.9556 MB/sec
          COWSCodec variant 2, one zero word every 1000 words
          Encoding at 329.2162 MB/sec
          Decoding at 1204.8298 MB/sec
          COWSCodec variant 2, one zero word every 10000 words
          Encoding at 328.59866 MB/sec
          Decoding at 1212.3876 MB/sec
          COWSCodec variant 2, one zero word every 100000 words
          Encoding at 328.2159 MB/sec
          Decoding at 1205.3972 MB/sec
          COWSCodec variant 3, one zero word every 1 words
          Encoding at 224.24924 MB/sec
          Decoding at 252.21263 MB/sec
          COWSCodec variant 3, one zero word every 10 words
          Encoding at 235.53137 MB/sec
          Decoding at 506.0203 MB/sec
          COWSCodec variant 3, one zero word every 100 words
          Encoding at 320.96054 MB/sec
          Decoding at 1094.6545 MB/sec
          COWSCodec variant 3, one zero word every 1000 words
          Encoding at 328.45444 MB/sec
          Decoding at 1213.4741 MB/sec
          COWSCodec variant 3, one zero word every 10000 words
          Encoding at 328.3331 MB/sec
          Decoding at 1231.2334 MB/sec
          COWSCodec variant 3, one zero word every 100000 words
          Encoding at 328.1387 MB/sec
          Decoding at 1217.2268 MB/sec
          COLSCodec, one zero word every 1 words
          Encoding at 291.29678 MB/sec
          Decoding at 343.89276 MB/sec
          COLSCodec, one zero word every 10 words
          Encoding at 354.09015 MB/sec
          Decoding at 812.4928 MB/sec
          Original array was modified!
          COLSCodec, one zero word every 100 words
          Encoding at 433.4998 MB/sec
          Decoding at 1204.3855 MB/sec
          COLSCodec, one zero word every 1000 words
          Encoding at 423.8553 MB/sec
          Decoding at 1237.3381 MB/sec
          COLSCodec, one zero word every 10000 words
          Encoding at 421.88364 MB/sec
          Decoding at 1238.6761 MB/sec
          COLSCodec, one zero word every 100000 words
          Encoding at 419.33118 MB/sec
          Decoding at 1239.0199 MB/sec
          COLSCodec, one zero word every 1000000 words
          Encoding at 420.57434 MB/sec
          Decoding at 1218.9686 MB/sec
        • 1.6.0_14b03 Xeon 5335 defaults
          $ java -server -Xmx512m -jar COBSPerfTest.jar
          COBSCodec, one zero word every 1 words
          Encoding at 91.95477 MB/sec
          Decoding at 57.01143 MB/sec
          COBSCodec, one zero word every 10 words
          Encoding at 73.45933 MB/sec
          Decoding at 207.67142 MB/sec
          COBSCodec, one zero word every 100 words
          Encoding at 144.0236 MB/sec
          Decoding at 913.1517 MB/sec
          COBSCodec, one zero word every 1000 words
          Encoding at 155.32053 MB/sec
          Decoding at 1032.9912 MB/sec
          COBSCodec, one zero word every 10000 words
          Encoding at 156.60835 MB/sec
          Decoding at 1024.763 MB/sec
          COWSCodec variant 1, one zero word every 1 words
          Encoding at 271.9177 MB/sec
          Decoding at 276.57822 MB/sec
          COWSCodec variant 1, one zero word every 10 words
          Encoding at 152.21716 MB/sec
          Decoding at 191.38951 MB/sec
          COWSCodec variant 1, one zero word every 100 words
          Encoding at 171.42383 MB/sec
          Decoding at 224.81892 MB/sec
          COWSCodec variant 1, one zero word every 1000 words
          Encoding at 173.3674 MB/sec
          Decoding at 228.82373 MB/sec
          COWSCodec variant 1, one zero word every 10000 words
          Encoding at 173.56622 MB/sec
          Decoding at 228.35956 MB/sec
          COWSCodec variant 1, one zero word every 100000 words
          Encoding at 173.60176 MB/sec
          Decoding at 229.12196 MB/sec
          COWSCodec variant 2, one zero word every 1 words
          Encoding at 214.48987 MB/sec
          Decoding at 241.93507 MB/sec
          COWSCodec variant 2, one zero word every 10 words
          Encoding at 244.36378 MB/sec
          Decoding at 473.0567 MB/sec
          COWSCodec variant 2, one zero word every 100 words
          Encoding at 345.88748 MB/sec
          Decoding at 1003.376 MB/sec
          COWSCodec variant 2, one zero word every 1000 words
          Encoding at 349.1008 MB/sec
          Decoding at 1026.7786 MB/sec
          COWSCodec variant 2, one zero word every 10000 words
          Encoding at 347.61612 MB/sec
          Decoding at 1028.8761 MB/sec
          COWSCodec variant 2, one zero word every 100000 words
          Encoding at 346.9563 MB/sec
          Decoding at 1061.9762 MB/sec
          COWSCodec variant 3, one zero word every 1 words
          Encoding at 210.84114 MB/sec
          Decoding at 258.27982 MB/sec
          COWSCodec variant 3, one zero word every 10 words
          Encoding at 252.59242 MB/sec
          Decoding at 507.0884 MB/sec
          COWSCodec variant 3, one zero word every 100 words
          Encoding at 353.3254 MB/sec
          Decoding at 1150.1593 MB/sec
          COWSCodec variant 3, one zero word every 1000 words
          Encoding at 358.27298 MB/sec
          Decoding at 1208.8944 MB/sec
          COWSCodec variant 3, one zero word every 10000 words
          Encoding at 357.32245 MB/sec
          Decoding at 1215.5607 MB/sec
          COWSCodec variant 3, one zero word every 100000 words
          Encoding at 356.93134 MB/sec
          Decoding at 1210.7133 MB/sec
          COLSCodec, one zero word every 1 words
          Encoding at 287.6796 MB/sec
          Decoding at 362.4284 MB/sec
          COLSCodec, one zero word every 10 words
          Encoding at 349.48486 MB/sec
          Decoding at 817.0665 MB/sec
          Original array was modified!
          COLSCodec, one zero word every 100 words
          Encoding at 418.7336 MB/sec
          Decoding at 1214.5057 MB/sec
          COLSCodec, one zero word every 1000 words
          Encoding at 410.76407 MB/sec
          Decoding at 1239.2533 MB/sec
          COLSCodec, one zero word every 10000 words
          Encoding at 408.02432 MB/sec
          Decoding at 1245.9232 MB/sec
          COLSCodec, one zero word every 100000 words
          Encoding at 406.2959 MB/sec
          Decoding at 1252.01 MB/sec
          COLSCodec, one zero word every 1000000 words
          Encoding at 405.99057 MB/sec
          Decoding at 1252.2338 MB/sec
        • 1.6.0_14b03 Xeon 5335 compressed pointers, escape analysis
          [candiru@britney COBSPerfTest]$ java -server -Xmx512m -XX:+DoEscapeAnalysis -XX:+UseCompressedOops -jar COBSPerfTest.jar
          COBSCodec, one zero word every 1 words
          Encoding at 91.98761 MB/sec
          Decoding at 53.635868 MB/sec
          COBSCodec, one zero word every 10 words
          Encoding at 72.98973 MB/sec
          Decoding at 205.35959 MB/sec
          COBSCodec, one zero word every 100 words
          Encoding at 144.04861 MB/sec
          Decoding at 918.5997 MB/sec
          COBSCodec, one zero word every 1000 words
          Encoding at 154.981 MB/sec
          Decoding at 1018.9709 MB/sec
          COBSCodec, one zero word every 10000 words
          Encoding at 156.41275 MB/sec
          Decoding at 1032.3058 MB/sec
          COWSCodec variant 1, one zero word every 1 words
          Encoding at 252.68245 MB/sec
          Decoding at 307.39664 MB/sec
          COWSCodec variant 1, one zero word every 10 words
          Encoding at 163.27182 MB/sec
          Decoding at 209.55176 MB/sec
          COWSCodec variant 1, one zero word every 100 words
          Encoding at 189.6774 MB/sec
          Decoding at 263.66977 MB/sec
          COWSCodec variant 1, one zero word every 1000 words
          Encoding at 193.37485 MB/sec
          Decoding at 270.99658 MB/sec
          COWSCodec variant 1, one zero word every 10000 words
          Encoding at 193.74573 MB/sec
          Decoding at 271.46988 MB/sec
          COWSCodec variant 1, one zero word every 100000 words
          Encoding at 194.11456 MB/sec
          Decoding at 270.73804 MB/sec
          COWSCodec variant 2, one zero word every 1 words
          Encoding at 216.82019 MB/sec
          Decoding at 243.21117 MB/sec
          COWSCodec variant 2, one zero word every 10 words
          Encoding at 242.51544 MB/sec
          Decoding at 465.20282 MB/sec
          COWSCodec variant 2, one zero word every 100 words
          Encoding at 344.99945 MB/sec
          Decoding at 1157.6014 MB/sec
          COWSCodec variant 2, one zero word every 1000 words
          Encoding at 351.1931 MB/sec
          Decoding at 1211.4054 MB/sec
          COWSCodec variant 2, one zero word every 10000 words
          Encoding at 349.90894 MB/sec
          Decoding at 1217.9989 MB/sec
          COWSCodec variant 2, one zero word every 100000 words
          Encoding at 349.40396 MB/sec
          Decoding at 1210.6339 MB/sec
          COWSCodec variant 3, one zero word every 1 words
          Encoding at 240.06367 MB/sec
          Decoding at 228.17952 MB/sec
          COWSCodec variant 3, one zero word every 10 words
          Encoding at 255.28317 MB/sec
          Decoding at 496.779 MB/sec
          COWSCodec variant 3, one zero word every 100 words
          Encoding at 360.55945 MB/sec
          Decoding at 1142.717 MB/sec
          COWSCodec variant 3, one zero word every 1000 words
          Encoding at 365.1012 MB/sec
          Decoding at 1205.8257 MB/sec
          COWSCodec variant 3, one zero word every 10000 words
          Encoding at 363.70743 MB/sec
          Decoding at 1213.5723 MB/sec
          COWSCodec variant 3, one zero word every 100000 words
          Encoding at 363.2405 MB/sec
          Decoding at 1208.7316 MB/sec
          COLSCodec, one zero word every 1 words
          Encoding at 298.33194 MB/sec
          Decoding at 318.14648 MB/sec
          COLSCodec, one zero word every 10 words
          Encoding at 368.6357 MB/sec
          Decoding at 825.8583 MB/sec
          Original array was modified!
          COLSCodec, one zero word every 100 words
          Encoding at 449.0997 MB/sec
          Decoding at 1191.9662 MB/sec
          COLSCodec, one zero word every 1000 words
          Encoding at 441.75586 MB/sec
          Decoding at 1223.806 MB/sec
          COLSCodec, one zero word every 10000 words
          Encoding at 439.18317 MB/sec
          Decoding at 1227.0127 MB/sec
          COLSCodec, one zero word every 100000 words
          Encoding at 438.62714 MB/sec
          Decoding at 1224.557 MB/sec
          COLSCodec, one zero word every 1000000 words
          Encoding at 438.62115 MB/sec
          Decoding at 1224.6772 MB/sec
        • 1.6.0_14b06 32 bit Xeon 5335 defaults
          $ /usr/java/jdk1.6.0_14ea6_32bit/bin/java -server -Xmx512m -jar COBSPerfTest.jar COBSCodec, one zero word every 1 words
          Encoding at 101.488785 MB/sec
          Decoding at 44.9381 MB/sec
          COBSCodec, one zero word every 10 words
          Encoding at 76.98102 MB/sec
          Decoding at 186.26143 MB/sec
          COBSCodec, one zero word every 100 words
          Encoding at 154.48914 MB/sec
          Decoding at 926.46204 MB/sec
          COBSCodec, one zero word every 1000 words
          Encoding at 169.65015 MB/sec
          Decoding at 996.02625 MB/sec
          COBSCodec, one zero word every 10000 words
          Encoding at 171.83167 MB/sec
          Decoding at 1069.7236 MB/sec
          COWSCodec variant 1, one zero word every 1 words
          Encoding at 229.62816 MB/sec
          Decoding at 347.4478 MB/sec
          COWSCodec variant 1, one zero word every 10 words
          Encoding at 137.4511 MB/sec
          Decoding at 181.96013 MB/sec
          COWSCodec variant 1, one zero word every 100 words
          Encoding at 170.84563 MB/sec
          Decoding at 246.61394 MB/sec
          COWSCodec variant 1, one zero word every 1000 words
          Encoding at 175.59972 MB/sec
          Decoding at 255.19583 MB/sec
          COWSCodec variant 1, one zero word every 10000 words
          Encoding at 176.94963 MB/sec
          Decoding at 257.768 MB/sec
          COWSCodec variant 1, one zero word every 100000 words
          Encoding at 175.58342 MB/sec
          Decoding at 255.8668 MB/sec
          COWSCodec variant 2, one zero word every 1 words
          Encoding at 212.1405 MB/sec
          Decoding at 257.78635 MB/sec
          COWSCodec variant 2, one zero word every 10 words
          Encoding at 231.08081 MB/sec
          Decoding at 421.9081 MB/sec
          COWSCodec variant 2, one zero word every 100 words
          Encoding at 348.02103 MB/sec
          Decoding at 1133.5847 MB/sec
          COWSCodec variant 2, one zero word every 1000 words
          Encoding at 358.29077 MB/sec
          Decoding at 1170.7545 MB/sec
          COWSCodec variant 2, one zero word every 10000 words
          Encoding at 360.5535 MB/sec
          Decoding at 1223.9012 MB/sec
          COWSCodec variant 2, one zero word every 100000 words
          Encoding at 358.03394 MB/sec
          Decoding at 1216.9368 MB/sec
          COWSCodec variant 3, one zero word every 1 words
          Encoding at 226.55222 MB/sec
          Decoding at 275.24838 MB/sec
          COWSCodec variant 3, one zero word every 10 words
          Encoding at 243.09453 MB/sec
          Decoding at 469.97775 MB/sec
          COWSCodec variant 3, one zero word every 100 words
          Encoding at 351.21555 MB/sec
          Decoding at 1129.5447 MB/sec
          COWSCodec variant 3, one zero word every 1000 words
          Encoding at 358.14252 MB/sec
          Decoding at 1196.7433 MB/sec
          COWSCodec variant 3, one zero word every 10000 words
          Encoding at 360.71323 MB/sec
          Decoding at 1199.4408 MB/sec
          COWSCodec variant 3, one zero word every 100000 words
          Encoding at 358.2802 MB/sec
          Decoding at 1224.6678 MB/sec
          COLSCodec, one zero word every 1 words
          Encoding at 208.82603 MB/sec
          Decoding at 275.9128 MB/sec
          COLSCodec, one zero word every 10 words
          Encoding at 265.03033 MB/sec
          Decoding at 730.78546 MB/sec
          Original array was modified!
          COLSCodec, one zero word every 100 words
          Encoding at 310.9054 MB/sec
          Decoding at 1157.1534 MB/sec
          COLSCodec, one zero word every 1000 words
          Encoding at 308.7317 MB/sec
          Decoding at 1238.4891 MB/sec
          COLSCodec, one zero word every 10000 words
          Encoding at 306.90793 MB/sec
          Decoding at 1220.6907 MB/sec
          COLSCodec, one zero word every 100000 words
          Encoding at 305.49704 MB/sec
          Decoding at 1205.0568 MB/sec
          COLSCodec, one zero word every 1000000 words
          Encoding at 305.3674 MB/sec
          Decoding at 1234.8855 MB/sec
        • 1.6.0_12 Xeon E5440 defaults
          $ java -server -Xmx512m -jar COBSPerfTest.jar
          COBSCodec, one zero word every 1 words
          Encoding at 124.19903 MB/sec
          Decoding at 80.51218 MB/sec
          COBSCodec, one zero word every 10 words
          Encoding at 97.80887 MB/sec
          Decoding at 293.82983 MB/sec
          COBSCodec, one zero word every 100 words
          Encoding at 203.51627 MB/sec
          Decoding at 1299.7317 MB/sec
          COBSCodec, one zero word every 1000 words
          Encoding at 219.41322 MB/sec
          Decoding at 1422.3486 MB/sec
          COBSCodec, one zero word every 10000 words
          Encoding at 220.89801 MB/sec
          Decoding at 1420.3978 MB/sec
          COWSCodec variant 1, one zero word every 1 words
          Encoding at 344.6565 MB/sec
          Decoding at 390.2233 MB/sec
          COWSCodec variant 1, one zero word every 10 words
          Encoding at 220.47774 MB/sec
          Decoding at 360.3579 MB/sec
          COWSCodec variant 1, one zero word every 100 words
          Encoding at 250.53049 MB/sec
          Decoding at 447.04602 MB/sec
          COWSCodec variant 1, one zero word every 1000 words
          Encoding at 253.66922 MB/sec
          Decoding at 450.34372 MB/sec
          COWSCodec variant 1, one zero word every 10000 words
          Encoding at 253.64081 MB/sec
          Decoding at 447.10074 MB/sec
          COWSCodec variant 1, one zero word every 100000 words
          Encoding at 252.63756 MB/sec
          Decoding at 447.50485 MB/sec
          COWSCodec variant 2, one zero word every 1 words
          Encoding at 275.47418 MB/sec
          Decoding at 332.2978 MB/sec
          COWSCodec variant 2, one zero word every 10 words
          Encoding at 316.82657 MB/sec
          Decoding at 657.1525 MB/sec
          COWSCodec variant 2, one zero word every 100 words
          Encoding at 449.77597 MB/sec
          Decoding at 1545.4358 MB/sec
          COWSCodec variant 2, one zero word every 1000 words
          Encoding at 457.52542 MB/sec
          Decoding at 1653.704 MB/sec
          COWSCodec variant 2, one zero word every 10000 words
          Encoding at 456.66467 MB/sec
          Decoding at 1658.9537 MB/sec
          COWSCodec variant 2, one zero word every 100000 words
          Encoding at 455.9669 MB/sec
          Decoding at 1655.1809 MB/sec
          COWSCodec variant 3, one zero word every 1 words
          Encoding at 315.7178 MB/sec
          Decoding at 360.02884 MB/sec
          COWSCodec variant 3, one zero word every 10 words
          Encoding at 331.1007 MB/sec
          Decoding at 723.18024 MB/sec
          COWSCodec variant 3, one zero word every 100 words
          Encoding at 443.8783 MB/sec
          Decoding at 1560.0219 MB/sec
          COWSCodec variant 3, one zero word every 1000 words
          Encoding at 447.92645 MB/sec
          Decoding at 1541.4951 MB/sec
          COWSCodec variant 3, one zero word every 10000 words
          Encoding at 449.71402 MB/sec
          Decoding at 1394.1431 MB/sec
          COWSCodec variant 3, one zero word every 100000 words
          Encoding at 441.31396 MB/sec
          Decoding at 1361.6113 MB/sec
          COLSCodec, one zero word every 1 words
          Encoding at 405.91794 MB/sec
          Decoding at 482.06976 MB/sec
          COLSCodec, one zero word every 10 words
          Encoding at 491.71738 MB/sec
          Decoding at 1079.3405 MB/sec
          Original array was modified!
          COLSCodec, one zero word every 100 words
          Encoding at 598.31836 MB/sec
          Decoding at 1616.031 MB/sec
          COLSCodec, one zero word every 1000 words
          Encoding at 586.9973 MB/sec
          Decoding at 1666.9445 MB/sec
          COLSCodec, one zero word every 10000 words
          Encoding at 585.7841 MB/sec
          Decoding at 1674.5248 MB/sec
          COLSCodec, one zero word every 100000 words
          Encoding at 585.28375 MB/sec
          Decoding at 1664.8573 MB/sec
          COLSCodec, one zero word every 1000000 words
          Encoding at 585.0993 MB/sec
          Decoding at 1662.1304 MB/sec
        Todd Lipcon added a comment -

        What's with this?

        COLSCodec, one zero word every 10 words
        Encoding at 354.09015 MB/sec
        Decoding at 812.4928 MB/sec
        Original array was modified!

        Isn't that bad?

        Scott Carey added a comment -

        COLSCodec, one zero word every 10 words
        Encoding at 354.09015 MB/sec
        Decoding at 812.4928 MB/sec
        Original array was modified!

        That, sir, is the remaining bug I alluded to but didn't highlight enough in my previous comment. If you change the size of the array, the random-number seed, or just about anything else, it will go away (or pop up elsewhere).

        The before and after arrays have the same bytes, but the one that was encoded and decoded has an extra word at the end. I stepped through that case briefly but was too lazy to fix it. I don't think it is relevant to the overall results. (And any real codec would be written more carefully, with unit tests to cover the corner cases.)

        Which reminds me, these are the main conclusions I draw that are not specific to this JIRA:

        ByteBuffer.getInt() and getLong() are well optimized, as are the matching putInt() and putLong() operations. Bulk put operations are also fast on ByteBuffer, but not on an IntBuffer created from ByteBuffer.asIntBuffer().

        Any encoder or decoder in Java will see potentially large performance gains if it can read and write in larger chunks.
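
        As an illustration of that point (not taken from COBSPerfTest; the class, method names, and constants below are mine), a hedged sketch that finds the first zero byte either one byte at a time or eight bytes at a time with getLong() and the standard has-zero-byte bit trick:

        import java.nio.ByteBuffer;

        /** Illustrative only: byte-at-a-time vs. 8-bytes-at-a-time zero scanning. */
        public class ChunkScanSketch {

          /** Index of the first zero byte, scanning one byte at a time. */
          static int firstZeroBytewise(ByteBuffer buf) {
            for (int i = 0; i < buf.limit(); i++) {
              if (buf.get(i) == 0) return i;
            }
            return -1;
          }

          /** Same answer, but skipping 8 bytes per step while no byte in the word is zero. */
          static int firstZeroByLong(ByteBuffer buf) {
            int i = 0;
            for (; i + 8 <= buf.limit(); i += 8) {
              long w = buf.getLong(i);
              // Classic bit trick: the expression is non-zero iff some byte of w is 0x00.
              if (((w - 0x0101010101010101L) & ~w & 0x8080808080808080L) != 0) break;
            }
            for (; i < buf.limit(); i++) {   // locate the zero within the word (or the tail)
              if (buf.get(i) == 0) return i;
            }
            return -1;
          }
        }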

        I could be evil and run the same test with the array misaligned – starting at position 1 instead of 0 (the JVM aligns array data to 8-byte boundaries, and many processor instructions are faster when aligned).

        OK, I decided to be evil and try it on my laptop with misaligned bytes (I added a put(0) at the start of the encoder and a get() at the start of the decoder to misalign the whole thing by one byte). Now, perhaps getLong() will be a lot less efficient. Let's see:

        Aligned (COLS):
        COLSCodec, one zero word every 1 words
        Encoding at 323.87604 MB/sec
        Decoding at 419.4213 MB/sec
        COLSCodec, one zero word every 10 words
        Encoding at 376.7943 MB/sec
        Decoding at 1041.8271 MB/sec
        COLSCodec, one zero word every 10000 words
        Encoding at 439.01627 MB/sec
        Decoding at 1350.2242 MB/sec
        COLSCodec, one zero word every 1000000 words
        Encoding at 415.91876 MB/sec
        Decoding at 1411.3434 MB/sec

        Misaligned (COLS):
        COLSCodec, one zero word every 1 words
        Encoding at 327.0196 MB/sec
        Decoding at 402.65366 MB/sec
        COLSCodec, one zero word every 10 words
        Encoding at 377.48105 MB/sec
        Decoding at 974.4739 MB/sec
        COLSCodec, one zero word every 10000 words
        Encoding at 445.4802 MB/sec
        Decoding at 1440.7946 MB/sec
        COLSCodec, one zero word every 1000000 words
        Encoding at 443.61166 MB/sec
        Decoding at 1423.9922 MB/sec

        These are within the usual margin of error – essentially the same. Perhaps the JVM's JIT isn't smart enough to recognize that in the first case all access is aligned and to use the faster aligned load instructions? I could write a COLSCodec2 that operates on a LongBuffer rather than a ByteBuffer to see what that does.

        But the main conclusion is that accessing in larger chunks has big gains when it is possible to do so.
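
        For reference, the misalignment trick described above amounts to something like the following sketch (the buffer size and values are illustrative, not the COBSPerfTest harness):

        import java.nio.ByteBuffer;

        /** Illustrative sketch of deliberately misaligning getLong()/putLong() by one byte. */
        public class MisalignSketch {
          public static void main(String[] args) {
            ByteBuffer buf = ByteBuffer.allocate(1 << 20);
            buf.put((byte) 0);                 // encoder side: one pad byte shifts everything
            for (long v = 1; buf.remaining() >= 8; v++) {
              buf.putLong(v);                  // every write now straddles an 8-byte boundary
            }
            buf.flip();
            buf.get();                         // decoder side: skip the pad byte, then read
            long sum = 0;
            while (buf.remaining() >= 8) {
              sum += buf.getLong();            // misaligned reads
            }
            System.out.println(sum);
          }
        }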

        Scott Carey added a comment -

        So, aligned access is important. However, the JVM's JIT cannot guarantee it on a ByteBuffer or byte[], but it can on a LongBuffer or long[]. Here are results on my laptop akin to the above, but with a COLSCodec2 that uses a LongBuffer rather than a ByteBuffer with getLong()/putLong().

        COLSCodec, one zero word every 1 words
        Encoding at 939.8201 MB/sec
        Decoding at 980.54034 MB/sec
        COLSCodec, one zero word every 10 words
        Encoding at 822.7025 MB/sec
        Decoding at 1188.7073 MB/sec
        COLSCodec, one zero word every 1000 words
        Encoding at 1104.4512 MB/sec
        Decoding at 1429.9589 MB/sec

        Unfortunately, for anything reading from or writing to the network or a file, byte streams and byte arrays are the only option. And as demonstrated earlier, asLongBuffer()/asIntBuffer() are not well optimized and are fairly restrictive. This suggests that in the future the JVM could do more, or Java APIs could be added, so that the JIT can easily detect data alignment and be more efficient.
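
        A hedged sketch of the LongBuffer idea (not the attached COLSCodec2; names are illustrative): wrapping a long[] guarantees word-aligned access, at the cost of the copy that byte-oriented I/O forces on us first.

        import java.nio.ByteBuffer;
        import java.nio.LongBuffer;

        /** Illustrative: aligned word access via long[] / LongBuffer. */
        public class LongBufferSketch {

          /** Count zero words using a LongBuffer view over a long[] (aligned by construction). */
          static int countZeroWords(long[] words) {
            LongBuffer buf = LongBuffer.wrap(words);
            int zeros = 0;
            while (buf.hasRemaining()) {
              if (buf.get() == 0L) zeros++;
            }
            return zeros;
          }

          /** The copy that byte-oriented I/O requires before the aligned pass can run. */
          static long[] toWords(byte[] bytes) {
            long[] words = new long[bytes.length / 8];        // drops any trailing partial word
            ByteBuffer.wrap(bytes).asLongBuffer().get(words); // bulk copy into an aligned array
            return words;
          }
        }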

        Doug Cutting added a comment -

        I don't think this is worth pursuing at this point. While it has nice properties, it adds a non-negligible decoding step to all data file processing.

        We can potentially add this to a future file format, but for the file format specified in the 1.0 release I'd like to keep it as-is. Objections?

        Scott Carey added a comment -

        I agree. COBS-like encoding is only useful for streaming data where a specific character or word must be avoided, which is a format issue.

        If all that is needed is identifying block boundaries, there are other methods.

        A "magic number" approach can be made collision-proof by detecting the collision: on encode, look for the magic number in the data and, if present, follow it with a 'not the end of the block' word; at the end of the block, write the magic number followed by an 'end of block' word. On decode, look for the magic number and discard the word that follows; if that word is the end-of-block word, also discard the magic word. COBS came about because the worst-case expansion of a magic-word approach is poor, and if the magic word is small (one byte) the worst case is likely.
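
        A byte-granularity sketch of that scheme (the marker value and flag bytes below are hypothetical, not part of any Avro format; the worst case – a payload full of marker bytes – doubles in size, which is exactly the weakness noted above):

        import java.io.ByteArrayOutputStream;

        /** Illustrative collision-proof "magic number" framing. */
        public class MagicMarkerSketch {
          static final byte MAGIC = (byte) 0xAB;       // hypothetical marker byte
          static final byte CONTINUE = 0x00;           // "not the end of the block"
          static final byte END_OF_BLOCK = 0x01;       // "end of the block"

          /** Encode one block: escape any MAGIC in the payload, then terminate it. */
          static byte[] encodeBlock(byte[] payload) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            for (byte b : payload) {
              out.write(b);
              if (b == MAGIC) {
                out.write(CONTINUE);                   // collision detected: mark it as data
              }
            }
            out.write(MAGIC);
            out.write(END_OF_BLOCK);                   // real block terminator
            return out.toByteArray();
          }

          /** Decode until the block terminator; escaped MAGIC bytes are kept as data. */
          static byte[] decodeBlock(byte[] encoded) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            for (int i = 0; i < encoded.length; i++) {
              byte b = encoded[i];
              if (b == MAGIC) {
                byte next = encoded[++i];              // the flag following the marker
                if (next == END_OF_BLOCK) break;       // terminator: discard both and stop
                out.write(MAGIC);                      // escaped data byte: keep MAGIC, drop flag
              } else {
                out.write(b);
              }
            }
            return out.toByteArray();
          }
        }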

        This might prove very useful at some point. Some of the general optimization findings here will be useful somewhere.

        Doug Cutting added a comment -

        Resolving this as something we may implement later.


          People

          • Assignee: Unassigned
          • Reporter: Matt Massie
          • Votes: 0
          • Watchers: 2
