Avro / AVRO-27

Consistent Overhead Byte Stuffing (COBS) encoded block format for Object Container Files

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Later
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: spec
    • Labels: None

      Description

      Object Container Files could use a 1 byte sync marker (set to zero) using zig-zag and COBS encoding within blocks to efficiently escape zeros from the record data.

      Zig-Zag encoding

      With zig-zag encoding, only the value 0 (zero) is encoded as a single zero byte. This property means that we can write any non-zero zig-zag long inside a block without concern for creating an unintentional sync byte.
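      For reference, a minimal sketch of the zig-zag mapping for longs (the method names are illustrative, not part of any existing API):

        // Standard zig-zag mapping: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, 2 -> 4, ...
        // Only the value 0 maps to 0, so a non-zero long never produces a lone zero byte
        // once the mapped value is written as a base-128 varint.
        static long zigZagEncode(long n) {
          return (n << 1) ^ (n >> 63);
        }

        static long zigZagDecode(long z) {
          return (z >>> 1) ^ -(z & 1);
        }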

      COBS encoding

      We'll use COBS encoding to ensure that all zeros are escaped inside the block payload. You can read http://www.sigcomm.org/sigcomm97/papers/p062.pdf for the details about COBS encoding.

      Block Format

      All blocks start and end with a sync byte (set to zero) with a type-length-value format internally as follows:

      name      | format       | length in bytes               | value                     | meaning
      sync      | byte         | 1                             | always 0 (zero)           | The sync byte serves as a clear marker for the start of a block
      type      | zig-zag long | variable                      | must be non-zero          | The type field expresses whether the block is for metadata or normal data.
      length    | zig-zag long | variable                      | must be non-zero          | The length field expresses the number of bytes until the next record (including the cobs code and sync byte). Useful for skipping ahead to the next block.
      cobs_code | byte         | 1                             | see COBS code table below | Used in escaping zeros from the block payload
      payload   | cobs-encoded | greater than or equal to zero | all non-zero bytes        | The payload of the block
      sync      | byte         | 1                             | always 0 (zero)           | The sync byte serves as a clear marker for the end of the block
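      A minimal sketch of how a writer could lay out one such block under these assumptions (class and method names are illustrative; cobsEncode is the routine sketched under "Encoding" below):

        import java.io.IOException;
        import java.io.OutputStream;

        class BlockWriterSketch {
          // Writes one block per the table above: sync, type, length, cobs_code + payload, sync.
          static void writeBlock(OutputStream out, long type, byte[] payload) throws IOException {
            byte[] encoded = cobsEncode(payload);        // COBS code byte(s) plus zero-free payload
            out.write(0);                                // opening sync byte
            writeZigZagLong(out, type);                  // non-zero block type
            writeZigZagLong(out, encoded.length + 1L);   // bytes until the next block: encoded data + closing sync
            out.write(encoded);
            out.write(0);                                // closing sync byte
          }

          // Zig-zag the value, then emit it as a base-128 varint (7 bits per byte).
          static void writeZigZagLong(OutputStream out, long n) throws IOException {
            long z = (n << 1) ^ (n >> 63);
            while ((z & ~0x7FL) != 0) {
              out.write((int) ((z & 0x7F) | 0x80));
              z >>>= 7;
            }
            out.write((int) z);
          }

          static byte[] cobsEncode(byte[] payload) {
            throw new UnsupportedOperationException("placeholder; see the COBS encoder sketch below");
          }
        }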

      COBS code table

      Code | Followed by      | Meaning
      0x00 | (not applicable) | (not allowed)
      0x01 | nothing          | Empty payload followed by the closing sync byte
      0x02 | one data byte    | The single data byte, followed by the closing sync byte
      0x03 | two data bytes   | The pair of data bytes, followed by the closing sync byte
      0x04 | three data bytes | The three data bytes, followed by the closing sync byte
      n    | (n-1) data bytes | The (n-1) data bytes, followed by the closing sync byte
      0xFD | 252 data bytes   | The 252 data bytes, followed by the closing sync byte
      0xFE | 253 data bytes   | The 253 data bytes, followed by the closing sync byte
      0xFF | 254 data bytes   | The 254 data bytes, not followed by a zero

      (taken from http://www.sigcomm.org/sigcomm97/papers/p062.pdf)

      Encoding

      Only the block writer needs to perform byte-by-byte processing to encode the block. The overhead for COBS encoding is very small in terms of the in-memory state required.
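      A minimal sketch of such a byte-by-byte encoder, a straightforward transcription of the standard COBS algorithm (names are illustrative):

        import java.util.Arrays;

        class CobsEncoderSketch {
          // Encodes src so that the output contains no zero bytes. Each code byte c
          // (1..0xFF) is followed by c-1 literal bytes; a code below 0xFF implies a
          // trailing zero in the source, except for the final code in the output.
          static byte[] cobsEncode(byte[] src) {
            byte[] dst = new byte[src.length + src.length / 254 + 2];  // worst-case expansion
            int codePos = 0;   // index where the current group's code byte will be written
            int out = 1;       // next write position for literal bytes
            int code = 1;      // 1 + number of literals in the current group
            for (int i = 0; i < src.length; i++) {
              byte b = src[i];
              if (b != 0) {
                dst[out++] = b;
                code++;
              }
              if (b == 0 || code == 0xFF) {      // close the group on a zero or at 254 literals
                dst[codePos] = (byte) code;
                code = 1;
                codePos = out;
                if (b == 0 || i + 1 < src.length) {
                  out++;                         // reserve a slot for the next code byte
                }
              }
            }
            dst[codePos] = (byte) code;          // final group: no implied trailing zero
            return Arrays.copyOf(dst, out);
          }
        }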

      Decoding

      Block readers are not required to do as much byte-by-byte processing as writers. The reader could (for example) find a metadata block by doing the following (sketched in code after the list):

      1. Search for a zero byte in the file which marks the start of a record
      2. Read and zig-zag decode the type of the block
        • If the block is normal data, read the length, seek ahead to the next block and goto step #2 again
        • If the block is a metadata block, cobs decode the data
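      A rough sketch of that scan-and-skip loop; the METADATA type value, the helper names, and the handling of adjacent sync bytes are assumptions, since the description above leaves them open:

        import java.nio.ByteBuffer;

        class BlockScanSketch {
          static final long METADATA = 1;   // assumed type value; the spec only requires non-zero types

          // Scans forward for the next metadata block and returns its COBS-decoded payload,
          // or null if none is found. Normal data blocks are skipped using their length field.
          static byte[] findMetadata(ByteBuffer buf) {
            while (buf.hasRemaining() && buf.get() != 0) {
              // step 1: scan forward to a sync (zero) byte marking a block boundary
            }
            skipZeros(buf);                                   // tolerate back-to-back sync bytes
            while (buf.hasRemaining()) {
              long type = readZigZagLong(buf);                // step 2: block type
              long length = readZigZagLong(buf);              // cobs code + payload + closing sync
              if (type == METADATA) {
                byte[] encoded = new byte[(int) length - 1];  // everything up to the closing sync
                buf.get(encoded);
                return cobsDecode(encoded);
              }
              buf.position(buf.position() + (int) length);    // skip a normal data block
              skipZeros(buf);
            }
            return null;
          }

          static void skipZeros(ByteBuffer buf) {
            while (buf.hasRemaining() && buf.get(buf.position()) == 0) {
              buf.get();
            }
          }

          // Reads a base-128 varint and undoes the zig-zag mapping.
          static long readZigZagLong(ByteBuffer buf) {
            long z = 0;
            int shift = 0;
            int b;
            do {
              b = buf.get() & 0xFF;
              z |= (long) (b & 0x7F) << shift;
              shift += 7;
            } while ((b & 0x80) != 0);
            return (z >>> 1) ^ -(z & 1);
          }

          static byte[] cobsDecode(byte[] encoded) {
            throw new UnsupportedOperationException("placeholder; the inverse of COBS encoding");
          }
        }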
      1. COBSCodec.java
        8 kB
        Matt Massie
      2. COBSCodec2.java
        10 kB
        Scott Carey
      3. COWSCodec.java
        9 kB
        Scott Carey
      4. COWSCodec2.java
        8 kB
        Scott Carey
      5. COWSCodec3.java
        8 kB
        Scott Carey
      6. COLSCodec.java
        8 kB
        Scott Carey
      7. COBSPerfTest.java
        0.6 kB
        Scott Carey
      8. COLSCodec2.java
        8 kB
        Scott Carey

        Activity

        Scott Carey added a comment - edited

        An outsider here – I've got an idea on how to avoid the performance pitfalls of COBS' byte-by-byte nature and as I thought through it, I spotted many other opportunities for enhancement since larger chunks afford a lot more bits in the Code that can be used for things other than the length of the following literal chunk.

        Proposal – COLS, a modification of COBS

        (for greater performance and extensibility for large data streams)

        Java is particularly bad at byte-by-byte operations. The COBS paper clearly indicates its design intention was stuffing data through embedded systems such as telephone lines and other networks where byte-by-byte processing of the whole payload is already mandatory.

        Doing so here would be a performance bottleneck in Java. Some simple tests can be constructed to prove or disprove this claim.

        I propose that rather than use COBS, one uses COLS or COWS ... that is Constant Overhead Long Stuffing or Constant Overhead Word Stuffing instead.

        This would be inefficient if we expect most payloads to be small (less than 256 bytes), but I suspect most Hadoop-related payloads to be large, and often very large.

        I favor stuffing Longs rather than Ints, since most systems will soon be running 64 bit JVMs. Sun's next JRE release has Object Pointer Compression, which makes the memory overhead of a 64 bit JVM very small compared to a 32 bit JVM, and performance is generally faster than the 32 bit JVM due to native 64 bit operations and more registers (for x86-64 at least).
        http://blog.juma.me.uk/2008/10/14/32-bit-or-64-bit-jvm-how-about-a-hybrid/

        I will describe the proposal below assuming a translation of COBS to COLS, from 1 byte at a time to 8 byte at a time encoding. However, it is clear that a 4 byte variant is very similar and may be preferable.

        Proposed Changes – Simple Block format with COLS

        name                  | format                  | length in bytes | value                      | meaning
        sync                  | byte                    | 8               | 0L                         | The sync long serves as a clear marker for the start of a block
        type                  | 1 byte                  | 1               | non-zero                   | The type field expresses whether the block is for metadata or normal data. Note: if this is only ever going to be a binary flag, it can be packed into the length or sequence number as a sign value. However, it is critical for decoding performance to keep the non-COLS header 8 byte aligned.
        block sequence number | 3 byte unsigned int     | 3               | 0 - 2^24                   | The block sequence number – a client can use this to resume a stream from the last successful block. This may not be needed if the metadata blocks take care of this.
        length                | fixed 4 byte signed int | 4               | >= 0                       | The length field expresses the number of bytes of COLS_payload data. Useful for skipping ahead to the next block.
        COLS_payload          | COLS                    | length as above | see COLS description below | The data in this block, encoded.

        The above would cap the stream length at 2GB * 16M = 32PB. There is room to increase this significantly by taking bits from the type and giving those to the block count. 2GB blocks are rather unlikely for now, however – as are multi-PB streams.

        Discussion

        • The entire stream would need to be 8 byte aligned in order to process it cleanly with something like java.nio.LongBuffer. This would include metadata blocks.
        • The sequence is assumed to be in network-order. Endianness can be handled and is not discussed in detail here.
        • The type can likely be encoded in a single bit in the block sequence number or length field. If more than two types of blocks are expected, more bits can be reserved for future use.
        • The length can be stored as the number of longs rather than bytes (bytes / 8) since the COLS payload is a multiple of 8 bytes.
        • The COLS payload here differs from the original proposal. It will have an entire COBS-like stream, with possibly many COLS code markers (at least one per 0L value in the block data).
        • One may want to have both the encoded length above, and the decoded length (or a checksum) as extra data validation. Perhaps even 4 types: METADATA, METADATA_CSUM, NORMAL, NORMAL_CSUM – where the ordinary variants store the length (fast, but less reliable) and the _CSUM variants store a checksum (slower, but highly reliable).

        Basic COBS to COLS description

        COBS describes a byte-by-byte encoding in which a zero byte cannot appear, and a set of codes is used to encode runs of data that do not contain a zero byte. All codes but one have an implicit trailing zero. The last block is assumed to have no implicit zero regardless of the code.

        COLS is a simple extension of this scheme to 64 bit chunks. In its base form, it does nothing more than work with larger chunks:

        COLS Code (Long, 8 bytes) | Followed by          | Meaning
        0L                        | N/A                  | (not allowed)
        1L                        | nothing              | A single zero Long
        2L                        | one long (8 bytes)   | The single data long, followed by a trailing zero long *
        3L                        | two longs (16 bytes) | The pair of data longs, followed by a trailing zero long *
        nL                        | (n-1) longs          | The (n-1) longs, followed by a trailing zero long *
        MAX **                    | MAX - 1 longs        | MAX - 1 longs, with no trailing zero

        * The last code in the sequence (which can be identified by the length header or a 0L indicating the start of the next block) does NOT have an implicit trailing zero.
        ** MAX needs to be chosen, and can't realistically be very large since encoding requires an arraycopy of size (MAX -1) * 8

        The COLS_payload has multiple COLS Code entries (and literals), up to the length specified in the header (where a 0L should then occur).
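        A minimal sketch of that base scheme operating on long[] arrays (the MAX value, class, and method names are placeholders chosen for illustration, not part of the proposal):

          import java.util.ArrayList;
          import java.util.List;

          class ColsEncoderSketch {
            static final int MAX = 255;   // group limit; the proposal leaves the choice of MAX open

            // Base COLS: encodes src so the output contains no zero longs. A code value n
            // (1 <= n < MAX) means (n-1) literal longs follow and a trailing zero long is
            // implied; code MAX means MAX-1 literals with no implied zero; the final code
            // in the payload never implies a zero.
            static long[] colsEncode(long[] src) {
              List<Long> out = new ArrayList<>();
              int groupStart = 0;                             // index of the first literal in the current group
              int i = 0;
              while (i < src.length) {
                if (i - groupStart == MAX - 1) {              // group is full: emit code MAX, no implied zero
                  emitGroup(out, src, groupStart, i);
                  groupStart = i;
                  continue;                                   // re-examine src[i] as part of the new group
                }
                if (src[i] == 0) {                            // zero long: close the group, the zero is implied
                  emitGroup(out, src, groupStart, i);
                  groupStart = i + 1;
                }
                i++;
              }
              emitGroup(out, src, groupStart, src.length);    // final group: no implied trailing zero
              long[] result = new long[out.size()];
              for (int j = 0; j < result.length; j++) {
                result[j] = out.get(j);
              }
              return result;
            }

            static void emitGroup(List<Long> out, long[] src, int from, int to) {
              out.add((long) (to - from + 1));                // code = literal count + 1
              for (int j = from; j < to; j++) {
                out.add(src[j]);
              }
            }
          }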

        However – there are drawbacks to using such a large chunk without other modifications from COBS:

        1. 64 bits is far too large for a length field. For encoding, a COBS code block must fit in RAM, and for performance, should probably fit in half an L2 cache. However, for decoding COLS code length is irrelevant.
        2. If the size of the data encoded is not a multiple of 8 bytes, we need a mechanism to encode that up to 7 trailing bytes should be truncated (3 bits).
        3. For most blocks, the overhead will be exactly 8 bytes (unless the block has a trailing 0L).
        4. Very long data streams without a zero Long are unlikely, so very large chunk lengths are not very useful.

        There are also benefits however. The above suggests that most of the 8 byte COLS code block space is not needed to encode length. Much can be done with this!
        Some thoughts:

        • The 3 bits needed to define the truncation behavior can be stored in the COLS code.
        • The overhead can be reduced, by encoding short trailing sequences into the upper bits rather than purely truncating – e.g. you can append 2 bytes instead of truncating 6.
        • Rudimentary run-length encoding or other lightweight compression can be done with the extra bits (completely encoder-optional).
        • We can remove the requirement that most codes have an implicit trailing zero, and encode that in one of the extra bits.

        If only the lower 2 bytes of an 8 byte COLS code represent the size, (MAX = 2^16 - 1), then the max literal size is 512KB - 8B. If we remove the implicit trailing zero, an encoder can optionally encode smaller literal sequences (perhaps for performance, or compression).
        What can be done with the remaining 48 bits?
        Some ideas:

        1. The highest 4 bytes can represent data to append to the literal. In this way, half of the size overhead of the encoding is removed. This should generally only apply to the last COLS code in the block (for performance reasons and maintaining 8 byte alignment on all arraycopy operations), but it's encoder-optional.
        2. The next bit represents whether the COLS block has an implicit 0L appended.
        3. A bit can be used to signify endianness (this might be a better fit for the Block header or stream metadata – detecting zeros works without knowing endianness).
        4. The next three bits can represent how much data is truncated or appended to the literal, (before the optional implicit 0L):
        value | meaning
        000   | do not truncate or append
        100   | append all 4 leading bytes in the COLS code after the literal
        111   | append the first 3 leading bytes in the COLS code after the literal
        110   | append the first 2 leading bytes in the COLS code after the literal
        101   | append the leading byte in the COLS code after the literal
        011   | truncate the last 3 bytes of the literal
        010   | truncate the last 2 bytes of the literal
        001   | truncate the last byte of the literal

        This leaves us with 12 bits. I propose that these be used for rudimentary (optional) compression:

        • Option A:
          • Run length only – the 12 bits represent the number of times to repeat the literal. Or 4 bits are the number of COLS chunks backwards (including this one) to repeat, and 8 bits is the number of repeats. Or ... some other form of emitting copies of entire COLS chunks.
        • Option B:
          • Some form of LZ-like compression that copies in 8 byte chunks – 4 bits represent the number of Longs to copy (so, max match size is 15 * 8 bytes), and 8 bits represents the number of Longs backwards (from the end of this COLS chunk) to begin that copy (up to 2KB). Because of the truncation/append feature, this is not constrained to 8-byte aligned copies on the output, but the encoded format is entirely 8 byte aligned and all copies are multiples of 8 bytes. I would not be surprised if this was as fast as LZO or faster, since it is very similar but operates in a more chunky fashion. Compression levels would not be that great, but like most similar algorithms to this the encoder can do more work to search for matches. Decoding uncompressed data should be essentially free (if the 4 bits are 0, do nothing – and most COLS blocks would be fairly large so this check does not occur that frequently).
        • Option C:
          • Reserve those 12 bits for future use / research

        Alternatively, one to 4 extra bytes used for the "append" feature can be reassigned to have more than 12 bits for compression metadata.

        So, with the above modifications, the COLS code looks like this:

        The COLS code is 8 bytes. The low 16 bits encode basic meaning.
        An 8 byte COLS code cannot be 0L.

        Code & 0xFFFF (low 2 bytes) | Followed by          | Meaning
        0x0000                      | N/A                  | (not allowed)
        0x0001                      | nothing              | A single zero Long
        0x0002                      | one long (8 bytes)   | The single data long
        0x0003                      | two longs (16 bytes) | The pair of data longs
        n                           | (n-1) longs          | The (n-1) longs
        0xFFFF                      | 2^16 - 2 longs       | 2^16 - 2 longs

        The next portion determines the state of truncation or appending.
        Two options are listed – only truncation, and truncation/appending. The appending could be up to 5 bytes if we squeeze all the rest of the space. The example below is for up to 4 bytes appended and 3 bytes truncated.

        appendCode = (Code >> 28) & 0xF;
        appendCode & 0x7 | Append or truncate | Appended data | From truncate-only option
        0x0              | 0                  | nothing       | 0
        0x1              | (-)1               | nothing       | (-)1
        0x2              | (-)2               | nothing       | (-)2
        0x3              | (-)3               | nothing       | (-)3
        0x4              | (+)1               | Code >>> 56   | (-)4
        0x5              | (+)2               | Code >>> 48   | (-)5
        0x6              | (+)3               | Code >>> 40   | (-)6
        0x7              | (+)4               | Code >>> 32   | (-)7

        It may be wiser to choose an option between these. If 3 bytes are chosen as the max arbitrary append length, with 4 truncated, 20 bits are left for other purposes, rather than 12. The average COLS chunk would be one byte larger.

        AppendCode & 0x8 | Append 0L
        0                | do not append 0L (8 zero bytes)
        1                | do append 0L (8 zero bytes)
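        As a rough illustration, a decoder might pull these fields apart as follows (purely a sketch of the bit layout proposed above; all names are illustrative):

          // 'buf' is a java.nio.ByteBuffer positioned at an 8-byte COLS code.
          static void readColsCode(java.nio.ByteBuffer buf) {
            long code = buf.getLong();                         // the COLS code; must not be 0L
            int literalLongs   = (int) (code & 0xFFFF) - 1;    // low 2 bytes: code n => (n-1) literal longs follow
            int appendCode     = (int) (code >>> 28) & 0xF;    // 4-bit selector, per the tables above
            boolean appendZero = (appendCode & 0x8) != 0;      // high bit: append an implicit 0L after the literal
            int truncOrAppend  = appendCode & 0x7;             // low 3 bits: how many bytes to truncate or append
            long appendSource  = code >>> 32;                  // up to 4 literal bytes packed in the high word
            // ... copy literalLongs longs, apply truncation or appended bytes, then the optional 0L ...
          }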

        Encoding

        The writer would perform processing in 8 byte chunks until the end of the block where some byte-by-byte processing would occur. Compression options would be entirely the writer's choice.
        The state overhead can be very low or large at the writer's whim. Larger COLS chunk sizes require more state (and larger arraycopys), and any compression option adds state overhead.

        Decoding

        Decoding in all circumstances reads data in 8 byte chunks. Copies occur in 8 byte chunks, 8 byte aligned save for the end of a block if the block does not have a multiple of 8 bytes in its payload. An encoder can cause copy destinations (but not sources) to not be 8 byte aligned if certain special options (compression) or intentionally misaligned encoding is done. Generally, an encoder can choose to make all but the last few bytes of the last block in the stream aligned.

        Doug Cutting added a comment -

        Before we get too far, let's review the motivation for this. From Matt's message:

        It makes more sense to use the same record boundary (0) for all Avro records instead of having them be random. The format would be more resilient to data corruption and easier to parse. It's also possible (although improbable) that the 16-byte UUID might be part of the payload... especially given the size of the data Hadoop processes.

        1. What's the tangible advantage of a single record boundary?
        2. Why would this be more corruption resistant?
        3. How likely is a collision? By my reading of http://en.wikipedia.org/wiki/Birthday_attack, we have a ~1% chance of collision in an exabyte (10^18B) of data, roughly 1000 times today's largest datasets, if we used the same marker for the full exabyte, which we would not, since we'd choose a new marker per output partition. Switching to a 32 byte marker would raise this to 10^37B. So we might consider that if we're worried about collisions.
        Matt Massie added a comment -

        1. What is the tangible advantage of a single record boundary?
        2. Why would this be more corruption resistant?

        I'm imagining a situation where you have part of an Avro Object container file minus the header/footer metablock because of data loss or subscribing to a data stream in "real-time" midstream. In that situation, determining the random 16 byte sync marker would require some work (e.g. finding recurring 16-byte values, searching for the string "schema" and working back, etc). Having a constant sync value (with an escaped payload) makes this recovery easier and the code a little cleaner. To be honest, this point is weakened by the fact that we're not planning on streaming Object container files anyway.

        3. How likely is a collision?

        Seems like this is a non-issue with a 16-byte sync value as it is now, but it's always good to be future-proof.

        I'm curious what other Java experts (since I'm not) out there feel about COBS in Java . It sounds from Scott's comment that byte stuffing in Java is a non-starter.

        There is code at..

        https://bosshog.lbl.gov/repos/java-u3/trunk/sea/src/gov/lbl/dsd/sea/nio/util/COBSCodec.java

        ...from Lawrence Berkeley Labs to do COBS encoding in Java with the following comment

        /* Performance Note: The JDK 1.5 server VM runs <code>decode(encode(src))</code>
         * at about 125 MB/s throughput on a commodity PC (2 GHz Pentium 4). Encoding is
         * the bottleneck, decoding is extremely cheap. Obviously, this is way more
         * efficient than Base64 encoding or similar application level byte stuffing
         * mechanisms.
         */
        
        Scott Carey added a comment -

        I'm curious what other Java experts (since I'm not) out there feel about COBS in Java . It sounds from Scott's comment that byte stuffing in Java is a non-starter.

        That really depends on the performance requirement.

        If the requirement is to be able to encapsulate data and stream at near Gigabit ethernet speed or teamed Gigabit (~100MB/sec to 200MB/sec), it will get in the way.
        If other things already significantly limit streaming capability then it may not be a large incremental overhead.
        For example, if the Avro serialization process is already going byte-by-byte somewhere else, this could 'piggyback' almost for free – but it would have to be embedded in that other code, in the same loop.

        I also want to highlight that the byte-by-byte streaming in Java can be compared to larger chunk sizes with a fairly simple benchmark to validate (or disprove) my claims that it is slow in comparison.

        The data from LBL is useful. It should be fairly easy to change that to a larger chunk size and compare on a new JVM.

        I'll try to characterize this on my own time this weekend.

        Todd Lipcon added a comment -

        If the Java performance of byte-by-byte processing is the major issue, is it worth considering native code to optimize this? I don't generally like using native code, but I feel like it may be worth it if the advantages of COBS are significant enough.

        On a side note, I recently read a paper that added a JVM optimization to really improve element-by-element processing of arrays by automatically eliminating bounds checking. I imagine that would apply here. Unfortunately, basing a system around a JVM that doesn't exist yet isn't so wise. But down the road this performance issue may be ameliorated.

        Matt Massie added a comment -

        The suspense was just killing me so I had to get some benchmarks myself.

        Scott, I'll be interested to see if you have similar results over the weekend.

        I rewrote the LBL code to use ByteBuffers instead of ArrayByteList from the older Apache commons primitives. The new API looks like...

        public static void decode(ByteBuffer src, int from, int to, ByteBuffer dest) throws IOException
        public static void encode(ByteBuffer src, int from, int to, ByteBuffer dest)
        

        I chose ByteBuffers because I didn't want to realloc new byte arrays but instead operate on the same byte array for each test.

        My test results are the average of 10 tests run on a 64 MB ByteBuffer running on my MacBook Pro

          Model Name:	MacBook Pro
          Model Identifier:	MacBookPro5,1
          Processor Name:	Intel Core 2 Duo
          Processor Speed:	2.4 GHz
          Number Of Processors:	1
          Total Number Of Cores:	2
          L2 Cache:	3 MB
          Memory:	4 GB
          Bus Speed:	1.07 GHz
        

        Since my test wasn't multithreaded... only one core was used.

        My tests verified that the byte array wasn't altered by the encoding/decoding process (there were no failures).

        These numbers are meant to be ballpark values since my MacBook was "quiet" during the tests... I was cranking some Radiohead on iTunes.

        One of the factors that can affect the speed of COBS is the number of zeros you need to encode/decode. In the worst case, you are encoding nothing but zeros. In that case, you'll essentially be replacing all zeros with ones.

        The results from this worst case (nothing but zeros) are as follows...

        Encoding at 38.22 MB/sec
        Decoding at 17.85 MB/sec

        If we have one zero every 10 bytes...

        Encoding at 57.26 MB/sec
        Decoding at 151.91 MB/sec

        If you have one zero every 100 bytes...

        Encoding at 74.81 MB/sec
        Decoding at 846.56 MB/sec

        If you have one zero every 1000 bytes...

        Encoding at 73.70 MB/sec
        Decoding at 1128.75 MB/sec

        If you have one zero every 10,000 bytes...

        Encoding at 74.40 MB/sec
        Decoding at 1118.88 MB/sec

        If you have no zeros at all...

        Encoding at 73.98 MB/sec
        Decoding at 1151.08 MB/sec

        So it looks to me like... even with native Java code... we'll be able to push ~100MB/sec - 200MB/sec... (except for the worst case where we have 64MB of zeros).

        I'll post my code to this Jira so others can point and laugh.

        Matt Massie added a comment -

        This is the Java code that I used for my benchmarks of COBS encoding/decoding

        Matt Massie added a comment -

        Sorry for spamming so many comments here.

        I forgot to mention that I used the standard JVM 1.5.0 for MacOS for the tests.

        Todd Lipcon added a comment -

        It turns out the paper I read has been implemented in JDK 7. If someone has this mythical beast installed, it would be very interesting to see the results of Matt's benchmark code.

        Here's a link to someone else's experiences with it:

        http://lingpipe-blog.com/2009/03/30/jdk-7-twice-as-fast-as-jdk-6-for-arrays-and-arithmetic/

        Whether relying on optimizations only available in a not-yet-released JVM is a good idea is certainly up for debate. Given that Avro is still in its infancy, JDK 7 might be common by the time Avro is in production use.

        Scott Carey added a comment -

        Todd: I think that many of the JDK 7 enhancements have been backported to JDK 1.6.0_u14. I'll run some experiments later.

        Matt:
        Great stuff! Your results make sense to me based on previous experience. I went and made some modifications myself to try out doing this 4 bytes at a time.

        Unfortunately, this just made things more confusing for now.

        First, on your results:

        • 75MB/sec is somewhat slow. If anything else is roughly as expensive (say, the Avro serialization itself) then the max rate one client can encode and stream to another will be ~half that. The decode rate is good.
        • As a microbenchmark of sorts, we'll want to make sure the JVM warms up, run an iteration or two of the test, garbage collect, then measure.
        • Apple's JVM is going to be a bit off. I'll run some tests on a Linux server with Sun's JVM later, and try it with the 1.6.0_14 improvements as well.
        • There is a bug – the max interval between 0 byte occurrences is 256 – which is probably why the results behaved like they did.

        I ran the same tests on my machine using Apple's 1.5 JVM with similar results. With Apple's (64 bit) 1.6 JVM, the results are much higher.

        One 0 byte per 1000 (actually less due to the bug).
        Encoding at 224.48262 MB/sec
        Decoding at 1233.1406 MB/sec

        All 0 bytes:
        Encoding at 122.69939 MB/sec
        Decoding at 62.184223 MB/sec

        one in 10 0's:
        Encoding at 143.20877 MB/sec
        Decoding at 405.06326 MB/sec

        So there is quite the potential for the latest Sun JVM to be fast ... or slow.

        I wrote a "COWSCodec" to try this out with 4 byte chunks. The initial encoding results were good ... up to 300MB/sec with all 0 bytes.
        However, that implementation uses ByteBuffer.asIntBuffer(). And those IntBuffer views do not support the .array() method, so I had to use the IntBuffer.put(IntBuffer) signature for bulk copies.
        To do that cleanly, it made most sense to refactor the whole thing to use Java nio.Buffer style method signatures (set position, limit before a copy, use mark(), flip(), etc). After doing so, it turns out that the IntBuffer views created by ByteBuffer.asIntBuffer do not really support bulk get/put operations. The max decode speed is about 420MB/sec.

        So, there is one other way to do larger chunk encodings out of a ByteBuffer source and destination – use the ByteBuffer.getInt() and raw copy stuff rather than an intermediate IntBuffer wrapper.
        I can also test out a 'real' IntBuffer which is backed by an int[] rather than a byte[] which should be the fastest – but not applicable to reading/writing from network or file.
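        For illustration, the ByteBuffer.getInt()-based scan described above looks roughly like this (only the word scan and bulk copy; code-word emission is omitted, and all names are illustrative):

          import java.nio.ByteBuffer;

          class WordScanSketch {
            // Scan 4 bytes at a time for zero words, then copy each non-zero run in bulk
            // rather than byte by byte. (A real COWS encoder would emit a code word per run,
            // and any 1-3 leftover tail bytes would need separate handling.)
            static void scanWords(ByteBuffer src, ByteBuffer dst) {
              int runStart = src.position();
              while (src.remaining() >= 4) {
                if (src.getInt() == 0) {
                  bulkCopy(src, runStart, src.position() - 4, dst);  // copy the run before the zero word
                  runStart = src.position();                         // next run starts after the zero word
                }
              }
              bulkCopy(src, runStart, src.position(), dst);          // trailing run
            }

            static void bulkCopy(ByteBuffer src, int from, int to, ByteBuffer dst) {
              ByteBuffer run = src.duplicate();    // leaves src's position and limit untouched
              run.limit(to);
              run.position(from);
              dst.put(run);                        // one bulk put per run
            }
          }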

        Both of those should be fairly simple – I'll clean up what I have, add that stuff, and put it up here in a day or two.
        Linux tests and variations with the latest/greatest JVM will be informative as well.

        Doug Cutting added a comment -

        I'm imagining a situation where you have part of an Avro Object container file minus the header/footer metablock because of data loss or subscribing to a data stream in "real-time" midstream.

        But metainfo is required to make sense of the stream. You need its schema, codec, etc. Getting the sync marker doesn't seem a huge burden on top of that, unless you're figuring you'd skip to the next metadata flush before you try to make sense of the stream? How critical is this streaming-without-metadata use case? If it becomes an important use case, we might define a streaming-specific container, or use RTSP or somesuch, rather than using the existing container file format at all.

        Not that this isn't an interesting area, but I'd be much more interested in, e.g., gzip and lzf compression codecs for Avro's file format, or Avro InputFormat and OutputFormat's for mapreduce, or perhaps a version of Dumbo that uses the Pipes protocol to more efficiently get complex Avro data in and out of Python programs, etc.

        Scott Carey added a comment -

        Test COBS / COWS / COLS codecs. First batch of files. These three files are described as follows:

        COBSCodec2.java – minor modification of the previous version for an improved testing loop. Also modified to test in batch with the other new additions.

        COWSCodec.java – first, hack-ish version of a COBS-like encoding that works in 4 byte chunks. This version uses ByteBuffer.asIntBuffer(), and does all copies with the default nio 'copy from position() to limit()' behavior. This turns out to be slow. asIntBuffer does not have optimal copy operation as can be seen in the slow decode.

        COWSCodec2.java – re-implemented using ByteBuffer.getInt() and putInt(). Significantly faster.

        Three more files after this and a set of benchmarks on Linux with recent JRE's.

        The point of all this is experimentation and optimization. Although this specific JIRA may not become relevant – the results of this investigation may be useful in other contexts as well.

        Scott Carey added a comment -

        COWSCodec3.java – Slightly more optimized and cleaner version of COWSCodec2.
        COLSCodec.java – A version that encodes with 8 byte chunks using ByteBuffer getLong() and putLong().

        The above two have at least one minor bug left but the performance experiment should still be valid (there is a case where the decoded output can be 1 word too large). Also, these don't yet work with encoding or decoding streams that are not even multiples of 4 and 8 bytes.

        COBSPerfTest.java – a class for executing a test against all the variants in one go, with various ratios of zero words. Used for performance results that I'll post later.

        Scott Carey added a comment -

        Performance results using COBSPerfTest on some JVM / OS / Hardware combinations.

        First, an overview:
        The 64 bit JRE on MacOS X has roughly similar performance characteristics in these tests to the Linux Sun JRE 1.6.0_12. The Mac OSX 32 bit 1.5 JRE is vastly different.
        A 32 bit JVM is slightly faster than a 64 bit JVM on the byte-by-byte work, roughly the same at 4-byte-at-a-time work, and slower at 8-byte-at-a-time work. This is mostly expected.
        Variations in VM from Sun 1.6.0_12 through a few early access versions of 1.6.0_14 have roughly the same performance. That is, the performance improvements in the latest JRE (of which there are many) don't seem to have an impact here.

        Larger byte chunks help decoding only a little unless zero words dominate, and then it helps a lot.
        Larger chunks help encoding significantly across the board. COLS – working with 8 byte chunks – is about 4x faster than COBS.

        The results below could use some formatting work – they are very verbose.
        All results are with CentOS 5.3.
        Xeon 5335: 2.0 GHz, 4 MB cache per pair of cores, 2x quad-core
        Xeon E5440: 2.83 GHz, 6 MB cache per pair of cores, 2x quad-core

        Results have the following headers if you wish to search:

        • 1.6.0_13 Xeon 5335 defaults
        • 1.6.0_14b03 Xeon 5335 defaults
        • 1.6.0_14b03 Xeon 5335 compressed pointers, escape analysis
        • 1.6.0_14b06 32 bit Xeon 5335 defaults
        • 1.6.0_12 Xeon E5440 defaults

        Results:

        • 1.6.0_13 Xeon 5335 defaults
          $ /usr/java/jdk1.6.0_13/bin/java -server -Xmx512m -jar COBSPerfTest.jar
          COBSCodec, one zero word every 1 words
          Encoding at 89.13659 MB/sec
          Decoding at 57.16702 MB/sec
          COBSCodec, one zero word every 10 words
          Encoding at 69.81948 MB/sec
          Decoding at 208.78065 MB/sec
          COBSCodec, one zero word every 100 words
          Encoding at 144.56085 MB/sec
          Decoding at 925.55365 MB/sec
          COBSCodec, one zero word every 1000 words
          Encoding at 155.95493 MB/sec
          Decoding at 1033.0511 MB/sec
          COBSCodec, one zero word every 10000 words
          Encoding at 157.28098 MB/sec
          Decoding at 1038.535 MB/sec
          COWSCodec variant 1, one zero word every 1 words
          Encoding at 248.32587 MB/sec
          Decoding at 272.1762 MB/sec
          COWSCodec variant 1, one zero word every 10 words
          Encoding at 158.03842 MB/sec
          Decoding at 244.39633 MB/sec
          COWSCodec variant 1, one zero word every 100 words
          Encoding at 178.13342 MB/sec
          Decoding at 314.68652 MB/sec
          COWSCodec variant 1, one zero word every 1000 words
          Encoding at 179.34563 MB/sec
          Decoding at 319.3074 MB/sec
          COWSCodec variant 1, one zero word every 10000 words
          Encoding at 179.1614 MB/sec
          Decoding at 317.2071 MB/sec
          COWSCodec variant 1, one zero word every 100000 words
          Encoding at 179.04832 MB/sec
          Decoding at 318.77673 MB/sec
          COWSCodec variant 2, one zero word every 1 words
          Encoding at 212.54866 MB/sec
          Decoding at 239.11534 MB/sec
          COWSCodec variant 2, one zero word every 10 words
          Encoding at 225.34329 MB/sec
          Decoding at 466.0846 MB/sec
          COWSCodec variant 2, one zero word every 100 words
          Encoding at 323.6535 MB/sec
          Decoding at 1133.9556 MB/sec
          COWSCodec variant 2, one zero word every 1000 words
          Encoding at 329.2162 MB/sec
          Decoding at 1204.8298 MB/sec
          COWSCodec variant 2, one zero word every 10000 words
          Encoding at 328.59866 MB/sec
          Decoding at 1212.3876 MB/sec
          COWSCodec variant 2, one zero word every 100000 words
          Encoding at 328.2159 MB/sec
          Decoding at 1205.3972 MB/sec
          COWSCodec variant 3, one zero word every 1 words
          Encoding at 224.24924 MB/sec
          Decoding at 252.21263 MB/sec
          COWSCodec variant 3, one zero word every 10 words
          Encoding at 235.53137 MB/sec
          Decoding at 506.0203 MB/sec
          COWSCodec variant 3, one zero word every 100 words
          Encoding at 320.96054 MB/sec
          Decoding at 1094.6545 MB/sec
          COWSCodec variant 3, one zero word every 1000 words
          Encoding at 328.45444 MB/sec
          Decoding at 1213.4741 MB/sec
          COWSCodec variant 3, one zero word every 10000 words
          Encoding at 328.3331 MB/sec
          Decoding at 1231.2334 MB/sec
          COWSCodec variant 3, one zero word every 100000 words
          Encoding at 328.1387 MB/sec
          Decoding at 1217.2268 MB/sec
          COLSCodec, one zero word every 1 words
          Encoding at 291.29678 MB/sec
          Decoding at 343.89276 MB/sec
          COLSCodec, one zero word every 10 words
          Encoding at 354.09015 MB/sec
          Decoding at 812.4928 MB/sec
          Original array was modified!
          COLSCodec, one zero word every 100 words
          Encoding at 433.4998 MB/sec
          Decoding at 1204.3855 MB/sec
          COLSCodec, one zero word every 1000 words
          Encoding at 423.8553 MB/sec
          Decoding at 1237.3381 MB/sec
          COLSCodec, one zero word every 10000 words
          Encoding at 421.88364 MB/sec
          Decoding at 1238.6761 MB/sec
          COLSCodec, one zero word every 100000 words
          Encoding at 419.33118 MB/sec
          Decoding at 1239.0199 MB/sec
          COLSCodec, one zero word every 1000000 words
          Encoding at 420.57434 MB/sec
          Decoding at 1218.9686 MB/sec
        • 1.6.0_14b03 Xeon 5335 defaults
          $ java -server -Xmx512m -jar COBSPerfTest.jar
          COBSCodec, one zero word every 1 words
          Encoding at 91.95477 MB/sec
          Decoding at 57.01143 MB/sec
          COBSCodec, one zero word every 10 words
          Encoding at 73.45933 MB/sec
          Decoding at 207.67142 MB/sec
          COBSCodec, one zero word every 100 words
          Encoding at 144.0236 MB/sec
          Decoding at 913.1517 MB/sec
          COBSCodec, one zero word every 1000 words
          Encoding at 155.32053 MB/sec
          Decoding at 1032.9912 MB/sec
          COBSCodec, one zero word every 10000 words
          Encoding at 156.60835 MB/sec
          Decoding at 1024.763 MB/sec
          COWSCodec variant 1, one zero word every 1 words
          Encoding at 271.9177 MB/sec
          Decoding at 276.57822 MB/sec
          COWSCodec variant 1, one zero word every 10 words
          Encoding at 152.21716 MB/sec
          Decoding at 191.38951 MB/sec
          COWSCodec variant 1, one zero word every 100 words
          Encoding at 171.42383 MB/sec
          Decoding at 224.81892 MB/sec
          COWSCodec variant 1, one zero word every 1000 words
          Encoding at 173.3674 MB/sec
          Decoding at 228.82373 MB/sec
          COWSCodec variant 1, one zero word every 10000 words
          Encoding at 173.56622 MB/sec
          Decoding at 228.35956 MB/sec
          COWSCodec variant 1, one zero word every 100000 words
          Encoding at 173.60176 MB/sec
          Decoding at 229.12196 MB/sec
          COWSCodec variant 2, one zero word every 1 words
          Encoding at 214.48987 MB/sec
          Decoding at 241.93507 MB/sec
          COWSCodec variant 2, one zero word every 10 words
          Encoding at 244.36378 MB/sec
          Decoding at 473.0567 MB/sec
          COWSCodec variant 2, one zero word every 100 words
          Encoding at 345.88748 MB/sec
          Decoding at 1003.376 MB/sec
          COWSCodec variant 2, one zero word every 1000 words
          Encoding at 349.1008 MB/sec
          Decoding at 1026.7786 MB/sec
          COWSCodec variant 2, one zero word every 10000 words
          Encoding at 347.61612 MB/sec
          Decoding at 1028.8761 MB/sec
          COWSCodec variant 2, one zero word every 100000 words
          Encoding at 346.9563 MB/sec
          Decoding at 1061.9762 MB/sec
          COWSCodec variant 3, one zero word every 1 words
          Encoding at 210.84114 MB/sec
          Decoding at 258.27982 MB/sec
          COWSCodec variant 3, one zero word every 10 words
          Encoding at 252.59242 MB/sec
          Decoding at 507.0884 MB/sec
          COWSCodec variant 3, one zero word every 100 words
          Encoding at 353.3254 MB/sec
          Decoding at 1150.1593 MB/sec
          COWSCodec variant 3, one zero word every 1000 words
          Encoding at 358.27298 MB/sec
          Decoding at 1208.8944 MB/sec
          COWSCodec variant 3, one zero word every 10000 words
          Encoding at 357.32245 MB/sec
          Decoding at 1215.5607 MB/sec
          COWSCodec variant 3, one zero word every 100000 words
          Encoding at 356.93134 MB/sec
          Decoding at 1210.7133 MB/sec
          COLSCodec, one zero word every 1 words
          Encoding at 287.6796 MB/sec
          Decoding at 362.4284 MB/sec
          COLSCodec, one zero word every 10 words
          Encoding at 349.48486 MB/sec
          Decoding at 817.0665 MB/sec
          Original array was modified!
          COLSCodec, one zero word every 100 words
          Encoding at 418.7336 MB/sec
          Decoding at 1214.5057 MB/sec
          COLSCodec, one zero word every 1000 words
          Encoding at 410.76407 MB/sec
          Decoding at 1239.2533 MB/sec
          COLSCodec, one zero word every 10000 words
          Encoding at 408.02432 MB/sec
          Decoding at 1245.9232 MB/sec
          COLSCodec, one zero word every 100000 words
          Encoding at 406.2959 MB/sec
          Decoding at 1252.01 MB/sec
          COLSCodec, one zero word every 1000000 words
          Encoding at 405.99057 MB/sec
          Decoding at 1252.2338 MB/sec
        • 1.6.0_14b03 Xeon 5335 compressed pointers, escape analysis
          [candiru@britney COBSPerfTest]$ java -server -Xmx512m -XX:+DoEscapeAnalysis -XX:+UseCompressedOops -jar COBSPerfTest.jar
          COBSCodec, one zero word every 1 words
          Encoding at 91.98761 MB/sec
          Decoding at 53.635868 MB/sec
          COBSCodec, one zero word every 10 words
          Encoding at 72.98973 MB/sec
          Decoding at 205.35959 MB/sec
          COBSCodec, one zero word every 100 words
          Encoding at 144.04861 MB/sec
          Decoding at 918.5997 MB/sec
          COBSCodec, one zero word every 1000 words
          Encoding at 154.981 MB/sec
          Decoding at 1018.9709 MB/sec
          COBSCodec, one zero word every 10000 words
          Encoding at 156.41275 MB/sec
          Decoding at 1032.3058 MB/sec
          COWSCodec variant 1, one zero word every 1 words
          Encoding at 252.68245 MB/sec
          Decoding at 307.39664 MB/sec
          COWSCodec variant 1, one zero word every 10 words
          Encoding at 163.27182 MB/sec
          Decoding at 209.55176 MB/sec
          COWSCodec variant 1, one zero word every 100 words
          Encoding at 189.6774 MB/sec
          Decoding at 263.66977 MB/sec
          COWSCodec variant 1, one zero word every 1000 words
          Encoding at 193.37485 MB/sec
          Decoding at 270.99658 MB/sec
          COWSCodec variant 1, one zero word every 10000 words
          Encoding at 193.74573 MB/sec
          Decoding at 271.46988 MB/sec
          COWSCodec variant 1, one zero word every 100000 words
          Encoding at 194.11456 MB/sec
          Decoding at 270.73804 MB/sec
          COWSCodec variant 2, one zero word every 1 words
          Encoding at 216.82019 MB/sec
          Decoding at 243.21117 MB/sec
          COWSCodec variant 2, one zero word every 10 words
          Encoding at 242.51544 MB/sec
          Decoding at 465.20282 MB/sec
          COWSCodec variant 2, one zero word every 100 words
          Encoding at 344.99945 MB/sec
          Decoding at 1157.6014 MB/sec
          COWSCodec variant 2, one zero word every 1000 words
          Encoding at 351.1931 MB/sec
          Decoding at 1211.4054 MB/sec
          COWSCodec variant 2, one zero word every 10000 words
          Encoding at 349.90894 MB/sec
          Decoding at 1217.9989 MB/sec
          COWSCodec variant 2, one zero word every 100000 words
          Encoding at 349.40396 MB/sec
          Decoding at 1210.6339 MB/sec
          COWSCodec variant 3, one zero word every 1 words
          Encoding at 240.06367 MB/sec
          Decoding at 228.17952 MB/sec
          COWSCodec variant 3, one zero word every 10 words
          Encoding at 255.28317 MB/sec
          Decoding at 496.779 MB/sec
          COWSCodec variant 3, one zero word every 100 words
          Encoding at 360.55945 MB/sec
          Decoding at 1142.717 MB/sec
          COWSCodec variant 3, one zero word every 1000 words
          Encoding at 365.1012 MB/sec
          Decoding at 1205.8257 MB/sec
          COWSCodec variant 3, one zero word every 10000 words
          Encoding at 363.70743 MB/sec
          Decoding at 1213.5723 MB/sec
          COWSCodec variant 3, one zero word every 100000 words
          Encoding at 363.2405 MB/sec
          Decoding at 1208.7316 MB/sec
          COLSCodec, one zero word every 1 words
          Encoding at 298.33194 MB/sec
          Decoding at 318.14648 MB/sec
          COLSCodec, one zero word every 10 words
          Encoding at 368.6357 MB/sec
          Decoding at 825.8583 MB/sec
          Original array was modified!
          COLSCodec, one zero word every 100 words
          Encoding at 449.0997 MB/sec
          Decoding at 1191.9662 MB/sec
          COLSCodec, one zero word every 1000 words
          Encoding at 441.75586 MB/sec
          Decoding at 1223.806 MB/sec
          COLSCodec, one zero word every 10000 words
          Encoding at 439.18317 MB/sec
          Decoding at 1227.0127 MB/sec
          COLSCodec, one zero word every 100000 words
          Encoding at 438.62714 MB/sec
          Decoding at 1224.557 MB/sec
          COLSCodec, one zero word every 1000000 words
          Encoding at 438.62115 MB/sec
          Decoding at 1224.6772 MB/sec
        • 1.6.0_14b06 32 bit Xeon 5335 defaults
          $ /usr/java/jdk1.6.0_14ea6_32bit/bin/java -server -Xmx512m -jar COBSPerfTest.jar COBSCodec, one zero word every 1 words
          Encoding at 101.488785 MB/sec
          Decoding at 44.9381 MB/sec
          COBSCodec, one zero word every 10 words
          Encoding at 76.98102 MB/sec
          Decoding at 186.26143 MB/sec
          COBSCodec, one zero word every 100 words
          Encoding at 154.48914 MB/sec
          Decoding at 926.46204 MB/sec
          COBSCodec, one zero word every 1000 words
          Encoding at 169.65015 MB/sec
          Decoding at 996.02625 MB/sec
          COBSCodec, one zero word every 10000 words
          Encoding at 171.83167 MB/sec
          Decoding at 1069.7236 MB/sec
          COWSCodec variant 1, one zero word every 1 words
          Encoding at 229.62816 MB/sec
          Decoding at 347.4478 MB/sec
          COWSCodec variant 1, one zero word every 10 words
          Encoding at 137.4511 MB/sec
          Decoding at 181.96013 MB/sec
          COWSCodec variant 1, one zero word every 100 words
          Encoding at 170.84563 MB/sec
          Decoding at 246.61394 MB/sec
          COWSCodec variant 1, one zero word every 1000 words
          Encoding at 175.59972 MB/sec
          Decoding at 255.19583 MB/sec
          COWSCodec variant 1, one zero word every 10000 words
          Encoding at 176.94963 MB/sec
          Decoding at 257.768 MB/sec
          COWSCodec variant 1, one zero word every 100000 words
          Encoding at 175.58342 MB/sec
          Decoding at 255.8668 MB/sec
          COWSCodec variant 2, one zero word every 1 words
          Encoding at 212.1405 MB/sec
          Decoding at 257.78635 MB/sec
          COWSCodec variant 2, one zero word every 10 words
          Encoding at 231.08081 MB/sec
          Decoding at 421.9081 MB/sec
          COWSCodec variant 2, one zero word every 100 words
          Encoding at 348.02103 MB/sec
          Decoding at 1133.5847 MB/sec
          COWSCodec variant 2, one zero word every 1000 words
          Encoding at 358.29077 MB/sec
          Decoding at 1170.7545 MB/sec
          COWSCodec variant 2, one zero word every 10000 words
          Encoding at 360.5535 MB/sec
          Decoding at 1223.9012 MB/sec
          COWSCodec variant 2, one zero word every 100000 words
          Encoding at 358.03394 MB/sec
          Decoding at 1216.9368 MB/sec
          COWSCodec variant 3, one zero word every 1 words
          Encoding at 226.55222 MB/sec
          Decoding at 275.24838 MB/sec
          COWSCodec variant 3, one zero word every 10 words
          Encoding at 243.09453 MB/sec
          Decoding at 469.97775 MB/sec
          COWSCodec variant 3, one zero word every 100 words
          Encoding at 351.21555 MB/sec
          Decoding at 1129.5447 MB/sec
          COWSCodec variant 3, one zero word every 1000 words
          Encoding at 358.14252 MB/sec
          Decoding at 1196.7433 MB/sec
          COWSCodec variant 3, one zero word every 10000 words
          Encoding at 360.71323 MB/sec
          Decoding at 1199.4408 MB/sec
          COWSCodec variant 3, one zero word every 100000 words
          Encoding at 358.2802 MB/sec
          Decoding at 1224.6678 MB/sec
          COLSCodec, one zero word every 1 words
          Encoding at 208.82603 MB/sec
          Decoding at 275.9128 MB/sec
          COLSCodec, one zero word every 10 words
          Encoding at 265.03033 MB/sec
          Decoding at 730.78546 MB/sec
          Original array was modified!
          COLSCodec, one zero word every 100 words
          Encoding at 310.9054 MB/sec
          Decoding at 1157.1534 MB/sec
          COLSCodec, one zero word every 1000 words
          Encoding at 308.7317 MB/sec
          Decoding at 1238.4891 MB/sec
          COLSCodec, one zero word every 10000 words
          Encoding at 306.90793 MB/sec
          Decoding at 1220.6907 MB/sec
          COLSCodec, one zero word every 100000 words
          Encoding at 305.49704 MB/sec
          Decoding at 1205.0568 MB/sec
          COLSCodec, one zero word every 1000000 words
          Encoding at 305.3674 MB/sec
          Decoding at 1234.8855 MB/sec
        • 1.6.0_12 Xeon E5440 defaults
          $ java -server -Xmx512m -jar COBSPerfTest.jar
          COBSCodec, one zero word every 1 words
          Encoding at 124.19903 MB/sec
          Decoding at 80.51218 MB/sec
          COBSCodec, one zero word every 10 words
          Encoding at 97.80887 MB/sec
          Decoding at 293.82983 MB/sec
          COBSCodec, one zero word every 100 words
          Encoding at 203.51627 MB/sec
          Decoding at 1299.7317 MB/sec
          COBSCodec, one zero word every 1000 words
          Encoding at 219.41322 MB/sec
          Decoding at 1422.3486 MB/sec
          COBSCodec, one zero word every 10000 words
          Encoding at 220.89801 MB/sec
          Decoding at 1420.3978 MB/sec
          COWSCodec variant 1, one zero word every 1 words
          Encoding at 344.6565 MB/sec
          Decoding at 390.2233 MB/sec
          COWSCodec variant 1, one zero word every 10 words
          Encoding at 220.47774 MB/sec
          Decoding at 360.3579 MB/sec
          COWSCodec variant 1, one zero word every 100 words
          Encoding at 250.53049 MB/sec
          Decoding at 447.04602 MB/sec
          COWSCodec variant 1, one zero word every 1000 words
          Encoding at 253.66922 MB/sec
          Decoding at 450.34372 MB/sec
          COWSCodec variant 1, one zero word every 10000 words
          Encoding at 253.64081 MB/sec
          Decoding at 447.10074 MB/sec
          COWSCodec variant 1, one zero word every 100000 words
          Encoding at 252.63756 MB/sec
          Decoding at 447.50485 MB/sec
          COWSCodec variant 2, one zero word every 1 words
          Encoding at 275.47418 MB/sec
          Decoding at 332.2978 MB/sec
          COWSCodec variant 2, one zero word every 10 words
          Encoding at 316.82657 MB/sec
          Decoding at 657.1525 MB/sec
          COWSCodec variant 2, one zero word every 100 words
          Encoding at 449.77597 MB/sec
          Decoding at 1545.4358 MB/sec
          COWSCodec variant 2, one zero word every 1000 words
          Encoding at 457.52542 MB/sec
          Decoding at 1653.704 MB/sec
          COWSCodec variant 2, one zero word every 10000 words
          Encoding at 456.66467 MB/sec
          Decoding at 1658.9537 MB/sec
          COWSCodec variant 2, one zero word every 100000 words
          Encoding at 455.9669 MB/sec
          Decoding at 1655.1809 MB/sec
          COWSCodec variant 3, one zero word every 1 words
          Encoding at 315.7178 MB/sec
          Decoding at 360.02884 MB/sec
          COWSCodec variant 3, one zero word every 10 words
          Encoding at 331.1007 MB/sec
          Decoding at 723.18024 MB/sec
          COWSCodec variant 3, one zero word every 100 words
          Encoding at 443.8783 MB/sec
          Decoding at 1560.0219 MB/sec
          COWSCodec variant 3, one zero word every 1000 words
          Encoding at 447.92645 MB/sec
          Decoding at 1541.4951 MB/sec
          COWSCodec variant 3, one zero word every 10000 words
          Encoding at 449.71402 MB/sec
          Decoding at 1394.1431 MB/sec
          COWSCodec variant 3, one zero word every 100000 words
          Encoding at 441.31396 MB/sec
          Decoding at 1361.6113 MB/sec
          COLSCodec, one zero word every 1 words
          Encoding at 405.91794 MB/sec
          Decoding at 482.06976 MB/sec
          COLSCodec, one zero word every 10 words
          Encoding at 491.71738 MB/sec
          Decoding at 1079.3405 MB/sec
          Original array was modified!
          COLSCodec, one zero word every 100 words
          Encoding at 598.31836 MB/sec
          Decoding at 1616.031 MB/sec
          COLSCodec, one zero word every 1000 words
          Encoding at 586.9973 MB/sec
          Decoding at 1666.9445 MB/sec
          COLSCodec, one zero word every 10000 words
          Encoding at 585.7841 MB/sec
          Decoding at 1674.5248 MB/sec
          COLSCodec, one zero word every 100000 words
          Encoding at 585.28375 MB/sec
          Decoding at 1664.8573 MB/sec
          COLSCodec, one zero word every 1000000 words
          Encoding at 585.0993 MB/sec
          Decoding at 1662.1304 MB/sec
        Todd Lipcon added a comment -

        What's with this?

        COLSCodec, one zero word every 10 words
        Encoding at 354.09015 MB/sec
        Decoding at 812.4928 MB/sec
        Original array was modified!

        Isn't that bad?

        Scott Carey added a comment -

        COLSCodec, one zero word every 10 words
        Encoding at 354.09015 MB/sec
        Decoding at 812.4928 MB/sec
        Original array was modified!

        That, sir, is the remaining bug I alluded to but didn't highlight enough in my previous comment. If you change the size of the array, the random-number seed, or just about anything else, it will go away (or pop up elsewhere).

        The before and after arrays have the same bytes, but the one that was encoded and decoded has an extra word at the end. I stepped through that case briefly but was too lazy to fix it. I don't think it is relevant to the overall results. (And any real codec would be written more carefully, with unit tests to cover the corner cases.)

        Which reminds me, these are the main conclusions I draw that are not specific to this JIRA:

        ByteBuffer.getInt() and getLong() are well optimized, as are the matching putInt() and putLong() operations. Bulk put operations are also fast on ByteBuffer, but not on an IntBuffer created from ByteBuffer.asIntBuffer().

        Any encoder or decoder in Java will see potentially large performance gains if it can read and write in larger chunks.
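
        As an illustration of that point (not taken from COBSPerfTest; the class, method names, and constants below are mine), a hedged sketch that finds the first zero byte either one byte at a time or eight bytes at a time with getLong() and the standard has-zero-byte bit trick:

        import java.nio.ByteBuffer;

        /** Illustrative only: byte-at-a-time vs. 8-bytes-at-a-time zero scanning. */
        public class ChunkScanSketch {

          /** Index of the first zero byte, scanning one byte at a time. */
          static int firstZeroBytewise(ByteBuffer buf) {
            for (int i = 0; i < buf.limit(); i++) {
              if (buf.get(i) == 0) return i;
            }
            return -1;
          }

          /** Same answer, but skipping 8 bytes per step while no byte in the word is zero. */
          static int firstZeroByLong(ByteBuffer buf) {
            int i = 0;
            for (; i + 8 <= buf.limit(); i += 8) {
              long w = buf.getLong(i);
              // Classic bit trick: the expression is non-zero iff some byte of w is 0x00.
              if (((w - 0x0101010101010101L) & ~w & 0x8080808080808080L) != 0) break;
            }
            for (; i < buf.limit(); i++) {   // locate the zero within the word (or the tail)
              if (buf.get(i) == 0) return i;
            }
            return -1;
          }
        }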

        I could be evil and run the same test with the array misaligned – starting at position 1 instead of 0 (the JVM aligns array data to 8-byte boundaries, and many processor instructions are faster when aligned).

        OK, I decided to be evil and try it on my laptop with misaligned bytes (I added a put(0) at the start of the encoder and a get() at the start of the decoder to misalign the whole thing by one byte). Now, perhaps getLong() will be a lot less efficient. Let's see:

        Aligned (COLS):
        COLSCodec, one zero word every 1 words
        Encoding at 323.87604 MB/sec
        Decoding at 419.4213 MB/sec
        COLSCodec, one zero word every 10 words
        Encoding at 376.7943 MB/sec
        Decoding at 1041.8271 MB/sec
        COLSCodec, one zero word every 10000 words
        Encoding at 439.01627 MB/sec
        Decoding at 1350.2242 MB/sec
        COLSCodec, one zero word every 1000000 words
        Encoding at 415.91876 MB/sec
        Decoding at 1411.3434 MB/sec

        Misaligned (COLS):
        COLSCodec, one zero word every 1 words
        Encoding at 327.0196 MB/sec
        Decoding at 402.65366 MB/sec
        COLSCodec, one zero word every 10 words
        Encoding at 377.48105 MB/sec
        Decoding at 974.4739 MB/sec
        COLSCodec, one zero word every 10000 words
        Encoding at 445.4802 MB/sec
        Decoding at 1440.7946 MB/sec
        COLSCodec, one zero word every 1000000 words
        Encoding at 443.61166 MB/sec
        Decoding at 1423.9922 MB/sec

        These are within the usual margin of error – essentially the same. Perhaps the JVM's JIT isn't smart enough to recognize that in the first case all access is aligned and to use the faster aligned load instructions? I could write a COLSCodec2 that operates on a LongBuffer rather than a ByteBuffer to see what that does.

        But the main conclusion is that accessing in larger chunks has big gains when it is possible to do so.
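
        For reference, the misalignment trick described above amounts to something like the following sketch (the buffer size and values are illustrative, not the COBSPerfTest harness):

        import java.nio.ByteBuffer;

        /** Illustrative sketch of deliberately misaligning getLong()/putLong() by one byte. */
        public class MisalignSketch {
          public static void main(String[] args) {
            ByteBuffer buf = ByteBuffer.allocate(1 << 20);
            buf.put((byte) 0);                 // encoder side: one pad byte shifts everything
            for (long v = 1; buf.remaining() >= 8; v++) {
              buf.putLong(v);                  // every write now straddles an 8-byte boundary
            }
            buf.flip();
            buf.get();                         // decoder side: skip the pad byte, then read
            long sum = 0;
            while (buf.remaining() >= 8) {
              sum += buf.getLong();            // misaligned reads
            }
            System.out.println(sum);
          }
        }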

        Scott Carey added a comment -

        So, aligned access is important. However, the JVM's JIT cannot guarantee it on a ByteBuffer or byte[], but it can on a LongBuffer or long[]. Here are results on my laptop akin to the above, but with a COLSCodec2 that uses a LongBuffer rather than a ByteBuffer with getLong()/putLong().

        COLSCodec, one zero word every 1 words
        Encoding at 939.8201 MB/sec
        Decoding at 980.54034 MB/sec
        COLSCodec, one zero word every 10 words
        Encoding at 822.7025 MB/sec
        Decoding at 1188.7073 MB/sec
        COLSCodec, one zero word every 1000 words
        Encoding at 1104.4512 MB/sec
        Decoding at 1429.9589 MB/sec

        Unfortunately, for anything reading from or writing to the network or a file, byte streams and byte arrays are the only option. And as demonstrated earlier, asLongBuffer()/asIntBuffer() are not well optimized and are fairly restrictive. This suggests that in the future the JVM could do more, or Java APIs could be added, so that the JIT can easily detect data alignment and be more efficient.
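
        A hedged sketch of the LongBuffer idea (not the attached COLSCodec2; names are illustrative): wrapping a long[] guarantees word-aligned access, at the cost of the copy that byte-oriented I/O forces on us first.

        import java.nio.ByteBuffer;
        import java.nio.LongBuffer;

        /** Illustrative: aligned word access via long[] / LongBuffer. */
        public class LongBufferSketch {

          /** Count zero words using a LongBuffer view over a long[] (aligned by construction). */
          static int countZeroWords(long[] words) {
            LongBuffer buf = LongBuffer.wrap(words);
            int zeros = 0;
            while (buf.hasRemaining()) {
              if (buf.get() == 0L) zeros++;
            }
            return zeros;
          }

          /** The copy that byte-oriented I/O requires before the aligned pass can run. */
          static long[] toWords(byte[] bytes) {
            long[] words = new long[bytes.length / 8];        // drops any trailing partial word
            ByteBuffer.wrap(bytes).asLongBuffer().get(words); // bulk copy into an aligned array
            return words;
          }
        }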

        Doug Cutting added a comment -

        I don't think this is worth pursuing at this point. While it has nice properties, it adds a non-negligible decoding step to all data file processing.

        We can potentially add this to a future file format, but for the file format specified in the 1.0 release I'd like to keep it as-is. Objections?

        Scott Carey added a comment -

        I agree. COBS-like encoding is only useful for streaming data where a specific character or word must be avoided, which is a format issue.

        If all that is needed is identifying block boundaries, there are other methods.

        A "magic number" approach can be made collision-proof by detecting the collision: on encode, look for the magic number in the data and, if present, follow it with a 'not the end of the block' word; at the end of the block, write the magic number followed by an 'end of block' word. On decode, look for the magic number and discard the word that follows; if that word is the end-of-block word, also discard the magic word. COBS came about because the worst-case expansion of a magic-word approach is poor, and if the magic word is small (one byte) the worst case is likely.
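
        A byte-granularity sketch of that scheme (the marker value and flag bytes below are hypothetical, not part of any Avro format; the worst case – a payload full of marker bytes – doubles in size, which is exactly the weakness noted above):

        import java.io.ByteArrayOutputStream;

        /** Illustrative collision-proof "magic number" framing. */
        public class MagicMarkerSketch {
          static final byte MAGIC = (byte) 0xAB;       // hypothetical marker byte
          static final byte CONTINUE = 0x00;           // "not the end of the block"
          static final byte END_OF_BLOCK = 0x01;       // "end of the block"

          /** Encode one block: escape any MAGIC in the payload, then terminate it. */
          static byte[] encodeBlock(byte[] payload) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            for (byte b : payload) {
              out.write(b);
              if (b == MAGIC) {
                out.write(CONTINUE);                   // collision detected: mark it as data
              }
            }
            out.write(MAGIC);
            out.write(END_OF_BLOCK);                   // real block terminator
            return out.toByteArray();
          }

          /** Decode until the block terminator; escaped MAGIC bytes are kept as data. */
          static byte[] decodeBlock(byte[] encoded) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            for (int i = 0; i < encoded.length; i++) {
              byte b = encoded[i];
              if (b == MAGIC) {
                byte next = encoded[++i];              // the flag following the marker
                if (next == END_OF_BLOCK) break;       // terminator: discard both and stop
                out.write(MAGIC);                      // escaped data byte: keep MAGIC, drop flag
              } else {
                out.write(b);
              }
            }
            return out.toByteArray();
          }
        }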

        This might prove very useful at some point. Some of the general optimization findings here will be useful somewhere.

        Doug Cutting added a comment -

        Resolving this as something we may implement later.


          People

          • Assignee: Unassigned
          • Reporter: Matt Massie
          • Votes: 0
          • Watchers: 2
