diff --git oak-doc/src/site/markdown/nodestore/overview.md oak-doc/src/site/markdown/nodestore/overview.md
index 08788a5..eacd52c 100644
--- oak-doc/src/site/markdown/nodestore/overview.md
+++ oak-doc/src/site/markdown/nodestore/overview.md
@@ -17,9 +17,10 @@
# Node Storage
-Oak comes with two node storage flavours: [Segment](segmentmk.html) and [Document](documentmk.html).
-Segment storage is optimised for maximal performance in standalone deployments,
-and document storage is optimised for maximal scalability in clustered deployments.
+Oak comes with two node storage flavours: [Segment](segment/overview.html) and
+[Document](documentmk.html). Segment storage is optimised for maximal
+performance in standalone deployments, and document storage is optimised for
+maximal scalability in clustered deployments.
## NodeStore API
@@ -32,4 +33,4 @@ which is suited to work with in Java, and has lower performance and memory overh
## MicroKernel API
The `MicroKernel` API was deprecated in OAK 1.2 and dropped from the project as of
-Oak 1.3.0. It used to exposes its functionality through a String only (JSON) API.
\ No newline at end of file
+Oak 1.3.0. It used to expose its functionality through a String-only (JSON) API.
diff --git oak-doc/src/site/markdown/nodestore/segment/overview.md oak-doc/src/site/markdown/nodestore/segment/overview.md
new file mode 100644
index 0000000..3f33ea5
--- /dev/null
+++ oak-doc/src/site/markdown/nodestore/segment/overview.md
@@ -0,0 +1,39 @@
+
+
+# Segment Node Store
+
+The Segment Node Store is the implementation of the [Node
+Store](../overview.html) that persists repository data on the file
+system in an efficient, organized and performant way.
+
+One of the most important tasks of the Segment Node Store is the management of a
+set of TAR files. These files are the most coarse-grained containers for the
+repository data. You can learn more about the general structure of a TAR file
+and how Oak leverages TAR files by reading [this page](tar.html).
+
+Every TAR file contains segments, finer-grained containers of repository data.
+Unsurprisingly, segments inspired the name of this Node Store implementation.
+Repository nodes and properties are serialized to one or more records, and these
+records are saved into the segments. You can learn about the internal
+organization of segments and the different ways to serialize records by reading
+[this page](records.html).
+
+This website also contains a broader overview of the Segment Store and of the
+design decisions that led to its implementation. The page is quite old and
+potentially outdated, but contains valuable information and is accessible
+[here](../segmentmk.html).
diff --git oak-doc/src/site/markdown/nodestore/segment/records.md oak-doc/src/site/markdown/nodestore/segment/records.md
new file mode 100644
index 0000000..28e12ab
--- /dev/null
+++ oak-doc/src/site/markdown/nodestore/segment/records.md
@@ -0,0 +1,291 @@
+
+
+# Records and segments
+
+While TAR files and segments are a coarse-grained mechanism to divide the
+repository content into more manageable pieces, the real information is stored
+inside the segments as finer-grained records. This page zooms in on the segments
+and shows the binary representation of the data stored by Oak in them. It is not
+strictly necessary to know how TAR files work in order to understand this
+content, but if you feel lost you can refer to [this description of the
+structure of TAR files](tar.html).
+
+## Data and bulk segments
+
+Segments are not created equal. Oak, in fact, distinguishes data and bulk
+segments, where the former are used to store structured data (e.g. information
+about nodes and properties), while the latter contain unstructured data (e.g.
+the value of binary properties or of very long strings).
+
+It is possible to tell a bulk segment apart from a data segment by just looking
+at its identifier. A segment identifier is a randomly generated UUID. Segment
+identifiers are 16 bytes long, but Oak reserves 4 bits of the identifier for a
+flag that distinguishes bulk segments from data segments.
+
+The most interesting kind of segment is the data segment, because it stores
+information about the repository in a structured and easily accessible way.
+
+## Overview of data segments
+
+A data segment can be roughly divided in two parts, a header and a data section.
+The header contains management information about the segment itself, while the
+data section stores the actual repository data.
+
+Repository data is split into records, small units that each hold a specific
+piece of information. There are different types of records, and every type is
+specialized in storing one kind of information: node records, template records,
+map records, list records, and so on.
+
+In general, a record can be considered as a contiguous sequence of bytes stored
+at a specific position inside a segment. A record can also have references to
+other records, where the referenced records can be stored in the same segment or
+not. Since records can reference each other, a segment actually stores a graph
+of records, where the implementation guarantees that the graph is acyclic.
+
+The segment also maintains a set of references to *root records*, those records
+in the graph that are not referenced by any other record. In graph jargon,
+these records would be called source vertices. The set of references to root
+records is stored in the header section of the segment.
+
+## Record identifiers
+
+Records need a mechanism to reference each other, both from inside the same
+segment and across different segments. The mechanism used to reference a record
+is (unsurprisingly) a record identifier.
+
+A record identifier is composed of a *segment field* and a *position field*. The
+segment field is a single byte that identifies the segment where the referenced
+record is stored. The position field is the position of the record inside the
+segment identified by the segment field. There are some peculiarities in both
+the segment and the position field that may not be immediately obvious. The
+picture below shows what a segment looks like.
+
+
+
+The segment field is just one byte long, but a segment identifier is 16 bytes
+long. To bridge the gap, the segment header contains an array of segment
+identifiers that is used as a look-up table, and the segment field in a record
+identifier is just an index into this array. The array can store only 255
+segment identifiers, so a single byte is enough to address every element. The
+look-up table always contains the segment identifier of the current segment in
+the first position: if a segment field is set to zero, the referenced record is
+stored in the current segment.
+
+The definition of the position field relies on some important properties of data
+segments:
+
+- data segments have a maximum size of 256 KiB, or `0x40000` bytes. The size of
+ a data segment can never exceed this limit, but it is perfectly legal to have
+ data segments smaller than 256 KiB.
+
+- records are always aligned on a four-byte boundary. Stated differently, when a
+ record is written in a segment, it must be stored at a position that is a
+ multiple of four.
+
+- records are stored from the end of the segment. Even if this may seem
+ counterintuitive, it makes perfect sense if you consider that new records
+ are written as a consequence of a depth-first traversal of a content tree.
+ Writing records from the end of the segment guarantees that the records that
+ are relevant to the root of the content tree are at the beginning of the
+ segment, while records that are relevant to the leaves of the content tree are
+ stored at the end. This makes reading from the segment faster, because
+ operating systems are optimized to read files from the beginning to the end,
+ and not backwards.
+
+So, according to the previous properties, allowed positions range from
+`0x40000` (not included) to zero (included). Moreover, assigned positions must
+be multiples of four.
+
+```
+0x3FFFC, 0x3FFF8, 0x3FFF4, 0x3FFF0, 0x3FFEC, ..., 0x0
+```
+
+As you can see, three bytes would be necessary to store these positions, but we
+know that a record identifier uses only two bytes to store position values. This
+is possible because of a very simple optimization applied to the positions
+before they are used in a record identifier. Since the positions are multiples
+of four, the two least significant bits are always zero. Being constant, these
+bits can be removed by shifting the positions to the right by two bits. After
+the shift, the list of possible positions becomes:
+
+```
+0xFFFF, 0xFFFE, 0xFFFD, 0xFFFC, 0xFFFB, ..., 0x0
+```
+
+With this optimization, only two bytes are necessary to store a position inside
+a record identifier. Of course, when you read a position from a record
+identifier you have to remember to shift it to the left by two bits to obtain a
+valid position.
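+
+The shift described above can be sketched in a few lines of Python. This is
+only an illustration of the arithmetic: the function names `pack_position` and
+`unpack_position` are hypothetical and are not part of Oak.
+
+```python
+ALIGN_BITS = 2           # positions are multiples of four, so two bits are constant
+MAX_SEGMENT_SIZE = 0x40000
+
+
+def pack_position(position: int) -> int:
+    """Drop the two constant low bits so that the position fits in 16 bits."""
+    if not 0 <= position < MAX_SEGMENT_SIZE or position % 4 != 0:
+        raise ValueError("position must be a multiple of four below 0x40000")
+    return position >> ALIGN_BITS
+
+
+def unpack_position(packed: int) -> int:
+    """Restore the two low bits removed by pack_position."""
+    return packed << ALIGN_BITS
+```
+
+Packing `0x3FFFC` yields `0xFFFF`, the largest value that fits in two bytes,
+and unpacking gives back the original position.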
+
+The last important piece of information about positions is that they are always
+assigned relative to a logical segment size of `0x40000` bytes, even if the
+segment ends up being smaller than 256 KiB. This doesn't mean that these
+absolute positions are useless: in fact, they are converted to offsets relative
+to the effective end of the segment.
+
+Hopefully an example will clarify this.
+
+Let's suppose that you are reading from a segment whose size is just 128 bytes.
+Let's also suppose that you want to read a record from the position `0xFFF8`.
+The problem is that the position `0xFFF8` was computed on a logical segment size
+of 256 KiB (or `0x40000` bytes), so you have to convert the position `0xFFF8`
+into an offset that can be used with the segment you are reading from. First of
+all, the two least significant bits were stripped away from the position, so you
+have to shift `0xFFF8` two places to the left to obtain a proper position of
+`0xFFF8 << 2 = 0x3FFE0`. How far is this position from the logical size of
+`0x40000` bytes? It is easy to compute that the referenced record is `0x40000 -
+0x3FFE0 = 0x20 = 32` bytes before the end of the segment.
+
+Given that the size of the current segment is only 128 bytes, you can use an
+absolute position of `128 - 32 = 96` in the segment to read the record you are
+interested in.
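+
+The same computation can be written as a small Python function. This is a
+sketch of the arithmetic in the example, not Oak's actual code, and the name
+`record_offset` is hypothetical.
+
+```python
+MAX_SEGMENT_SIZE = 0x40000  # logical segment size used when assigning positions
+
+
+def record_offset(packed_position: int, segment_size: int) -> int:
+    """Convert a 16-bit position from a record identifier into an offset
+    inside a segment of the given (possibly smaller) size."""
+    position = packed_position << 2                   # restore the stripped bits
+    distance_from_end = MAX_SEGMENT_SIZE - position   # bytes before the logical end
+    return segment_size - distance_from_end
+```
+
+For the values used in the example, `record_offset(0xFFF8, 128)` evaluates to
+`96`.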
+
+## Record types
+
+As stated before, there are many types of records. It is necessary to make a
+distinction between logical and physical records, where the former are an
+idealized representation of a data structure, and the latter are used to encode
+the data structures in the segments as sequences of bits.
+
+Usually there is a one-to-one mapping between logical and physical records, like
+in block records, value records, template records and node records. Other types
+of logical record, like map records and list records, use more than one physical
+record to represent the content.
+
+Let's give a brief description of the aforementioned records.
+
+### Block records
+
+A block record is the simplest form of record, because it is just a plain
+sequence of bytes. It doesn't even contain a length: it is up to the writer of
+this record to store the length elsewhere.
+
+The only adjustment performed on the data is the alignment. The implementation
+makes sure that the written sequence of bytes is stored at a position that is a
+multiple of four.
+
+### Value records
+
+Value records are an improvement over block records, because they give the
+possibility to store arbitrary binary data with an additional length and
+optional references to other records.
+
+The implementation represents value records in different ways, depending on the
+length of the data to be written. If the data is short enough, the record can be
+written in the simplest way possible: a length field and the data inlined
+directly in the record.
+
+When the data is too big, instead, it is split into block records written into
+bulk segments. The references to these block records are stored in a list
+record, whose identifier is stored inside the value record.
+
+This means that value records represent a good compromise when writing binary or
+string data. If the data is short enough, it is written in such a way that it
+can be used straight away without further reads in the segment. If the data is
+too long, instead, it is stored separately from the repository content so as not
+to impact the performance of the readers of the segment.
+
+### List records
+
+List records are a general-purpose list of record identifiers. They are used as
+building blocks for other types of records, as we saw for value records and as
+we will see for template records and node records.
+
+The list record is a logical record using two different types of physical
+records to represent itself:
+
+- bucket record: this is a recursive record representing a list of at most 255
+ references. A bucket record can reference other bucket records,
+ hierarchically, or the record identifiers of the elements to be stored in the
+ list. A bucket record doesn't maintain any other information except record
+ identifiers.
+
+- list record: this is a top-level record that maintains the size of the list in
+ an integer field and a record identifier pointing to a bucket.
+
+List records are useful to store a list of references to other records. If the
+list is too big, it is split into different bucket records that may be stored
+in the same segment or across segments. This guarantees good performance for
+small lists, without losing the capability to store lists with a large number of
+elements.
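+
+To make the hierarchy of buckets more concrete, here is a minimal Python
+sketch that groups a flat list of identifiers into nested buckets of at most
+255 entries. The function name `build_buckets` and the use of tuples are
+assumptions made for illustration; the exact way Oak lays out bucket records
+may differ.
+
+```python
+BUCKET_SIZE = 255  # maximum number of references in a bucket record
+
+
+def build_buckets(ids):
+    """Return a root bucket (a tuple) holding the given identifiers.
+
+    While a level has more than 255 entries, it is cut into chunks of at most
+    255 entries, and each chunk becomes a bucket referenced from the next
+    level up.
+    """
+    level = list(ids)
+    while len(level) > BUCKET_SIZE:
+        level = [tuple(level[i:i + BUCKET_SIZE])
+                 for i in range(0, len(level), BUCKET_SIZE)]
+    return tuple(level)
+```
+
+A list of 300 identifiers ends up in a root bucket that references two child
+buckets, one with 255 entries and one with 45, while a list of ten identifiers
+fits in a single bucket.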
+
+### Map records
+
+Map records are general-purpose maps of strings to record identifiers. Like
+lists, they are used as building blocks for other types of records and are
+represented using two types of physical record:
+
+- leaf record: if the number of elements in the map is small, they are all
+ stored in a leaf record. This covers the simplest case for small maps.
+
+- branch record: if the number of elements in the map is too big, the original
+ map is split into smaller maps based on a hash function applied to the keys of
+ the map. A branch record is recursive, because it can reference other branch
+ records if the sub-maps are too big and need to be split again.
+
+The implementation of the map record relies on the properties defined by an
+external data structure called HAMT (Hash Array Mapped Trie), capable of
+combining the properties of a hash table and a trie.
+
+Map records are also optimized for small changes. For example, if only one
+element of a previously stored map is modified, and the map is stored again,
+only a "diff" of the map is stored. This avoids storing the modified map in
+full, which can save a considerable amount of space if the original map was
+big.
+
+### Template records
+
+A template record stores metadata about nodes that, on average, doesn't change
+very often. A template record stores information like the primary type, the
+mixin types, the property names and the property types of a node. Keeping this
+information away from the node itself avoids writing it over and over again
+when the node changes but the metadata doesn't.
+
+Typically, a node is created with a certain primary type and, optionally, with
+some mixin types. Usually, because of its primary type, a node is already
+created with a set of initial properties. After that, only the values of the
+properties change, but not the structure of the node itself.
+
+The template record allows Oak to handle simple modifications to nodes in the
+most efficient way possible.
+
+### Node records
+
+The node record is the single most important type of record, capable of storing
+both the data associated with the node and the structure of the content tree.
+
+A node record always maintains a reference to a template record. As stated
+before, a template record defines the overall structure of the node, while the
+variable part of it is maintained in the node record itself.
+
+The variable part of the node is represented by a list of property values and a
+map of child nodes.
+
+The list of property values is implemented as a list of record identifiers. For
+each property in the node, its value is written in the segment. The record
+identifiers referencing the values of the properties are then packed together in
+a list record. The identifier of the list record is stored as part of the node
+record. If the value of some properties didn't change, the previous record
+identifier is just reused.
+
+The map of child nodes is implemented as a map of record identifiers. For every
+child node, its node record identifier is stored in a map indexed by name. The
+map is persisted in a map record, and its identifier is stored in the node
+record. Thanks to the optimizations implemented by the map record, small changes
+to the map of child nodes don't create a lot of overhead in the segment.
diff --git oak-doc/src/site/markdown/nodestore/segment/segment.png oak-doc/src/site/markdown/nodestore/segment/segment.png
new file mode 100644
index 0000000..ae7b463
Binary files /dev/null and oak-doc/src/site/markdown/nodestore/segment/segment.png differ
diff --git oak-doc/src/site/markdown/nodestore/segment/tar.md oak-doc/src/site/markdown/nodestore/segment/tar.md
new file mode 100644
index 0000000..1060342
--- /dev/null
+++ oak-doc/src/site/markdown/nodestore/segment/tar.md
@@ -0,0 +1,210 @@
+
+
+# Structure of TAR files
+
+This page describes the physical layout of a TAR file as used by Apache Oak.
+First, a brief introduction to the TAR format is given. Next, more details are
+provided about the low-level information that is written in TAR entries.
+Finally, it describes how Oak saves a graph data structure inside the TAR file
+and how this representation is optimized for fast retrieval.
+
+## Organization of a TAR file
+
+Physically speaking, a TAR file is a linear sequence of blocks. A TAR file is
+terminated by two blocks containing zero bytes. Every block has a size of 512
+bytes.
+
+Logically speaking, a TAR file is a linear sequence of entries. Every entry is
+represented by two or more blocks. The first block always contains the entry
+header. Subsequent blocks store the content of the file.
+
+The entry header is composed of the following fields:
+
+- file name (100 bytes) - name of the file stored in this entry.
+
+- file mode (8 bytes) - string representation of the octal file mode.
+
+- owner's numeric ID (8 bytes) - string representation of the user ID of the
+ owner of the file.
+
+- group's numeric ID (8 bytes) - string representation of the group ID of the
+ owner of the file.
+
+- file size (12 bytes) - string representation of the octal size of the file.
+
+- last modification time (12 bytes) - string representation of the octal time
+ stamp when the file was last modified.
+
+- checksum (8 bytes) - checksum for the header data.
+
+- file type (1 byte) - type of the file stored in the entry. This field
+ specifies if the file is a regular file, a hard link or a symbolic link.
+
+- name of linked file (100 bytes) - in case the file stored in the entry is a
+ link, this field stores the name of the file pointed to by the link.
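+
+As an illustration, the fields listed above can be read from the first block
+of an entry with Python's `struct` module. This is a sketch, not code used by
+Oak: it assumes the classic TAR header layout, in which the name of the linked
+file occupies 100 bytes, and the helper names are hypothetical.
+
+```python
+import struct
+
+# One format item per header field, in the order of the list above.
+HEADER = struct.Struct("100s8s8s8s12s12s8s1s100s")
+
+
+def _text(field: bytes) -> str:
+    """Decode a NUL-terminated ASCII header field."""
+    return field.split(b"\0", 1)[0].decode("ascii").strip()
+
+
+def parse_entry_header(block: bytes) -> dict:
+    """Extract the fields of a TAR entry header from a 512-byte block."""
+    (name, mode, uid, gid, size, mtime,
+     checksum, file_type, link) = HEADER.unpack_from(block)
+    return {
+        "name": _text(name),
+        "size": int(_text(size) or "0", 8),    # stored as an octal string
+        "mtime": int(_text(mtime) or "0", 8),  # stored as an octal string
+        "type": file_type.decode("ascii"),
+        "link": _text(link),
+    }
+```
+
+The file mode, the owner and group IDs and the checksum are unpacked but not
+returned, which keeps the sketch focused on the fields that matter most for
+the rest of this page.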
+
+## The TAR file as used by Oak
+
+Some fields are not used by Oak. In particular, Oak sets the file mode, the
+owner's numeric ID, the group's numeric ID, the checksum, and the name of linked
+file to uninteresting values. The only fields of the entry header that are
+assigned meaningful values are:
+
+- file name: the name of the data file. There are different data files used by
+ Oak. They are described below.
+
+- file size: the size of the data file. The value assigned to this field is
+ trivially computed from the amount of information stored in the data file.
+
+- last modification time: the time stamp when the entry was written.
+
+There are three kinds of files stored in a TAR file:
+
+- segments: this type of file contains data about a segment in the segment
+ store. This kind of file has a file name in the form `UUID.CRC32`, where
+ `UUID` is a 128-bit UUID represented as a hexadecimal string and `CRC32` is a
+ zero-padded numeric string representing the CRC32 checksum of the raw segment
+ data.
+
+- graph: this file has a name ending in `.gph` and contains a representation of
+ a graph. The graph is represented as an adjacency list of UUIDs.
+
+- index: this file has a name ending in `.idx` and contains a sorted list of
+ every segment contained in the TAR file.
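+
+A name for a segment file could be built along these lines. This is a sketch
+under stated assumptions: it uses a standard CRC32 as computed by `zlib`,
+prints the checksum as eight zero-padded hexadecimal digits, and uses the
+usual dashed text form of the UUID. The function name `segment_entry_name` is
+hypothetical, and the exact formatting used by Oak may differ.
+
+```python
+import uuid
+import zlib
+
+
+def segment_entry_name(segment_id: uuid.UUID, data: bytes) -> str:
+    """Build an entry name from a segment UUID and the raw segment data."""
+    checksum = zlib.crc32(data) & 0xFFFFFFFF  # unsigned 32-bit checksum
+    # Assumption: the checksum is rendered as 8 zero-padded hex digits.
+    return f"{segment_id}.{checksum:08x}"
+```
+
+Because the checksum is part of the name, a reader can verify the integrity of
+the raw segment data without opening any other file.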
+
+The layout of the TAR file used by Oak is engineered for performance of read
+operations. In particular, the most important information is stored in the
+bottom entries. Reading the entries from the bottom of the file, you encounter
+first the index, then the graph, then segment files. The idea is that the index
+must be read first, because it provides a fast tool to locate segments in the
+rest of the file. Next comes the graph, which describes how segments relate to
+each other. Last come the segments, whose relative order can be ignored.
+
+At the same time, the layout of the TAR file allows fast append-only operations
+when writing. Since the relative order of segment files is not important,
+segment entries can be written on a first come, first served basis. The index at
+the end of the file will provide a fast way to access them even if they are
+scattered around the file.
+
+## Segment files
+
+Segment files contain raw data about a segment. Even if there are multiple kinds
+of segments, the TAR file only distinguishes between data and non-data segments.
+A non-data segment is always saved as-is in the TAR file, without further
+processing. A data segment, instead, is inspected to extract references to other
+segments.
+
+A data segment can contain at most 255 references to other segments. These
+references are simply stored as a list of UUIDs. The referenced segments can be
+stored inside the current TAR file or outside of it. In the first case, the
+referenced segment can be found by inspecting the index. In the second case, an
+external agent is responsible for finding the segment in another TAR file.
+
+The list of segments referenced by a data segment will end up in the graph file.
+To speed up the process of locating a segment in the list of referenced
+segments, this list is kept sorted.
+
+## Graph files
+
+The graph file represents the relationships between segments stored inside or
+outside the TAR file. The graph is stored as an adjacency list of UUIDs, where
+each UUID represents a segment.
+
+The format of the graph file is optimized for reading. The graph file is stored
+in reverse order to maintain the most important information at the end of the
+file. This strategy is in line with the overall layout of the entries in the TAR
+file.
+
+The content of the graph file is divided into three main parts: a graph header,
+a graph adjacency list and a vertex mapping table.
+
+The graph header contains the following fields:
+
+- a magic number (4 bytes): identifies the beginning of a graph file.
+
+- size of the graph adjacency list (4 bytes): number of bytes occupied by the
+ graph adjacency list.
+
+- size of the vertex mapping table (4 bytes): number of bytes occupied by the
+ vertex mapping table.
+
+- checksum (4 bytes): a CRC32 checksum of the content of the graph file.
+
+Immediately after the graph header, the graph adjacency list is stored. In the
+list, each vertex is represented by an integer. Each integer represents an index
+in the vertex mapping table. For each vertex stored in the adjacency list, the
+following information is written:
+
+- the integer representing the current vertex.
+
+- zero or more integers, one for each vertex referenced by the current one.
+
+- a sentinel value (-1) marking the end of the list of adjacent vertices.
+
+At the end, the vertex mapping table is stored. This table is just an ordered
+list of UUIDs. The integers used in the graph adjacency list can be used as
+indexes in the vertex mapping table to read the UUID of the corresponding
+segment. This is a space optimization. Since UUIDs can be repeated more than
+once in the adjacency list, it makes sense to replace them with cheaper
+placeholders. A UUID is 16 bytes long, while an integer is just 4 bytes long.
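+
+The following Python sketch shows how an adjacency list encoded this way could
+be turned back into a graph of UUIDs. It works on already decoded integers
+rather than on raw bytes, and the function name `decode_graph` is
+hypothetical.
+
+```python
+SENTINEL = -1  # marks the end of the list of adjacent vertices
+
+
+def decode_graph(adjacency, vertex_table):
+    """Rebuild a mapping from each vertex to the vertices it references.
+
+    `adjacency` is the flat sequence of integers from the adjacency list and
+    `vertex_table` is the vertex mapping table, indexed by those integers.
+    """
+    graph = {}
+    values = iter(adjacency)
+    for vertex in values:
+        neighbours = []
+        for reference in values:  # consume integers up to the sentinel
+            if reference == SENTINEL:
+                break
+            neighbours.append(vertex_table[reference])
+        graph[vertex_table[vertex]] = neighbours
+    return graph
+```
+
+With a mapping table of three UUIDs `a`, `b` and `c`, the sequence
+`0, 1, 2, -1, 1, -1` describes a vertex `a` that references `b` and `c`, and a
+vertex `b` that references nothing.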
+
+## Index files
+
+The index file is an ordered list of references to the entries contained in the
+TAR file. The references are ordered by UUID and they point to the position in
+the file where the entry is stored. Like the graph file, the index file is also
+stored backwards.
+
+The index file is divided into two parts. The first is an index header, the
+second contains the real data about the index.
+
+The index header contains the following fields:
+
+- a magic number (4 bytes): identifies the beginning of an index file.
+
+- size of the index (4 bytes): number of bytes occupied by the index data. This
+ size also contains padding bytes that are added to the index to make it align
+ with the TAR block boundary.
+
+- number of entries (4 bytes): how many entries the index contains.
+
+- checksum (4 bytes): a CRC32 checksum of the content of the index file.
+
+After the header, the content of the index starts. For every entry contained in
+the index, the following information is stored:
+
+- the most significant bits of the UUID (8 bytes)
+
+- the least significant bits of the UUID (8 bytes)
+
+- the offset in the TAR file where the TAR entry containing the segment is
+ located.
+
+- the size of the entry in the TAR file.
+
+Since the entries in the index are sorted by UUID, and since the UUIDs assigned
+to the entries are uniformly distributed, when searching for an entry by its
+UUID an efficient algorithm called interpolation search can be used. This
+algorithm is a variation of binary search. While in binary search the search
+space (in this case, the array of entries) is halved at every iteration,
+interpolation search exploits the distribution of the keys to remove a portion
+of the search space that is potentially bigger than half of it. Interpolation
+search is a more natural approximation of the way a person searches in a phone
+book. If the name to search for begins with the letter T, for example, it makes
+no sense to open the phone book in the middle. It is way more efficient,
+instead, to open the phone book in the last quarter, since names starting with
+the letter T are more likely to be found in that part of the phone book.
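+
+A textbook version of interpolation search over a sorted list of integer keys
+can be sketched in Python as follows. It is meant to illustrate the idea, not
+to reproduce the code Oak uses to search the index; in the index the keys would
+be derived from the UUIDs of the entries.
+
+```python
+def interpolation_search(keys, target):
+    """Return the index of `target` in the sorted list `keys`, or -1."""
+    lo, hi = 0, len(keys) - 1
+    while lo <= hi and keys[lo] <= target <= keys[hi]:
+        if keys[hi] == keys[lo]:
+            # All remaining keys are equal: avoid a division by zero.
+            return lo if keys[lo] == target else -1
+        # Guess the position by assuming uniformly distributed keys.
+        mid = lo + (target - keys[lo]) * (hi - lo) // (keys[hi] - keys[lo])
+        if keys[mid] == target:
+            return mid
+        if keys[mid] < target:
+            lo = mid + 1
+        else:
+            hi = mid - 1
+    return -1
+```
+
+When the keys really are uniformly distributed, the first guess often lands
+very close to the target, which is why this approach suits randomly generated
+UUIDs.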
diff --git oak-doc/src/site/site.xml oak-doc/src/site/site.xml
index 971fdf2..71fab55 100644
--- oak-doc/src/site/site.xml
+++ oak-doc/src/site/site.xml
@@ -41,7 +41,7 @@ under the License.
-
+