Index: oak-doc/src/site/markdown/nodestore/segment/changes.md
===================================================================
--- oak-doc/src/site/markdown/nodestore/segment/changes.md (nonexistent)
+++ oak-doc/src/site/markdown/nodestore/segment/changes.md (working copy)
@@ -0,0 +1,85 @@
+
+
+# Changes in the data format
+
+This document describes the changes in the data format introduced by the Oak Segment Tar module.
+The purpose of this document is not only to enumerate such changes, but also to explain the rationale behind them.
+Pointers to Jira issues are provided for a terser description of each change.
+Changes are presented in chronological order.
+
+## Generation in segment headers
+
+* Jira issue: [OAK-3348](https://issues.apache.org/jira/browse/OAK-3348)
+* Since: Oak Segment Tar 0.0.2
+
+The GC algorithm implemented by Oak Segment Tar is based on the fundamental idea of grouping records into generations.
+When GC is performed, records belonging to older generations can be removed, while records belonging to newer generations have to be retained.
+
+The fact that a record belongs to a generation is not transient information: it has to survive restarts of the system.
+This means that the generation of a record has to be persisted together with the record.
+
+To avoid the size penalty of persisting additional information for every record, the generation is persisted only once, in the segment header.
+Thus, the generation of a record is defined as the generation of the segment containing that record.
+
+The original data format for the segment header contained some holes in the specification.
+The change uses one of those holes (bytes 10-13) to save the generation as a 4-byte integer value.
+
+## Stable identifiers
+
+* Jira issue: [OAK-3348](https://issues.apache.org/jira/browse/OAK-3348)
+* Since: Oak Segment Tar 0.0.2
+
+The fastest way to compare two node records is to compare their addresses.
+If their addresses are equal, the two node records are guaranteed to be equal.
+Transitively, the subtrees identified by those node records are guaranteed to be equal.
+
+The situation gets more complicated when the generation-based GC algorithm copies a node record to a new generation to save it from being deleted.
+In this situation, two copies of the same node record live in two different generations, in two different segments and at two different addresses.
+If you want to figure out whether those two node records are the same, the trick of comparing their addresses no longer works.
+
+To overcome this problem, a stable identifier has been added to every node record.
+When a new node record is serialized, the address it is serialized to becomes its stable identifier.
+The stable identifier is included in the node record and becomes part of its serialized format.
+
+When the node record is copied to a new generation and a new segment, its address will inevitably change.
+The stable identifier, being part of the node record itself, will not change.
+This enables fast comparison between different copies of the same node record.
+Instead of comparing their addresses, you can compare their stable identifiers to achieve the same result.
+
+The stable identifier is serialized as an 18-byte string record.
+This record, in turn, is referenced from the node record through an additional 3-byte reference field.
+In conclusion, stable identifiers add an overhead of 21 bytes to every node record.
+
+## Binary references index
+
+* Jira issue: [OAK-4201](https://issues.apache.org/jira/browse/OAK-4201)
+* Since: Oak Segment Tar 0.0.4
+
+The original data format in Oak Segment mandates that every segment maintains a list of references to external binaries.
+Every time a record references an external binary - i.e. a piece of binary data that is stored in a Blob Store - a new binary reference is added to its segment.
+The list of references to external binaries is inspected periodically by the Blob Store GC algorithm to know which binaries are currently in use.
+The Blob Store GC algorithm removes every binary that is not reported as used by the Segment Store.
+
+Retrieving the comprehensive list of external binaries for the whole repository is an expensive operation when it comes to I/O.
+Every segment in every TAR file has to be read into memory and its list of references to external binaries has to be parsed.
+Even if a segment does not contain references to external binaries, it has to be read into memory first for the system to find that out.
+
+To make this process faster and less I/O-intensive, Oak Segment Tar introduces an index of references to external binaries in every TAR file.
+This index aggregates the required information from every segment contained in a TAR file.
+When Blob Store GC is performed, instead of reading and parsing every segment, you can read and parse the index files.
+This optimization may reduce the number of I/O operations by an order of magnitude in the best case.
\ No newline at end of file
Index: oak-doc/src/site/markdown/nodestore/segment/overview.md
===================================================================
--- oak-doc/src/site/markdown/nodestore/segment/overview.md (revision 1779425)
+++ oak-doc/src/site/markdown/nodestore/segment/overview.md (working copy)
@@ -50,6 +50,7 @@
  * [Diff](#diff)
  * [History](#history)
  * [Design](#design)
+ * [Format changes](#format-changes)
 
 ## Garbage Collection
 
@@ -642,3 +643,9 @@
 
 This website also contains an overview of the legacy implementation of the Segment Store and of the design decisions that brought to this implementation.
 The page is old and describes a deprecated implementation, but can still be accessed [here](../segmentmk.html).
+
+### Format changes
+
+The Oak Segment Tar module introduces a number of changes in the data format compared to the legacy Oak Segment.
+The changes are described in greater detail [here](changes.html).
+Pointers to actual Jira issues can also be found on that page.
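
As an illustration of the segment-header change described in changes.md, the following is a minimal sketch of storing and reading a generation as a 4-byte integer at bytes 10-13 of a header buffer. The class name, the method names, the 32-byte buffer size, and the big-endian byte order (the `ByteBuffer` default) are assumptions made for this example; it is not taken from the Oak Segment Tar code base.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, for illustration only.
class SegmentGenerationSketch {

    // Per the text, the generation occupies bytes 10-13 of the segment header.
    static final int GENERATION_OFFSET = 10;

    // Stores the generation as a 4-byte integer at the fixed offset.
    static void writeGeneration(ByteBuffer header, int generation) {
        header.putInt(GENERATION_OFFSET, generation);
    }

    // Reads the generation back; every record in the segment shares this value.
    static int readGeneration(ByteBuffer header) {
        return header.getInt(GENERATION_OFFSET);
    }

    public static void main(String[] args) {
        // Assumed header size; the real header layout is not given in the text.
        ByteBuffer header = ByteBuffer.allocate(32);
        writeGeneration(header, 7);
        System.out.println("generation = " + readGeneration(header));
    }
}
```

Because the value lives in the header, checking which generation a record belongs to only requires locating its segment; no per-record field is read.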