An Azure Blob Storage implementation of the segment storage, based on the design of the existing tar-based segment storage.
The new implementation doesn't use tar files. They are replaced with directories that store the segments, each named after its UUID. This approach has the following advantages:
- no need to call seek(), which may be expensive on a remote file system. Instead, the whole file (= segment) can be read at once.
- it's possible to send multiple segments at once, asynchronously, which reduces the performance overhead (see below).
The file structure is as follows:
Each segment file name is prefixed with its index number, which preserves the segment order, as in a tar archive. This order is normally stored in the index file as well, but if the index is missing, the recovery process reconstructs the order from the prefixes.
Each file contains the raw segment data, with no padding or headers. Apart from the segment files, there are three special files: the binary references (.brf), the segment graph (.gph) and the segment index (.idx).
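Putting these rules together, a single archive directory might look like the sketch below. The directory and file names are illustrative assumptions, not taken verbatim from the implementation; only the index prefixes and the .brf/.gph/.idx suffixes come from the description above.

```
archive-directory/
    0000.<uuid-of-segment-0>
    0001.<uuid-of-segment-1>
    0002.<uuid-of-segment-2>
    archive.brf    (binary references)
    archive.gph    (segment graph)
    archive.idx    (segment index)
```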
Normally, all TarWriter writes are synchronous, appending the segments to the tar file. With Azure Blob Storage, each such write involves network latency, which is why the SegmentWriteQueue was introduced. Segments are added to a blocking deque that is served by a number of consumer threads writing the segments to the cloud. There's also a UUID -> Segment map, which allows the queue to return segments requested via the readSegment() method before they are actually persisted. A segment is removed from the map only after a successful write operation.
The flush() method blocks accepting new segments and returns once all waiting segments have been written. The close() method waits until the operations in progress are finished and stops all the threads.
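The mechanism described above can be sketched as follows. This is a simplified illustration, not the actual Oak SegmentWriteQueue; the SegmentSink interface and the class name are assumptions made for this example. It shows the blocking deque, the consumer threads, the UUID -> data map serving reads before persistence, and the synchronous fallback when the thread count is 0.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.*;

// Simplified sketch of an asynchronous segment write queue (not the Oak code).
class WriteQueueSketch {

    // Hypothetical abstraction of the remote storage write.
    interface SegmentSink {
        void write(UUID id, byte[] data) throws Exception;
    }

    private final BlockingDeque<UUID> deque = new LinkedBlockingDeque<>();
    private final Map<UUID, byte[]> pending = new ConcurrentHashMap<>();
    private final SegmentSink sink;
    private final ExecutorService consumers;

    // threads == 0 disables the asynchronous mode: writes become synchronous.
    WriteQueueSketch(int threads, SegmentSink sink) {
        this.sink = sink;
        this.consumers = threads > 0 ? Executors.newFixedThreadPool(threads) : null;
        for (int i = 0; i < threads; i++) {
            consumers.submit(this::consume);
        }
    }

    private void consume() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                UUID id = deque.takeFirst();
                sink.write(id, pending.get(id));
                pending.remove(id); // removed only after a successful write
            } catch (InterruptedException e) {
                return;
            } catch (Exception e) {
                // the real queue re-adds the segment and enters recovery mode here
            }
        }
    }

    void addToQueue(UUID id, byte[] data) {
        if (consumers == null) { // synchronous mode
            try {
                sink.write(id, data);
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
            return;
        }
        pending.put(id, data);
        deque.addLast(id);
    }

    // Serves segments that are accepted but not yet persisted.
    byte[] readSegment(UUID id) {
        return pending.get(id);
    }

    void close() {
        if (consumers != null) {
            consumers.shutdownNow();
        }
    }
}
```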
The asynchronous mode can be disabled by setting the number of threads to 0.
If the Azure Blob Storage write() operation fails, the segment is re-added and the queue switches to a "recovery mode". In this mode, all the consumer threads are suspended and new segments are not accepted (callers wait actively). A single thread retries writing the segment with some delay. Once the segment is successfully written, the queue goes back to normal operation.
This way the unavailable remote service is not flooded with requests, and we don't accept segments while we can't persist them.
The close() method also ends the recovery mode; in this case, some of the awaiting segments won't be persisted.
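The core of the recovery mode is the single retry thread described above. The sketch below illustrates just that retry-with-delay loop; the Write interface, the method name, and the fixed delay are assumptions for this example, not the actual Oak implementation (which also suspends the consumers and rejects new segments while retrying).

```java
// Simplified sketch of the retry loop used in recovery mode (not the Oak code).
class RecoveryRetrySketch {

    // Hypothetical abstraction of the failing remote write.
    interface Write {
        void run() throws Exception;
    }

    // Retries the write with a fixed delay until it succeeds;
    // returns the number of attempts made.
    static int retryUntilSuccess(Write write, long delayMillis) {
        int attempts = 0;
        while (true) {
            attempts++;
            try {
                write.run();
                return attempts; // back to normal operation
            } catch (Exception e) {
                try {
                    // wait before retrying, so the unavailable
                    // service isn't flooded with requests
                    Thread.sleep(delayMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new RuntimeException(ie);
                }
            }
        }
    }
}
```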
The asynchronous mode isn't as reliable as the standard, synchronous case. The following cases are possible:
- TarWriter#writeEntry() returns successfully, but the segments are not persisted.
- TarWriter#writeEntry() accepts a number of segments: S1, S2, S3. S2 and S3 are persisted, but S1 is not.
On the other hand:
- If TarWriter#flush() returns successfully, all the accepted segments have been persisted.
During segment recovery (e.g. if the index file is missing), the Azure implementation checks whether a segment is missing in the middle of the sequence. If so, only the consecutive segments are recovered. For instance, if we have S1, S2, S3, S5, S6, S7, the recovery process returns only the first three.
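The consecutive-run rule above can be sketched with a small helper. The class and method names are assumptions for this illustration: given the segment index numbers in ascending order, it keeps only the run that is consecutive from the first index and stops at the first gap.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative helper for the recovery rule described above (not the Oak code):
// keep only the consecutive run of segment indices, stopping at the first gap.
class ConsecutiveRecovery {

    static List<Integer> recoverable(List<Integer> indices) {
        List<Integer> result = new ArrayList<>();
        if (indices.isEmpty()) {
            return result;
        }
        int expected = indices.get(0);
        for (int i : indices) {
            if (i != expected) {
                break; // gap found: don't recover anything past it
            }
            result.add(i);
            expected++;
        }
        return result;
    }
}
```

For the example above, indices 1, 2, 3, 5, 6, 7 yield only 1, 2, 3, because the gap at 4 stops the recovery.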