[OAK-4200] [BlobGC] Improve collection times of blobs available - ASF JIRA

Details

Type: Technical task
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.5.6, 1.6.0
Component/s: None
Labels: None

Description

The blob collection phase (identifying all the blobs available in the data store) is an expensive part of the whole GC process. On large repositories it sometimes takes a few hours, because it iterates over the sub-folders in the data store.

In an offline discussion with tmueller and chetanm, the idea came up that this phase could be made faster as follows:

Blob ids are tracked when the blobs are added, e.g. in a simple file in the data store, one file per cluster node.
GC then consolidates these files from all the cluster nodes and uses the result to get the candidates for GC.
This variant of the MarkSweepGC can be triggered more frequently. It is acceptable to miss blob id additions to this file (during a crash, for example), because those blobs can be cleaned up in the regular MarkSweepGC cycles that are triggered occasionally.
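
The proposal above could be sketched roughly as follows. This is a minimal illustration only, not the implementation that was committed for this issue: the class name BlobIdLog, the file naming scheme blob-ids-<clusterNodeId>.log, and all method names are hypothetical, and a plain local directory stands in for the data store.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of per-node blob id tracking; names are assumptions. */
final class BlobIdLog {
    private final Path file;

    /** One tracking file per cluster node, kept in a shared directory. */
    BlobIdLog(Path dir, String clusterNodeId) {
        this.file = dir.resolve("blob-ids-" + clusterNodeId + ".log");
    }

    /** Appends the id of a newly added blob, one id per line. */
    synchronized void track(String blobId) throws IOException {
        Files.write(file, List.of(blobId), StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    /** Merges the files of all cluster nodes into a sorted, de-duplicated set. */
    static Set<String> consolidate(Path dir) throws IOException {
        Set<String> ids = new TreeSet<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "blob-ids-*.log")) {
            for (Path p : files) {
                ids.addAll(Files.readAllLines(p, StandardCharsets.UTF_8));
            }
        }
        return ids;
    }

    /** GC candidates: tracked blob ids that the mark phase did not find referenced. */
    static Set<String> sweepCandidates(Set<String> available, Set<String> referenced) {
        Set<String> candidates = new TreeSet<>(available);
        candidates.removeAll(referenced);
        return candidates;
    }
}
```

An id that never reaches the file (for instance because of a crash) simply never appears among the candidates here, which is why the description still relies on the occasional regular MarkSweepGC cycle to clean up such blobs.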

We may also be able to track other metadata along with the blob ids, such as paths and timestamps, for auditing/analytics, in conjunction with OAK-3140.

Issue Links

Blocked

OAK-4492 FileDataStore with configurable path based structure

Open

is related to

OAK-4961 Default repository.home in DocumentNodeStoreService hides framework property

Closed

OAK-5461 [BlobGC] BlobIdTracker remove() should merge generations

Closed

OAK-5231 Proper resource cleanup in BlobTrackerTest

Closed

OAK-3140 DataStore / BlobStore: add a method to pass a "type" when writing

Open

OAK-5014 Minor description change for OSGi blobTrackSnapshotIntervalInSecs property

Closed

relates to

OAK-4429 [oak-blob-cloud] S3Backend#getAllIdentifiers should not store all elements in memory

Closed

OAK-4430 DataStoreBlobStore#getAllChunkIds fetches DataRecord when not needed

Closed

OAK-2808 Active deletion of 'deleted' Lucene index files from DataStore without relying on full scale Blob GC

Closed

People

Assignee: Amit Jain

Reporter: Amit Jain

Votes: 0

Watchers: 7

Dates

Created: 13/Apr/16 14:29

Updated: 08/Oct/19 15:21

Resolved: 20/Jul/16 05:53