Description
As noted in OAK-1702, BlobStore and FileDataStore currently do not perform well when a large number of small blobs are accessed frequently.
- FileDataStore - It creates a new instance of LazyFileInputStream [1], which has a finalize method implemented (by extending AutoCloseInputStream). This causes slow GC [2] when a large number of such streams are created. Further, reading lots of such small blobs frequently causes lots of OS calls for IO, which are slow.
- BlobStore - When binary content is stored remotely, accessing it frequently is costly if it is not cached locally.
To better support such access patterns we should have a caching BlobStore for reads. At minimum, blobs can be cached on heap. However, a better approach would be to save such blob content in a bigger file and memory-map it, possibly using the Segment TarFile. In this mode the blobs would be stored off heap and would not put pressure on GC.
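
As a rough illustration of the minimal on-heap variant only, a read-through cache for small blobs could look like the sketch below. The names BlobSource, SmallBlobCache, maxBlobSize and maxCacheBytes are hypothetical and chosen for this example; this is not Oak's BlobStore interface, and the size limits are assumptions.

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for the underlying store; NOT Oak's BlobStore interface. */
interface BlobSource {
    byte[] readFully(String blobId) throws IOException;
}

/** Read-through LRU cache that keeps small blobs on heap (assumed design). */
class SmallBlobCache implements BlobSource {
    private final BlobSource delegate;
    private final int maxBlobSize;     // blobs larger than this bypass the cache
    private final long maxCacheBytes;  // total byte budget for cached content
    private long cachedBytes;
    // accessOrder = true: iteration starts at the least recently used entry
    private final LinkedHashMap<String, byte[]> cache =
            new LinkedHashMap<>(16, 0.75f, true);

    SmallBlobCache(BlobSource delegate, int maxBlobSize, long maxCacheBytes) {
        this.delegate = delegate;
        this.maxBlobSize = maxBlobSize;
        this.maxCacheBytes = maxCacheBytes;
    }

    /** Returns the blob content; callers must treat the array as read-only. */
    @Override
    public synchronized byte[] readFully(String blobId) throws IOException {
        byte[] data = cache.get(blobId);
        if (data != null) {
            return data; // hit: no stream is opened and no IO call is made
        }
        data = delegate.readFully(blobId);
        if (data.length <= maxBlobSize) {
            cache.put(blobId, data);
            cachedBytes += data.length;
            evictUntilWithinBudget();
        }
        return data;
    }

    /** Drops least recently used entries until the byte budget is met. */
    private void evictUntilWithinBudget() {
        Iterator<Map.Entry<String, byte[]>> it = cache.entrySet().iterator();
        while (cachedBytes > maxCacheBytes && it.hasNext()) {
            cachedBytes -= it.next().getValue().length;
            it.remove();
        }
    }
}
```

Repeated reads of the same small blob are then served from memory, while larger blobs always go to the delegate. The cached arrays still live on heap, so this sketch does not address the GC pressure that the memory-mapped approach is meant to avoid.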
[1] https://github.com/apache/jackrabbit/blob/trunk/jackrabbit-data/src/main/java/org/apache/jackrabbit/core/data/LazyFileInputStream.java
[2] http://stackoverflow.com/questions/2954948/performance-implications-of-finalizers-on-jvm