Key: JCR-926
Type: New Feature
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Unassigned
Reporter: Jukka Zitting
Votes: 2
Watchers: 1

Jackrabbit Content Repository

Global data store for binaries

Created: 16/May/07 11:09 AM   Updated: 15/Jan/08 11:26 PM
Component/s: jackrabbit-core
Affects Version/s: None
Fix Version/s: 1.4

Time Tracking:
Not Specified

File Attachments (file, date, author, size):
  dataStore.patch, 2007-06-28 10:44 AM, Thomas Mueller, 183 kB
  DataStore.patch, 2007-05-16 11:13 AM, Jukka Zitting, 31 kB (licensed for inclusion in ASF works)
  DataStore2.patch, 2007-06-06 10:29 AM, Jukka Zitting, 216 kB (licensed for inclusion in ASF works)
  dataStore3.patch, 2007-07-02 03:01 PM, Thomas Mueller, 208 kB (licensed for inclusion in ASF works)
  dataStore4.zip, 2007-07-06 09:44 AM, Thomas Mueller, 33 kB (licensed for inclusion in ASF works)
  dataStore5-garbageCollector.patch, 2007-08-06 12:42 PM, Thomas Mueller, 17 kB (licensed for inclusion in ASF works)
  internalValue.patch, 2007-06-22 09:14 AM, Thomas Mueller, 44 kB (licensed for inclusion in ASF works)
  ReadWhileSaveTest.patch, 2007-06-20 09:11 PM, Jukka Zitting, 3 kB (licensed for inclusion in ASF works)

Resolution Date: 13/Sep/07 02:32 PM


Description
There are three main problems with the way Jackrabbit currently handles large binary values:

1) Persisting a large binary value blocks access to the persistence layer for extended periods of time (see JCR-314).
2) At least two copies of a binary stream are made when it is saved through the JCR API: one in the transient space, and one when the value is persisted.
3) Versioning and copy operations on nodes or subtrees that contain large binary values can quickly consume excessive amounts of storage space, since each version or copy stores its own duplicate of the binary data.

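To solve these issues (and to gain other benefits), I propose that we implement a global "data store" in the repository. A data store is an append-only set of binary values, each of which is identified and accessed through a short identifier. Transient items, persisted properties, versions, and copies could then all refer to the same stored value by its identifier instead of each holding its own copy of the data. Because records are only ever added and never modified, the data store trivially fits the requirements of the transient space and of transaction handling: discarded transient changes and rolled-back transactions simply leave unreferenced records behind. An explicit mark-and-sweep garbage collection process could be added to reclaim such unreferenced values.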
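A minimal sketch of the idea in Java, assuming an in-memory store for brevity. The class and method names (`InMemoryDataStore`, `addRecord`, `getRecord`, `beginGc`, `sweep`) are hypothetical and are not the Jackrabbit API, and deriving the short identifier from a SHA-1 digest of the content is an assumption; the proposal only asks for short identifiers. With content-derived identifiers, saving, versioning, or copying the same binary stores its bytes only once. The mark phase (walking the repository to collect referenced identifiers) is left to the caller; `sweep` deletes every record that was neither marked nor added while the collection was running.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory, append-only data store (not the Jackrabbit API). */
class InMemoryDataStore {
    // identifier -> binary content; a record is never modified after it is added
    private final Map<String, byte[]> records = new HashMap<>();
    // identifiers added while a garbage collection is in progress
    private final Set<String> addedDuringGc = new HashSet<>();
    private boolean gcRunning = false;

    /** Reads the stream once, stores its content, and returns the record identifier. */
    public synchronized String addRecord(InputStream in) {
        try {
            // Assumed identifier scheme: hex-encoded SHA-1 digest of the content.
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                digest.update(chunk, 0, n);
                buffer.write(chunk, 0, n);
            }
            String id = toHex(digest.digest());
            // Identical content yields the same identifier, so a second save,
            // a new version, or a copy adds no new bytes to the store.
            records.putIfAbsent(id, buffer.toByteArray());
            if (gcRunning) {
                addedDuringGc.add(id); // protect from the sweep that is in progress
            }
            return id;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    /** Returns a fresh stream over the record's content. */
    public synchronized InputStream getRecord(String id) {
        byte[] data = records.get(id);
        if (data == null) {
            throw new IllegalArgumentException("No such record: " + id);
        }
        return new ByteArrayInputStream(data);
    }

    public synchronized int size() {
        return records.size();
    }

    /** Starts a collection; call before the caller's mark phase walks the repository. */
    public synchronized void beginGc() {
        gcRunning = true;
        addedDuringGc.clear();
    }

    /** Sweep phase: deletes records that were neither marked nor added during this GC. */
    public synchronized int sweep(Set<String> marked) {
        Set<String> keep = new HashSet<>(marked);
        keep.addAll(addedDuringGc);
        int before = records.size();
        records.keySet().retainAll(keep);
        gcRunning = false;
        addedDuringGc.clear();
        return before - records.size();
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

In this sketch, a property value could hold just the returned identifier. A real collector would also have to cope with identifiers that move between nodes while the mark phase is running (for example, a move from an unscanned subtree into an already-scanned one), which this sketch does not handle.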

See the recent NGP (next generation persistence) value record discussion on the jackrabbit-dev mailing list, especially [1], for more background on this idea.

[1] http://mail-archives.apache.org/mod_mbox/jackrabbit-dev/200705.mbox/%3c510143ac0705120919k37d48dc1jc7474b23c9f02cbd@mail.gmail.com%3e

