
Key: DERBY-2798
Type: Improvement
Status: Open
Priority: Major
Assignee: Unassigned
Reporter: Knut Magne Solem
Votes: 4
Watchers: 7
Derby

A new approach for main-memory database

Created: 09/Jun/07 01:24 PM   Updated: 01/Jul/09 12:34 AM
Component/s: Store
Affects Version/s: 10.2.2.0
Fix Version/s: None

Time Tracking:
Not Specified

File Attachments:
  Derby-10.2.2.0-memstore.diff, 2007-06-09 01:25 PM, Knut Magne Solem, 368 kB
  DERBY-2798-10.3.1.0.diff, 2007-06-24 07:49 PM, Knut Magne Solem, 340 kB (licensed for inclusion in ASF works)
  DERBY-2798-10.3.1.0.stat, 2007-06-24 07:49 PM, Knut Magne Solem, 5 kB (licensed for inclusion in ASF works)
  DERBY-2798.diff, 2007-06-11 01:26 PM, Knut Magne Solem, 368 kB (licensed for inclusion in ASF works)
Image Attachments:
  1. select.png (3 kB)
  2. update.png (3 kB)
Environment: all
Issue Links:
Reference
 


Description
As part of my master's degree I have created an extension that allows data to reside in memory without having to be serialized to Page objects. It is a fairly large chunk of code and is essentially a proof of concept of another way to implement an in-memory storage mode.

I created two new conglomerates, called MemHeap and MemSkiplist. Derby interfaces with them the same way as it does with Heap and BTree. These new conglomerates use RawStore for transaction support and logging, but not for storage. Instead, they use a new system service I've called MemStore. This data store only holds pointers to Slot objects, organized in arrays corresponding to their container/conglomerate/table. A Slot object consists mainly of a DataValueDescriptor[] object representing a row in a table.
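A minimal Java sketch of the storage layout described above, under simplifying assumptions: Object[] stands in for DataValueDescriptor[], and the names MemStoreSketch, Slot, createContainer, insert, fetch and delete are hypothetical, not taken from the patch. The point it illustrates is that a fetch hands back the very array that was inserted; nothing is serialized to a page or passed through a page cache.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the MemStore layout; not code from the patch.
class MemStoreSketch {

    // Stand-in for the patch's Slot: one table row kept as live column objects.
    // Object[] replaces Derby's DataValueDescriptor[] to keep the sketch self-contained.
    static final class Slot {
        final Object[] row;
        Slot(Object[] row) { this.row = row; }
    }

    // One growable array of Slot pointers per container (conglomerate/table).
    private final Map<Long, List<Slot>> containers = new ConcurrentHashMap<>();

    void createContainer(long containerId) {
        containers.putIfAbsent(containerId, new ArrayList<>());
    }

    // Appends a row and returns its slot number: no page, no serialization.
    int insert(long containerId, Object[] row) {
        List<Slot> slots = containers.get(containerId);
        synchronized (slots) {
            slots.add(new Slot(row));
            return slots.size() - 1;
        }
    }

    // Returns the stored array itself (same reference), or null if deleted.
    Object[] fetch(long containerId, int slotNumber) {
        List<Slot> slots = containers.get(containerId);
        synchronized (slots) {
            Slot slot = slots.get(slotNumber);
            return slot == null ? null : slot.row;
        }
    }

    // Clears the pointer; slot numbers of the other rows stay stable.
    void delete(long containerId, int slotNumber) {
        List<Slot> slots = containers.get(containerId);
        synchronized (slots) {
            slots.set(slotNumber, null);
        }
    }

    public static void main(String[] args) {
        MemStoreSketch store = new MemStoreSketch();
        store.createContainer(1L);
        Object[] row = {42, "hello"};
        int slot = store.insert(1L, row);
        System.out.println(slot);                          // 0
        System.out.println(store.fetch(1L, slot) == row);  // true
        store.delete(1L, slot);
        System.out.println(store.fetch(1L, slot));         // null
    }
}
```

Keeping deleted slots as null pointers rather than compacting the array is a design choice of this sketch: it keeps slot numbers usable as stable row identifiers, which an index would need.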
 
So, instead of just doing dummy IO in memory, where Derby still thinks it's doing real IO and page caching, this new approach bypasses the cache and the page structure by keeping the DataValueDescriptor[] objects in memory without serializing them.
 
Operations that modify data in memory are performed via new operation objects (e.g. MemInsertOperation, MemInsertUndoOperation, MemDeleteOperation, ...), which still use RawStore for transaction control and persistence. Checkpointing is done by serializing the objects in MemStore to disk fuzzily and completely unsynchronized. Recovery consists of deserializing the objects into MemStore before the existing REDO and UNDO phases of Derby recovery in RawStore make the data transaction-consistent by redoing or undoing the new operation objects in the log.
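The recovery sequence above (load the checkpoint image, then REDO, then UNDO) can be sketched in Java as follows. This is a simplified model, not Derby's Loggable/RawStore API: the interface MemOperation, the records MemInsert and MemDelete, and recover() are hypothetical names, a table is just a Map from slot number to row, and the log is assumed to be replayed from its start. Because each redo overwrites a whole slot, replay gives the same result whatever partial state the fuzzy checkpoint captured.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: logged in-memory operations replayed by REDO/UNDO recovery.
class MemLogSketch {

    // Simplified stand-in for operation objects such as MemInsertOperation.
    interface MemOperation {
        long txId();
        void redo(Map<Integer, Object[]> table); // reapply the change
        void undo(Map<Integer, Object[]> table); // roll the change back
    }

    record MemInsert(long txId, int slot, Object[] row) implements MemOperation {
        public void redo(Map<Integer, Object[]> t) { t.put(slot, row); }
        public void undo(Map<Integer, Object[]> t) { t.remove(slot); }
    }

    record MemDelete(long txId, int slot, Object[] oldRow) implements MemOperation {
        public void redo(Map<Integer, Object[]> t) { t.remove(slot); }
        public void undo(Map<Integer, Object[]> t) { t.put(slot, oldRow); }
    }

    // Starts from the image deserialized from the last checkpoint, redoes every
    // logged operation in order, then undoes operations of transactions that
    // never committed, in reverse log order.
    static Map<Integer, Object[]> recover(Map<Integer, Object[]> checkpoint,
                                          List<MemOperation> log,
                                          Set<Long> committed) {
        Map<Integer, Object[]> table = new HashMap<>(checkpoint);
        for (MemOperation op : log) {
            op.redo(table);
        }
        for (int i = log.size() - 1; i >= 0; i--) {
            MemOperation op = log.get(i);
            if (!committed.contains(op.txId())) {
                op.undo(table);
            }
        }
        return table;
    }

    public static void main(String[] args) {
        List<MemOperation> log = new ArrayList<>();
        log.add(new MemInsert(1, 0, new Object[] {1, "a"}));
        log.add(new MemInsert(2, 1, new Object[] {2, "b"}));
        Set<Long> committed = new HashSet<>(List.of(1L)); // tx 2 never committed
        Map<Integer, Object[]> table = recover(new HashMap<>(), log, committed);
        System.out.println(table.keySet()); // [0]
    }
}
```

The row-level SERIALIZABLE locking described in the issue is what lets this model treat each slot's history independently; without it, undoing one transaction could clobber another's committed change to the same slot.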
 
Locking is hard-coded as row-based, with isolation level SERIALIZABLE.
 
To get Derby to use the new conglomerates I hacked the SQL-layer to create MemHeap-tables and MemSkiplist-indexes when the table name starts with 'mem_'.
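The prefix rule can be sketched as a small decision function. Where and how the patch actually performs this check in the SQL layer is not described here, so ConglomerateChooser, Kind, isMemTable, tableKind and indexKind are hypothetical names. One detail worth noting: Derby folds unquoted identifiers to upper case, so a table created as mem_mytable is stored as MEM_MYTABLE; the sketch therefore compares case-insensitively (an assumption about the intended behavior).

```java
// Hypothetical sketch of the 'mem_' prefix rule; not the patch's actual code.
class ConglomerateChooser {

    enum Kind { HEAP, BTREE, MEM_HEAP, MEM_SKIPLIST }

    private static final String MEM_PREFIX = "mem_";

    // Case-insensitive, because unquoted identifiers are stored in upper case.
    static boolean isMemTable(String tableName) {
        return tableName.regionMatches(true, 0, MEM_PREFIX, 0, MEM_PREFIX.length());
    }

    // Base table storage: MemHeap for mem_ tables, Heap otherwise.
    static Kind tableKind(String tableName) {
        return isMemTable(tableName) ? Kind.MEM_HEAP : Kind.HEAP;
    }

    // Index storage follows the base table: MemSkiplist instead of BTree.
    static Kind indexKind(String baseTableName) {
        return isMemTable(baseTableName) ? Kind.MEM_SKIPLIST : Kind.BTREE;
    }

    public static void main(String[] args) {
        System.out.println(tableKind("MEM_MYTABLE")); // MEM_HEAP
        System.out.println(indexKind("MEM_MYTABLE")); // MEM_SKIPLIST
        System.out.println(tableKind("ORDERS"));      // HEAP
    }
}
```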
 
Because this is a major rewrite of the access and storage layers, there are many known and unknown bugs and a lot of missing functionality. What works is essentially select, insert, update and delete on tables with one primary key.

Comments
Rick Hillegas added a comment - 11/Jun/07 01:04 PM
Thanks, Knut. This is a very exciting approach. I noticed that you haven't granted to the ASF a license on your patch. Could you re-attach your patch and check the box which grants a license to ASF?

Knut Magne Solem added a comment - 11/Jun/07 01:26 PM
Same patch, but granting for ASF inclusion

Myrna van Lunteren added a comment - 12/Jun/07 10:28 PM
The practice is: enhancements don't get a fix-in version until the changes go in...
If this is intended for the 10.2 branch, then the fix-in will most likely be 10.2.2.1...

Knut Magne Solem added a comment - 24/Jun/07 07:49 PM
Cleaned up the code and made it work with Derby 10.3.1.0 (unofficial). I also ran simple benchmarks for select and update (with durability=test) on Wisconsin test data with Java 6.0. The hardware is a P4 2.8 GHz with HT running Linux.

The patch is for the unofficial 10.3.1.0 release.

Bastian Wassermann added a comment - 03/Jul/07 12:07 PM
I have patched this version of Derby and I can't see any difference from the unpatched version.
I thought that this version would run Derby entirely in memory, so that there would be no writes to or reads from disk, but when I try this version there is still disk access (whenever I put something into a table).

I don't know much about how databases work, but with every insert command Derby writes to the log1.dat file in the database folder. Is this logging a feature that can be deactivated, or is this access necessary? Can it be switched off so that nothing is written to the hard disk, or is that impossible?

I am thinking of a database that works 100% in virtual memory and writes the data to the hard disk at intervals. Is this possible? If you know of any manuals that would help me with this, I would be very thankful.

Knut Magne Solem added a comment - 03/Jul/07 04:18 PM
To use MemStore you must create tables with the prefix "mem_", i.e. mem_mytable. This tells the SQL layer to create MemHeap and MemSkiplist conglomerates (tables/indexes). Eventually this should be done via CREATE TABLE options. I have also only tested with the primary key as the first column.
 
You can deactivate logging by setting derby.system.durability=test in derby.properties.
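For an embedded setup, the same property can also be given as a JVM system property (for example -Dderby.system.durability=test), as long as it is set before Derby boots. According to the Derby documentation, this mode turns off forced disk syncs (of the log at commit, and of data during checkpoints) rather than the log itself, so log records are still written but not forced to disk; it is intended for testing, because a crash can lose or corrupt data. A minimal sketch; the class name and the database name "memdb" are hypothetical.

```java
// Sketch: enabling Derby's test durability mode from code before the engine boots.
class DurabilityTestMode {
    public static void main(String[] args) {
        // Same effect as the line derby.system.durability=test in derby.properties.
        // Must be set before the first embedded connection boots Derby.
        System.setProperty("derby.system.durability", "test");
        System.out.println(System.getProperty("derby.system.durability")); // test
        // Next step (needs derby.jar on the classpath; "memdb" is a hypothetical name):
        // java.sql.DriverManager.getConnection("jdbc:derby:memdb;create=true");
    }
}
```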

Thanks for pointing out the missing config in modules.properties when building jars. (cloudscape.config.memstore=all on line 300)

It is correct that MemStore uses more memory, about 50-70% more. Also keep in mind that this is experimental code with limited functionality.


Dyre Tjeldvoll added a comment - 30/Nov/07 05:41 PM
This issue has shown up in the 'patch available' filter for some time now. It does not seem like anyone is willing to commit the patch in its present form, and nobody seems to be actively working on a new version, so I am removing the 'patch available' flag.

Kathey Marsden added a comment - 07/May/09 04:30 PM
Can this be duped to DERBY-646 or is this something different?

Kristian Waagan added a comment - 07/May/09 05:01 PM
This is something else.
Implementing what is suggested here would most likely require a lot more effort than implementing DERBY-646. For instance, it includes a new (to Derby) access method which is better suited for in-memory data than the BTree is.
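Since the patch's new index conglomerate is a skip list, a minimal single-threaded skip list may help readers unfamiliar with the structure. This is the generic textbook version (after Pugh), not the MemSkiplist code from the patch; int keys, the fixed random seed and MAX_LEVEL = 16 are arbitrary choices of this sketch. Instead of fixed-size pages that split and merge, each node carries a randomly chosen number of forward pointers, so lookups take expected O(log n) steps and a range scan is simply a walk along level 0. This is one plausible reason it suits data that never has to be laid out in pages. Concurrent variants also exist; java.util.concurrent.ConcurrentSkipListMap is one in the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Minimal single-threaded skip list; a generic sketch, not the patch's MemSkiplist.
class SkipListSketch<V> {
    private static final int MAX_LEVEL = 16;

    private static final class Node<V> {
        final int key;
        V value;
        final Node<V>[] next; // forward pointers, one per level of this node

        @SuppressWarnings("unchecked")
        Node(int key, V value, int level) {
            this.key = key;
            this.value = value;
            this.next = (Node<V>[]) new Node[level];
        }
    }

    private final Node<V> head = new Node<>(Integer.MIN_VALUE, null, MAX_LEVEL);
    private final Random random = new Random(42);
    private int level = 1; // number of levels currently in use

    // Each extra level is added with probability 1/2.
    private int randomLevel() {
        int lvl = 1;
        while (lvl < MAX_LEVEL && random.nextBoolean()) {
            lvl++;
        }
        return lvl;
    }

    V get(int key) {
        Node<V> x = head;
        for (int i = level - 1; i >= 0; i--) {
            while (x.next[i] != null && x.next[i].key < key) {
                x = x.next[i];
            }
        }
        x = x.next[0];
        return (x != null && x.key == key) ? x.value : null;
    }

    void put(int key, V value) {
        @SuppressWarnings("unchecked")
        Node<V>[] update = (Node<V>[]) new Node[MAX_LEVEL];
        Node<V> x = head;
        for (int i = level - 1; i >= 0; i--) {
            while (x.next[i] != null && x.next[i].key < key) {
                x = x.next[i];
            }
            update[i] = x; // last node before the key on level i
        }
        Node<V> candidate = x.next[0];
        if (candidate != null && candidate.key == key) {
            candidate.value = value; // key exists: overwrite in place
            return;
        }
        int lvl = randomLevel();
        if (lvl > level) {
            for (int i = level; i < lvl; i++) {
                update[i] = head;
            }
            level = lvl;
        }
        Node<V> node = new Node<>(key, value, lvl);
        for (int i = 0; i < lvl; i++) {
            node.next[i] = update[i].next[i];
            update[i].next[i] = node;
        }
    }

    // Ascending key order: a full or range scan just follows level 0.
    List<Integer> keys() {
        List<Integer> out = new ArrayList<>();
        for (Node<V> x = head.next[0]; x != null; x = x.next[0]) {
            out.add(x.key);
        }
        return out;
    }

    public static void main(String[] args) {
        SkipListSketch<String> index = new SkipListSketch<>();
        index.put(30, "c");
        index.put(10, "a");
        index.put(20, "b");
        System.out.println(index.get(20)); // b
        System.out.println(index.keys());  // [10, 20, 30]
        System.out.println(index.get(99)); // null
    }
}
```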

I believe implementing this would enable significantly better performance than the current in-memory back end.
Two possible next steps:
 a) Describe the current state of the patch; what works, what doesn't?
 b) Investigate how much of Derby must be rewritten for a proper implementation.