DERBY-2798

A new approach for main-memory database

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 10.2.2.0
    • Fix Version/s: None
    • Component/s: Store
    • Labels: None
    • Environment: all

      Description

      As part of my Master's degree I have created an extension that allows data to reside in memory without the need to serialize it to Page objects. This is a fairly large chunk of code and serves as a proof of concept of another way to implement an in-memory storage mode.

      I created two new conglomerates, called MemHeap and MemSkiplist. Derby interfaces with them the same way as it does with Heap and BTree. These new conglomerates use RawStore for transaction support and logging, but not for storage. Instead they use a new system service I've called MemStore. This data store only stores pointers to Slot objects, organized in arrays corresponding to their container/conglomerate/table. A Slot object consists mainly of a DataValueDescriptor[] representing a row in a table.
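The layout described above can be pictured with a minimal sketch. It is not taken from the patch: the class names `Slot` and `MemStoreSketch`, the `long` container id, and the use of `Object[]` in place of `DataValueDescriptor[]` are all assumptions made so the example compiles without Derby on the classpath.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a Slot: holds one row as an array of column values.
// The patch uses DataValueDescriptor[]; Object[] is an assumption for this sketch.
final class Slot {
    final Object[] row;

    Slot(Object[] row) {
        this.row = row;
    }
}

// Hypothetical stand-in for MemStore: maps a container id to its list of slots.
final class MemStoreSketch {
    private final Map<Long, List<Slot>> containers = new HashMap<>();

    // Appends a row to the container and returns its slot index.
    int insert(long containerId, Object... row) {
        List<Slot> slots = containers.computeIfAbsent(containerId, id -> new ArrayList<>());
        slots.add(new Slot(row));
        return slots.size() - 1;
    }

    // Returns the row by reference; nothing is serialized or copied to a page.
    Object[] fetch(long containerId, int slotIndex) {
        return containers.get(containerId).get(slotIndex).row;
    }

    public static void main(String[] args) {
        MemStoreSketch store = new MemStoreSketch();
        int slot = store.insert(1L, 42, "alice");
        System.out.println(store.fetch(1L, slot)[1]);
    }
}
```

The point of the sketch is that a fetch hands back the live row object, so no page cache or serialization step sits between the access layer and the data.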

      So, instead of just doing dummy IO in memory, where Derby still thinks it is doing real IO and page caching, this new approach bypasses the cache and page structure by keeping the DataValueDescriptor[] objects in memory without serializing them.

      Operations that manipulate data in memory are performed via new operation objects (e.g. MemInsertOperation, MemInsertUndoOperation, MemDeleteOperation...), which still use RawStore for transaction control and persistence. Checkpointing is done by serializing the objects in MemStore to disk fuzzily and completely unsynchronized. Recovery consists of deserializing the objects into MemStore; the existing REDO and UNDO phases of Derby recovery in RawStore then make the data transaction-consistent by replaying or undoing the new operation objects in the log.
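The idea of paired do/undo operations recorded in a log can be sketched as follows. This is only an illustration under assumptions: `MemOperationSketch`, `InsertSketch` and `TxnLogSketch` are hypothetical names, the table is a plain list, and Derby's actual RawStore logging interfaces are not used.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical logged operation: it can be applied (redo) and reversed (undo).
interface MemOperationSketch {
    void redo(List<Object[]> table);

    void undo(List<Object[]> table);
}

// Hypothetical insert operation; undo removes the same row object again.
final class InsertSketch implements MemOperationSketch {
    private final Object[] row;

    InsertSketch(Object[] row) {
        this.row = row;
    }

    public void redo(List<Object[]> table) {
        table.add(row);
    }

    public void undo(List<Object[]> table) {
        // Arrays compare by identity, so this removes exactly the inserted row.
        table.remove(row);
    }
}

// Hypothetical per-transaction log: operations are undone in reverse order.
final class TxnLogSketch {
    private final Deque<MemOperationSketch> log = new ArrayDeque<>();

    void apply(MemOperationSketch op, List<Object[]> table) {
        op.redo(table);
        log.push(op);
    }

    void rollback(List<Object[]> table) {
        while (!log.isEmpty()) {
            log.pop().undo(table);
        }
    }

    public static void main(String[] args) {
        List<Object[]> table = new ArrayList<>();
        TxnLogSketch txn = new TxnLogSketch();
        txn.apply(new InsertSketch(new Object[] {1, "a"}), table);
        txn.apply(new InsertSketch(new Object[] {2, "b"}), table);
        txn.rollback(table);
        System.out.println(table.size());
    }
}
```

After a fuzzy checkpoint is reloaded, replaying logged operations and undoing those of uncommitted transactions is what brings the in-memory rows back to a consistent state; the sketch shows only the undo half on a live table.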

      Locking is hard-coded as row-level, with isolation level SERIALIZABLE.

      To get Derby to use the new conglomerates I hacked the SQL-layer to create MemHeap-tables and MemSkiplist-indexes when the table name starts with 'mem_'.
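The name-based switch can be illustrated with a small sketch. The enum, the method name and the case-insensitive comparison are assumptions; the patch's actual check inside the SQL layer is not shown in this issue.

```java
import java.util.Locale;

// Hypothetical chooser that mirrors the 'mem_' naming convention described above.
final class ConglomerateChooserSketch {
    enum HeapKind { HEAP, MEM_HEAP }

    // Assumption: the prefix is matched case-insensitively, since Derby
    // upper-cases unquoted identifiers.
    static HeapKind heapKindFor(String tableName) {
        boolean inMemory = tableName.toLowerCase(Locale.ROOT).startsWith("mem_");
        return inMemory ? HeapKind.MEM_HEAP : HeapKind.HEAP;
    }

    public static void main(String[] args) {
        System.out.println(heapKindFor("mem_mytable"));
        System.out.println(heapKindFor("orders"));
    }
}
```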

      Because this is a major rewrite of the access and storage layers, there are many known and unknown bugs and much missing functionality. What works is essentially select, insert, update and delete on tables with one primary key.

      1. Derby-10.2.2.0-memstore.diff
        368 kB
        Knut Magne Solem
      2. DERBY-2798.diff
        368 kB
        Knut Magne Solem
      3. DERBY-2798-10.3.1.0.diff
        340 kB
        Knut Magne Solem
      4. DERBY-2798-10.3.1.0.stat
        5 kB
        Knut Magne Solem
      5. select.png
        3 kB
        Knut Magne Solem
      6. update.png
        3 kB
        Knut Magne Solem

          Activity

          Rick Hillegas added a comment -

          Thanks, Knut. This is a very exciting approach. I noticed that you haven't granted to the ASF a license on your patch. Could you re-attach your patch and check the box which grants a license to ASF?

          Knut Magne Solem added a comment -

          Same patch, but granting for ASF inclusion

          Myrna van Lunteren added a comment -

          The practice is: enhancements don't get a fix-in until changes go in...
          If this is intended for the 10.2 branch, then fix-in will most likely be 10.2.2.1...

          Knut Magne Solem added a comment -

          Cleaned up the code and made it work with Derby 10.3.1.0 (unofficial). I also ran simple benchmarks for select and update (with durability=test) on Wisconsin test data with Java 6.0. The hardware is a P4 2.8 GHz with HT running Linux.

          The patch is for the unofficial 10.3.1.0 release.

          Bastian Wassermann added a comment -

          I have patched this version of Derby and I can't see any difference from the unpatched version.
          I thought that this version would run Derby entirely in memory, with no writes to or reads from disk, but when I try this version there is still hard-disk access (whenever I put something into a table).

          I don't know much about how databases work, but with every insert command the Derby database writes to the log1.dat file in the database folder. Is this logging a feature that can be deactivated, or is it a necessary function? Can this access be switched off so that nothing is written to the hard disk, or is that impossible?

          I was thinking of a database that works 100% in virtual memory and writes the data to the hard disk at intervals. Is this possible? If you know of any manuals that would help me with this, I would be very thankful.

          Knut Magne Solem added a comment -

          To use MemStore you must create tables with the prefix "mem_", e.g. mem_mytable. This tells the SQL layer to create MemHeap and MemSkiplist conglomerates (tables/indexes). Eventually this should be done via CREATE TABLE options. I have also only tested with the primary key as the first column.

          You can deactivate logging by setting derby.system.durability=test in derby.properties.
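As an alternative to editing derby.properties, the same system-wide property can, as far as I know, also be supplied as a JVM system property before the Derby engine boots. The sketch below assumes that behaviour; the class and method names are hypothetical. Note also that, as I read the Derby documentation, this setting makes Derby skip syncing the log and data to disk rather than stop writing the log altogether, and it is intended for testing only.

```java
// Hypothetical helper; only the property name and value come from the comment above.
final class DurabilitySketch {
    // Assumption: this runs before the Derby engine is loaded,
    // because the property is read when the system boots.
    static void relaxDurability() {
        System.setProperty("derby.system.durability", "test");
    }

    public static void main(String[] args) {
        relaxDurability();
        System.out.println(System.getProperty("derby.system.durability"));
    }
}
```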

          Thanks for pointing out the missing config in modules.properties when building jars. (cloudscape.config.memstore=all on line 300)

          It is correct that MemStore uses more memory, about 50-70% more. Also keep in mind that this is experimental code with limited functionality.

          Dyre Tjeldvoll added a comment -

          This issue has shown up in the 'patch available' filter for some time now. It does not seem that anyone is willing to commit the patch in its present form, and nobody seems to be actively working on a new version, so I am removing the 'patch available' flag.

          Kathey Marsden added a comment -

          Can this be duped to DERBY-646 or is this something different?

          Kristian Waagan added a comment -

          This is something else.
          Implementing what is suggested here would most likely require a lot more effort than implementing DERBY-646. For instance, it includes a new (to Derby) access method which is better suited for in-memory data than the BTree is.

          I believe implementing this would open up for significantly better performance than the current in-memory back end.
          Two possible next steps:
          a) Describe the current state of the patch; what works, what doesn't?
          b) Investigate how much of Derby must be rewritten for a proper implementation.

          Rick Hillegas added a comment -

          Closing this issue because no work has happened for a long time and because 10.6.1 productized an alternative implementation of an in-memory database. The issue can be re-opened if someone wants to pursue it.

          Thanks to Knut Magne Solem for the prototype. It has been very useful for understanding how to implement alternative Stores incrementally.


            People

            • Assignee: Unassigned
            • Reporter: Knut Magne Solem
            • Votes: 4
            • Watchers: 8
