Derby
DERBY-2911

Implement a buffer manager using java.util.concurrent classes

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 10.4.1.3
    • Fix Version/s: 10.4.1.3
    • Component/s: Services
    • Labels:
      None
    • Bug behavior facts:
      Performance

      Description

      There are indications that the buffer manager is a bottleneck for some types of multi-user load. For instance, Anders Morken wrote this in a comment on DERBY-1704: "With a separate table and index for each thread (to remove latch contention and lock waits from the equation) we (...) found that org.apache.derby.impl.services.cache.Clock.find()/release() caused about 5 times more contention than the synchronization in LockSet.lockObject() and LockSet.unlock(). That might be an indicator of where to apply the next push".

      It would be interesting to see the scalability and performance of a buffer manager which exploits the concurrency utilities added in Java SE 5.

      1. cleaner.diff
        3 kB
        Knut Anders Hatlen
      2. cleaner.tar
        16 kB
        Knut Anders Hatlen
      3. d2911-1.diff
        25 kB
        Knut Anders Hatlen
      4. d2911-1.stat
        0.3 kB
        Knut Anders Hatlen
      5. d2911-10.diff
        16 kB
        Knut Anders Hatlen
      6. d2911-10.stat
        0.2 kB
        Knut Anders Hatlen
      7. d2911-11.diff
        4 kB
        Knut Anders Hatlen
      8. d2911-12.diff
        11 kB
        Knut Anders Hatlen
      9. d2911-13.diff
        6 kB
        Knut Anders Hatlen
      10. d2911-14.diff
        2 kB
        Knut Anders Hatlen
      11. d2911-15.diff
        9 kB
        Knut Anders Hatlen
      12. d2911-2.diff
        2 kB
        Knut Anders Hatlen
      13. d2911-3.diff
        2 kB
        Knut Anders Hatlen
      14. d2911-4.diff
        5 kB
        Knut Anders Hatlen
      15. d2911-5.diff
        5 kB
        Knut Anders Hatlen
      16. d2911-6.diff
        24 kB
        Knut Anders Hatlen
      17. d2911-6.stat
        0.4 kB
        Knut Anders Hatlen
      18. d2911-7.diff
        6 kB
        Knut Anders Hatlen
      19. d2911-7a.diff
        7 kB
        Knut Anders Hatlen
      20. d2911-9.diff
        20 kB
        Knut Anders Hatlen
      21. d2911-9.stat
        0.3 kB
        Knut Anders Hatlen
      22. d2911-enable.diff
        1.0 kB
        Knut Anders Hatlen
      23. d2911-entry-javadoc.diff
        3 kB
        Knut Anders Hatlen
      24. d2911perf.java
        3 kB
        Knut Anders Hatlen
      25. d2911-unused.diff
        10 kB
        Knut Anders Hatlen
      26. d2911-unused.stat
        0.7 kB
        Knut Anders Hatlen
      27. derby-2911-8.diff
        18 kB
        Knut Anders Hatlen
      28. derby-2911-8.stat
        0.4 kB
        Knut Anders Hatlen
      29. perftest.diff
        47 kB
        Knut Anders Hatlen
      30. perftest.pdf
        74 kB
        Knut Anders Hatlen
      31. perftest.stat
        1.0 kB
        Knut Anders Hatlen
      32. perftest2.diff
        47 kB
        Knut Anders Hatlen
      33. perftest6.pdf
        89 kB
        Knut Anders Hatlen
      34. poisson_patch8.tar
        20 kB
        Knut Anders Hatlen

          Activity

          Knut Anders Hatlen added a comment -

          I am interested in working on this issue. What I would like to
          achieve is to make it possible for different threads to access the
          buffer manager without blocking each other, as long as they request
          different objects and the requested objects are present in the
          cache. (That is, I think it is OK that accesses to the same object or
          accesses that need to fetch an object into the cache might have to
          wait for each other, as they do in the current buffer manager.)

          There are two ways to achieve this:

          1) Rewrite the old buffer manager (Clock) so that it allows more
          concurrent access (possibly splitting the HashMap and changing
          the synchronization model for Clock/CachedItem)

          2) Write a new buffer manager which uses the concurrency utilities
          in newer Java versions (ConcurrentHashMap, ReentrantLock and
          friends)

          I like option 2 best myself, mostly because it allows us to reuse the
          wheel (concurrency utils) rather than reinventing it. The downside is
          that the old implementation must be kept in the code as long as we
          support JVMs without the concurrency utilities (JDK 1.4 and
          Foundation). Because of the clearly defined interface (CacheManager)
          for the buffer manager, adding an alternative implementation should be
          transparent to the rest of the Derby code, though.

          If we decide to go for option 2, I will try to implement it
          incrementally with these steps:

          1) Implement a buffer manager with no replacement policy (that is,
          it ignores the maximum size and never throws data out). After
          this step, the buffer manager should allow concurrent access for
          all threads that request different objects.

          2) Implement the replacement policy. After this step, the buffer
          manager should be able to throw out objects that have not been
          used for some time, and thereby avoid growing bigger than the
          maximum size.

          3) Enable the new buffer manager by default for JDK 1.5 and higher.

          In step 2, I think I will stick to the clock algorithm that we
          currently use. Last year, a Google Summer of Code student investigated
          different replacement algorithms for Derby. Although changing the
          replacement algorithm is out of the scope of this issue, he suggested
          some changes to make it easier to switch replacement algorithm. I will
          see if I can get some ideas from his work.

          Comments to this plan would be appreciated.

          Daniel John Debrunner added a comment -

          I'd say option 2 is best.

          Bryan Pendleton added a comment -

          Would a particular instantiation of the engine ever have more than
          one buffer manager at a time? How (and why) would a user choose
          one buffer manager implementation versus the other? Would it
          be as simple as:

          • if this is JDK 1.5+, we always unconditionally use the new one
          • if this is JDK 1.4, we always unconditionally use the old one

          Or is there some other reason that a user would want to override this?
          Knut Anders Hatlen added a comment -

          Thanks for your comments, Dan and Bryan.

          I was thinking about using Derby's module loader, which would basically do what Bryan says. With JDK 1.4 or J2ME, the old one is used, and with JDK 1.5+ the new one. It is possible to override the default by specifying -Dderby.module.cacheManager=org.apache.derby... on the command line, but only when using sane builds, I think. Except for the obvious case where there is a bug in the new implementation, I don't think users would want to override it. The one thing I can think of, is if the old implementation performs better for a certain load/configuration.

          Knut Anders Hatlen added a comment -

          I noticed that some of the methods in CacheFactory/CacheManager are not used. As a first step before starting on a new cache manager, I would like to remove those methods from the interfaces (but I will leave the actual implementations in Clock untouched for now). This way we can avoid having two implementations of something that's never used. We can always implement some of the methods later, if we ever need them, but I think it is better to wait until the need arises, rather than to guess now what we might need in the future.

          The methods that are not used are:

          CacheFactory.newSizedCacheManager()
          CacheManager.getMaximumSize()
          CacheManager.resize()
          CacheManager.containsKey()
          CacheManager.setUsed()
          CacheManager.getNumberInUse()
          CacheManager.getCacheStats()
          CacheManager.resetCacheStats()
          CacheManager.scan()

          getCacheStats() and resetCacheStats() are called by other methods (simple forwarding methods), but those methods are never called. The attached patch (d2911-unused) removes all the unused methods, including the methods that call getCacheStats() and resetCacheStats(). suites.All passed with the patch applied. I have also started a derbyall run which has not completed yet.

          Knut Anders Hatlen added a comment -

          Committed d2911-unused.diff with revision 571055.

          Knut Anders Hatlen added a comment -

          I have written a small test which can be used to measure performance improvements in this issue. It is a multi-threaded test which performs primary-key lookups. Each thread has its own table to work on to avoid contention on latches (which is a scalability issue separate from the one we're trying to solve in this JIRA issue).

          It's a standalone Java program for now since our JUnit framework doesn't support multi-threaded tests yet, as far as I can see. To run the test, use this command line:

          java d2911perf <DB-URL> <THREADS> <WARMUP> <COLLECT>

          The test will have a warm-up phase of <WARMUP> seconds and collect results for <COLLECT> seconds.
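
          For example, a hypothetical invocation measuring eight threads with a 30-second warm-up and a two-minute collection interval might look like:

          java d2911perf jdbc:derby:perfdb 8 30 120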

          Knut Anders Hatlen added a comment -

          Attached is a partial implementation of a new buffer manager. It adds a class called ConcurrentCache, which implements the CacheManager interface and keeps the cached objects in a ConcurrentHashMap. It also adds ConcurrentCacheFactory, which creates instances of the new cache, and CacheEntry, which represents an entry in the cache and uses a ReentrantLock to protect its internal state from concurrent accesses.

          Currently, only basic operations like find, release, remove and clean have been implemented. There is no replacement algorithm, which basically means that the cache doesn't have a defined maximum size. I managed to run the attached performance test with the patch (manually edited modules.properties to load the new buffer manager). It showed promising results. With 100 threads on a machine with eight CPUs, the throughput was almost doubled. With fewer threads (or fewer CPUs) there was of course less improvement. I didn't notice any performance loss for the single-threaded case, though.
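
          To illustrate the structure, here is a minimal sketch with simplified types; the field and method names are assumptions for illustration, and the real classes implement Derby's CacheManager and Cacheable interfaces:

          import java.util.concurrent.ConcurrentHashMap;
          import java.util.concurrent.locks.ReentrantLock;

          final class CacheEntry {
              // per-entry lock: protects only this entry's state
              private final ReentrantLock mutex = new ReentrantLock();
              private Object cacheable; // the cached object; null while uninitialized

              void lock() { mutex.lock(); }
              void unlock() { mutex.unlock(); }
              Object getCacheable() { return cacheable; }
              void setCacheable(Object c) { cacheable = c; }
          }

          final class ConcurrentCacheSketch {
              // ConcurrentHashMap allows lock-free lookups, so threads requesting
              // different keys do not contend on a global monitor
              private final ConcurrentHashMap<Object, CacheEntry> cache =
                      new ConcurrentHashMap<Object, CacheEntry>();

              CacheEntry findCached(Object key) {
                  return cache.get(key); // no global synchronization
              }
          }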

          Bryan Pendleton added a comment -

          Is there something about our JUnit framework that inhibits multi-threaded tests?
          I'm not sure why JUnit would know or care how many threads you used.

          Kristian Waagan added a comment -

          I think the main issue is that the JUnit framework (for instance the runner) is not aware of threads spawned from the test method.
          If you don't make your test method wait for the spawned threads, JUnit will just complete and continue/exit - possibly terminating your threads if the test is the last one to run.
          Second, you don't have any reporting, monitoring or controlling capabilities for the threads.

          If you write your test method appropriately (some approaches used are Thread.join() or features from the java.util.concurrent package in Java SE 5.0), I think you would be okay. You would just not have any "support functions" from the framework, and would have to do everything yourself.
          There are also a few libraries for writing multithreaded JUnit tests out there (GroboUtils, the TestDecorator approach, maybe more?).
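
          For instance, a test method can drive its own worker pool and propagate any worker failure to the JUnit runner. A minimal sketch using java.util.concurrent (illustrative only, not tied to Derby's framework):

          import java.util.ArrayList;
          import java.util.List;
          import java.util.concurrent.ExecutorService;
          import java.util.concurrent.Executors;
          import java.util.concurrent.Future;

          public class ConcurrentSketchTest extends junit.framework.TestCase {
              public void testConcurrently() throws Exception {
                  final int THREADS = 10;
                  ExecutorService pool = Executors.newFixedThreadPool(THREADS);
                  List<Future<?>> results = new ArrayList<Future<?>>();
                  for (int i = 0; i < THREADS; i++) {
                      results.add(pool.submit(new Runnable() {
                          public void run() {
                              // exercise the code under test here
                          }
                      }));
                  }
                  pool.shutdown();
                  for (Future<?> f : results) {
                      // get() waits for the worker and rethrows its failure
                      // (wrapped in an ExecutionException), failing the test
                      f.get();
                  }
              }
          }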

          Other test frameworks, for instance TestNG, can specify how many parallel threads run a test method (not to be confused with the support for running tests in parallel, which takes things like dependencies into account). It is not quite clear to me how success/failure is measured, or how the timeout feature is implemented.

          Knut Anders Hatlen added a comment -

          I was primarily referring to the performance test framework (JDBCPerfTestCase) which makes it very easy to write single-threaded performance tests, but currently doesn't support writing multi-threaded tests. I'm sure there's nothing that prevents us from writing multi-threaded tests provided that we address the challenges summarized by Kristian.

          Knut Anders Hatlen added a comment -

          Since the code added by the patch is disabled, there should be little risk of breaking something. I therefore committed it with revision 572645. We would need the classes and the stubs regardless of whether there are comments to the implementation, I think. And you should of course feel free to test it and comment on it.

          The simplest way to test it is to edit the derby.module.cacheManager property in modules.properties so that it points to ConcurrentCacheFactory instead of ClockFactory, and recompile. I made that change myself and ran the full JUnit test suite. There was a total of 19 test cases failing (17 errors + 2 failures). Most of them failed because CacheManager.values() is not implemented (used by a VTI). I'll investigate the other failures, but it seems at least some of them are caused by the lack of a replacement policy, so that objects are left in the cache when the test expects them to have been thrown out.
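
          For illustration, the edited line would look roughly like this (the property key appears in an earlier comment; the package name is my assumption):

          # modules.properties: point the cache manager module at the new factory
          derby.module.cacheManager=org.apache.derby.impl.services.cache.ConcurrentCacheFactory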

          Knut Anders Hatlen added a comment -

          The attached patch (d2911-2.diff) implements the values() method. With this patch, I only see 13 failures in the JUnit tests. All of the failures are in StatementPlanCacheTest and fail with "expect plan to [be] thrown out". These failures are expected until the replacement policy is in place.

          Committed revision 572693.

          Manish Khettry added a comment -

          One thought on the ConcurrentCache class. Since this is a Java 5 class, is it possible to use generics? In this case something like:

          class ConcurrentCache<K> {
              private final ConcurrentHashMap<K, CacheEntry> cache;
          }

          I realize that most (or all) of the time, users of this class will be pre-JDK 5 classes and this may not buy us all that much, but I still think it is worth doing.

          Knut Anders Hatlen added a comment -

          Thanks Manish, that's a great suggestion, but since the objects in the cache need to implement the Cacheable interface and Cacheable.getIdentity() returns an Object, we're kind of locked into declaring the hash key as Object. We could work around that problem by casting the identity to the parametrized type K, like

          K key = (K) cacheable.getIdentity();
          cache.remove(key);

          but that'll give us a compile-time warning about an unchecked cast. So until the interfaces that we need to work with are rewritten using generics, I think it's best not to parametrize the key type.

          Knut Anders Hatlen added a comment -

          Attached is a patch (d2911-3.diff) which implements the shutdown method (more code will probably have to go in there when we have a replacement policy and a background writer). The shutdown method ensures that new calls to find() and findCached() return null (and also create() since that's how it was implemented in Clock, although the CacheManager interface doesn't require that) by setting a volatile boolean flag. It also calls cleanAll() and ageOut() as demanded by CacheManager.shutdown()'s javadoc (Clock actually calls ageOut() both before and after cleanAll(), but I fail to see why that should make any difference).

          Knut Anders Hatlen added a comment -

          Committed revision 572953.

          Knut Anders Hatlen added a comment -

          The next step is to implement the replacement algorithm. As I said
          earlier, I will use the clock algorithm as the old buffer manager
          does, but separate the implementation of the buffer manager and the
          replacement algorithm to make it easier to switch to or experiment
          with other algorithms.

          I was thinking that we could have an interface for the replacement
          policy that would look similar to this:

          interface ReplacementPolicy {
              Callback insertEntry(CacheEntry e);

              interface Callback {
                  void access();
                  void free();
              }
          }

          So when ConcurrentCache needs to insert a new entry, it would call
          ReplacementPolicy.insertEntry() which would find free space (or make
          room) for the new entry. With the clock algorithm, this means moving
          the clock handle until a not recently used entry is found, and
          possibly cleaning dirty entries or increasing the size of the
          clock. insertEntry() would return a callback object, which
          ConcurrentCache could use to notify the replacement policy about
          events. Callback.access() would be used to notify that someone
          accessed the object, so that it could, for instance, be marked as
          recently used in the clock, and Callback.free() would mark the entry
          as unused/invalid and make it available for reuse.
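
          In use, the cache would keep the callback it receives at insertion time; roughly (every name here is illustrative only):

          // sketch: how ConcurrentCache might drive the proposed interface
          ReplacementPolicy.Callback callback = policy.insertEntry(entry); // may evict another entry
          callback.access(); // on each later cache hit: mark the entry recently used
          callback.free();   // on removal: make the slot available for reuse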

          Knut Anders Hatlen added a comment -

          Some observations on the replacement algorithm in the current buffer
          manager:

          1) If it is known that there is at least one invalid entry in the
          cache, the normal clock rotation algorithm (which scans no more than
          20% of the cache) is bypassed and a scan which may go through the
          entire cache is performed instead. (The rationale is that we will
          avoid growing the cache if we don't have to.) This is probably OK
          since invalid objects should be rare (normally present only after DROP
          TABLE or similar). However, because objects that are about to be
          inserted into the cache are regarded as invalid but kept so that they
          can't actually be reused by someone else yet, the old buffer manager
          has suffered from starting these scans too frequently (see
          DERBY-704).

          The new buffer manager should try to avoid this problem by only
          starting a scan for invalid objects if it knows that there are items
          that have been invalidated by removal.

          2) When the clock is rotated and the hand sweeps over an unkept, not
          recently used and dirty object, the global synchronization is dropped
          and the object is cleaned. After cleaning the object, the global
          synchronization is re-obtained, but then some other thread may have
          moved the clock hand past the cleaned object while the first thread
          didn't hold the global synchronization lock. In that case, the first
          thread has cleaned an object but is not allowed to reuse its
          entry. Perhaps it's just a theoretical problem, but this could mean
          that some threads have to write multiple dirty pages to disk before
          they are allowed to insert a new one into the cache.

          Since the new buffer manager uses synchronization with finer
          granularity, it should avoid this problem by keeping the
          synchronization lock while the object is being cleaned. Then it knows
          that the entry can be reused as soon as the object has been cleaned.

          Knut Anders Hatlen added a comment -

          Patch d2911-entry-javadoc.diff extends the class javadoc for CacheEntry with descriptions of its different states. I also changed a note in the javadoc which said that one thread could only hold the lock on one CacheEntry at a time to prevent deadlocks. Since the replacement algorithm probably needs to hold the lock on two entries at a time (the entry that is about to be inserted and the entry that is to be evicted), I loosened the single-lock-per-thread requirement. Now the javadoc says that it is OK to hold the lock on two different CacheEntry objects, as long as the first entry to be locked is in the uninitialized state and the second entry is not in the uninitialized state. By enforcing a strict order for obtaining the locks, concurrent threads won't run into deadlocks with each other.

          Knut Anders Hatlen added a comment -

          Committed the javadoc patch with revision 575607.

          Knut Anders Hatlen added a comment -

          Attaching a patch (d2911-4.diff) which makes a couple of small changes to ConcurrentCache.java:

          1) Make findFreeCacheable() take the CacheEntry as a parameter and call setCacheable() on the entry instead of returning a Cacheable. When the replacement algorithm is implemented, findFreeCacheable() is going to need the CacheEntry reference in order to insert the entry into the replacement policy's internal data structure (e.g., clock).

          2) Make removeEntry() take a key (object identity) instead of a CacheEntry so that it also can be used to remove entries from the cache when setIdentity/createIdentity failed.

          Knut Anders Hatlen added a comment -

          Reattaching with "grant license..."

          Knut Anders Hatlen added a comment -

          Committed d2911-4 with revision 577224.

          Knut Anders Hatlen added a comment -

          Some thoughts about replacement algorithm and synchronization:

          In the old cache manager, one basically had a global synchronization lock that every user of the cache needed to obtain before entering the manager. One advantage of that approach is that there can't be deadlocks as long as there's only one lock. With the more fine-grained synchronization in the new buffer manager, one needs to be careful not to introduce the risk of deadlocks.

          The current implementation of ConcurrentCache should not be deadlock prone, since a thread never holds the lock on more than one entry in the cache. When the replacement algorithm is implemented, things will be more complex, as there will be other data structures that we need to lock.

          Take for instance insertion of a new item into the cache. Then we at least need to synchronize on (A) the entry to insert, (B) the clock data structure, and (C) the entry to evict from the cache. In a previous comment, I mentioned that it is OK to lock two entries if the first one is about to be inserted and the second one is already in the cache, so locking C while holding the lock on A is OK. But allowing C to be locked while holding the lock on both A and B, would be problematic. If someone calls CacheManager.remove(), we first need to lock the entry that is about to be removed, and then lock the clock data structure to actually remove the entry. This would mean that we obtain the synchronization locks in the order C->B, which could potentially lead to deadlocks if the insertion of entries had the lock order A->B->C.

          To solve this, I think we would have to break up the A->B->C lock chain somehow. What comes to mind is that we could unlock B before we lock C, and then reobtain the lock on B after we have locked C. That would leave an open window for others to modify the relationship between B and C for a short while, so we would have to revalidate that what we thought we knew is still true.

          So instead of doing this

          lock A (entry to insert)
          -->lock B (clock)
          ---->fetch C (entry to evict) from B
          ---->lock C
          ------>reuse Cacheable from C in A
          ------>remove C from B
          ------>unlock C
          ---->insert A into B
          ---->unlock B
          -->unlock A
          return

          we could do this

          lock A (entry to insert)
          -->lock B (clock)
          ---->fetch C (entry to evict)
          ---->unlock B
          -->lock C
          ---->validate that no one else evicted C after we unlocked B (otherwise retry the preceding steps)
          ---->reuse Cacheable from C in A
          ---->lock B
          ------>remove C from B
          ------>unlock B
          ---->unlock C
          -->unlock A
          return

          This way we break down the A->B->C locking into the sequences A->B and A->C->B, none of which conflict with the B->C locking required by CacheFactory.remove().
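
          A sketch of the second sequence as Java, with per-object ReentrantLocks standing in for the entry and clock locks; every helper name here is illustrative, not from the patch:

          entryA.lock();                      // A: entry to insert
          try {
              CacheEntry entryC;
              while (true) {
                  clockLock.lock();           // B: clock data structure
                  try {
                      entryC = pickEvictionCandidate(); // fetch C from B
                  } finally {
                      clockLock.unlock();     // drop B before locking C
                  }
                  entryC.lock();              // lock order is now A->C, not A->B->C
                  if (stillEvictable(entryC)) {
                      break;                  // C untouched while B was unlocked
                  }
                  entryC.unlock();            // lost the race: retry
              }
              try {
                  reuseCacheableFrom(entryC); // move the Cacheable from C to A
                  clockLock.lock();           // A->C->B does not conflict with C->B
                  try {
                      removeFromClock(entryC);
                  } finally {
                      clockLock.unlock();
                  }
              } finally {
                  entryC.unlock();
              }
          } finally {
              entryA.unlock();
          }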

          Dag H. Wanvik added a comment -

          Makes sense to me. The last sentence should probably read:

          "This way we break down the A->B->C locking into the sequences A->B and by A->C->B, none of which conflict with the C->B locking required by CacheFactory.remove()."
          i.e.
          (C->B instead of B->C) ?

          Knut Anders Hatlen added a comment -

          Thanks Dag! You're quite right, it should have said C->B.

          Knut Anders Hatlen added a comment -

          I manually enabled the new buffer manager in modules.properties and ran suites.All + derbyall. The number of test cases failing in suites.All is now 7, of which 6 are expected failures in StatementPlanCacheTest mentioned in a previous comment. The last JUnit failure is in ImportExportTest:client:

          1) ImportExportTest:client: java.sql.SQLException: Limitation: Record cannot be updated or inserted due to lack of space on the page. Use the parameters derby.storage.pageSize and/or derby.storage.pageReservedSpace to work around this limitation.

          In derbyall, 8 tests failed:

          derbyall/derbyall.fail:unit/cacheService.unit
          derbyall/storeall/storeall.fail:store/rollForwardRecovery.sql
          derbyall/storeall/storeall.fail:store/backupRestore1.java
          derbyall/storeall/storeall.fail:store/OnlineBackupTest1.java
          derbyall/storeall/storeall.fail:unit/T_RawStoreFactory.unit
          derbyall/storeall/storeall.fail:unit/T_b2i.unit
          derbyall/storeall/storeall.fail:unit/recoveryTest.unit
          derbyall/storeall/storeall.fail:store/RecoveryAfterBackup.java

          All the failures in derbyall seem to happen because entries are left in the cache in an invalid state if Cacheable.setIdentity() or Cacheable.createIdentity() throws a StandardException. I will fix this by removing the entries whose identity cannot be set/created.

          Knut Anders Hatlen added a comment -

          The attached patch (d2911-5.diff) fixes the problem with invalid entries being left in the cache if Cacheable.setIdentity()/createIdentity() fails. Most of the derbyall failures went away. I'm rerunning the tests now to see if this change introduces any new failures.

          Knut Anders Hatlen added a comment -

          Committed d2911-5 with revision 578006.

          When I reran suites.All and derbyall with the new buffer manager enabled after applying this patch, I only saw the expected failures in suites.All and two failures in derbyall.

          To find out which of the failures are to be expected when the buffer manager has no replacement algorithm, I ran the regression tests with the old buffer manager modified so that the replacement algorithm wasn't used (that is, I hard-coded Clock.maximumSize to Integer.MAX_VALUE). Then I saw the same failures in suites.All, but only one failure in derbyall. That leaves us with one "unexplained" error: unit/T_RawStoreFactory.unit, which times out.

          Knut Anders Hatlen added a comment -

          It seems like the failure in unit/T_RawStoreFactory.unit is to be expected too. Hard coding Clock.maximumSize to Integer.MAX_VALUE didn't disable all parts of the replacement algorithm. Clock.findFreeItem() would still reuse invalid entries instead of allocating new ones. When I also disabled the reuse of invalid entries, the test failed with Clock as well.

          Knut Anders Hatlen added a comment -

          Attaching a new patch (d2911-6.diff) which implements a replacement algorithm using the interface and synchronization model described in the previous comments. The main part of the patch is the new ClockPolicy class and its rotateClock() method. I have done my best to state all requirements about synchronization and lock order in comments, so I hope it is possible for others to understand the code...

          I manually edited modules.properties to enable the new buffer manager and ran the full regression test suite. suites.All ran successfully, whereas derbyall had one failure. The failure in derbyall was unit/T_RawStoreFactory.unit which has been mentioned before. This is an expected failure until reuse of invalid entries has been enabled for caches whose size is smaller than the maximum size.

          I have not run any performance tests on this last patch, but I will do so. The performance test attached to this issue doesn't test the replacement algorithm since it creates just a small database. I will therefore see if I can run some other tests with different buffer sizes and also test it with update load, for instance using the test client attached to DERBY-1961.

          What's left to do:

          • reuse Cacheable from invalid entries instead of growing the clock when size < maxSize (will fix the failure in unit/T_RawStoreFactory.unit)
          • implement a background cleaner
          • shrink the cache if it exceeds the maximum size
          Knut Anders Hatlen added a comment -

          I checked in the v6 patch with revision 580252. The new buffer manager is still disabled by default. To test it out, you need to edit modules.properties and replace ClockFactory with ConcurrentCacheFactory.

          I have run some performance tests with various types of load. The results look promising. I will come back with the details later.

          Knut Anders Hatlen added a comment -

          I mentioned that I ran some performance tests with the new buffer manager after patch #6. Here's a pdf (perftest6.pdf) with the details for those interested in graphs and numbers. I have used the test client attached to this issue and the one attached to DERBY-1961, so there are numbers for single-record select operations, update operations and two-way joins. To test the replacement algorithm, I varied the page cache size from 40 up to 40000.

          The tests were run both on an AMD Opteron with two 2.4 GHz CPUs, and on a Sun Fire T2000, 1.0 GHz with 8 cores/32 hardware threads.

          The test client attached to this issue was used to show the scalability of the buffer manager. The DERBY-1961 client was used primarily to test the replacement algorithm, since that client scales poorly because of latch contention on the root node of the B-tree (all threads work on the same table).

          For the details, see the pdf, but I'm pasting the conclusion here:


          For most of the loads used in these tests, the new buffer manager performs as well as or better than the old buffer manager. There was one exception: single-record update on Opteron with page cache size 40000. That test was not conclusive, though, as the single run with the highest throughput in that configuration came from the new buffer manager with 60 concurrent clients. Overall, the new buffer manager seems to increase performance.

          Note that the new buffer manager does not yet implement a background cleaner, which is supposed to increase the likelihood of finding a clean page to evict and thereby making the page replacement algorithm more effective. It is not clear to me whether the existence of a background cleaner affects the performance of these tests.


          Knut Anders Hatlen added a comment -

          I don't understand why unit/T_RawStoreFactory.unit fails. Or to be more specific, I don't understand the connection between the failure and the buffer manager. The failure is caused by some pages not being freed (that is, marked as free in the alloc page) on a rollback to savepoint in T_RawStoreFactory.P042(). What's strange is that

          • with Clock (old buffer manager) it works fine
          • when the scan for invalid items to reuse in Clock.findFreeItem() is commented out, it fails
          • with unmodified Clock, and all test cases except P042() commented out, it fails

          So it seems like the result of P042 is somehow dependent on the contents of the page cache (or perhaps the container cache?) when the test case starts, which is strange both because I thought the alloc page didn't care whether a page was in the cache or not, and because the test case creates a new container at the beginning so that none of the pages used in the test should be present in the cache anyway.

          Knut Anders Hatlen added a comment -

          Attaching a new patch (d2911-7.diff) which implements reuse of free holder objects if the maximum size of the cache has not been reached (it only touches one file - ClockPolicy.java). The patch fixes the failure in unit/T_RawStoreFactory.unit. I have also started the full regression test suite.

The failure in T_RawStoreFactory is still a mystery to me. I ended up scanning for free items backwards from the end of the clock. If the scan started from the beginning, the test would fail. I suspect that either the test does not test what it's supposed to test, or there is a bug somewhere, but since I see the same failure if I change the scan direction in the old buffer manager, I'm confident that the buffer manager is not the problem.

          Knut Anders Hatlen added a comment -

          Now that the mystery with the failure in T_RawStoreFactory is solved (DERBY-3099), I'll update the d2911-7 patch so that it scans for unused entries from the beginning of the clock.

          Knut Anders Hatlen added a comment -

          The attached patch 7a replaces patch 7. It implements reuse of free entries when the cache is smaller than its max size. What's changed is that the scan for reusable entries is performed by rotateClock() instead of a separate method. This means that we will also try to reuse free entries when the cache is full and we know there is at least one free entry, even though we didn't find a free or evictable entry after scanning 20% of the cache.

          Derbyall and suites.All both ran cleanly with the new buffer manager enabled in modules.properties.

          Knut Anders Hatlen added a comment -

          Committed d2911-7a with revision 585060.

          Knut Anders Hatlen added a comment -

I have not implemented a background cleaner for the new buffer manager yet. I will try to implement one that resembles the existing one as closely as possible. My understanding is that the background cleaner can be invoked either to clean dirty pages in front of the clock hand, or to shrink the cache if it has grown bigger than its maximum size. The cleaner is invoked by rotateClock() when it detects that a page needs to be cleaned (but rotateClock() will clean that page itself), or when it detects that the cache is too large. It then scans the cache, starting right in front of the clock hand, until it finds a page to clean. During the scan, it evicts clean and not recently used pages if the cache needs to shrink. The scan stops when at least one of the following conditions is satisfied (see the sketch after the list):

          a) a page has been cleaned
          b) it has seen that 5% of the pages in the cache are clean
          c) it has visited 10% of the pages in the cache
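
As a compact restatement of those stop conditions, the scan loop amounts to something like the following sketch (the class, the Page type and the nextPageInFrontOfHand() helper are hypothetical stand-ins, not the actual Clock code):

    abstract class CleanerScanSketch {
        interface Page { boolean isDirty(); boolean recentlyUsed(); void clean(); }

        abstract Page nextPageInFrontOfHand(); // stands in for the real scan

        void backgroundScan(int cacheSize) {
            int cleaned = 0, cleanSeen = 0, visited = 0;
            while (cleaned < 1                     // a) no page cleaned yet
                    && cleanSeen < cacheSize / 20  // b) seen less than 5% clean
                    && visited < cacheSize / 10) { // c) visited less than 10%
                Page p = nextPageInFrontOfHand();
                visited++;
                if (p.isDirty() && !p.recentlyUsed()) {
                    p.clean();                     // write the page out
                    cleaned++;
                } else if (!p.isDirty()) {
                    cleanSeen++;
                }
            }
        }
    }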

          I'm not sure whether the restriction imposed by condition a, that the background cleaner never cleans more than one page per invocation, is intended or not. Before I started reading the code, I expected the background cleaner to be more aggressive and clean at least a couple of pages before it went back to sleep. In fact, it looks like the intention at some point was that it should requeue itself after writing out the page, as seen by this return statement in Clock.performWork(boolean):

          needService = true;
          return Serviceable.REQUEUE; // return is actually ignored.

The reason the return value is ignored is that BasicDaemon never requeues a request when the Serviceable is a subscriber. This sounds reasonable, since a subscriber would normally be serviced again later anyway. The background cleaner is, however, subscribed in onDemandOnly mode, so it won't be serviced until it puts a new request in the queue itself.
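
To make that concrete, here is a minimal self-contained sketch of what an onDemandOnly subscriber has to do to get serviced again (all types are hypothetical stand-ins for Derby's Serviceable/DaemonService, and cleanOnePage()/moreWork() are invented placeholders):

    class OnDemandSubscriberSketch {
        interface Daemon { void serviceNow(int clientNumber); }
        static final int DONE = 1; // stand-in for Serviceable.DONE

        private final Daemon daemon;
        private final int clientNumber; // returned when subscribing

        OnDemandSubscriberSketch(Daemon daemon, int clientNumber) {
            this.daemon = daemon;
            this.clientNumber = clientNumber;
        }

        int performWork() {
            cleanOnePage();
            if (moreWork()) {
                // an onDemandOnly subscriber must enqueue a new request itself
                daemon.serviceNow(clientNumber);
            }
            return DONE; // returning REQUEUE would be ignored for a subscriber
        }

        private void cleanOnePage() { /* placeholder */ }
        private boolean moreWork() { return false; } // placeholder
    }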

          Does anyone have any details on how the background cleaner is intended to work?

          Dyre Tjeldvoll added a comment -

          Hi Knut,

          Your interpretation seems reasonable to me. Have you tried
          lifting restriction a), so that the background cleaner is allowed
          to clean more than one page? Is there a reason not to let the
          background cleaner re-queue itself?

          Knut Anders Hatlen added a comment -

          > Have you tried lifting restriction a), so that the background cleaner is allowed to clean more than one page?

          I ran a small experiment with the test client attached to DERBY-1961 (single-record update transactions), but I couldn't see much difference between the runs which had a background cleaner and those which didn't have one. I also tried to make the background cleaner write 10 pages instead of 1, but that didn't help either. Now, the DERBY-1961 test client uses a relatively small DB (~10MB, I think), and since we don't sync after we write a page, I guess most writes only go to the file system cache and are fairly quick. I could try to rerun the experiment with a database that's much larger than the physical RAM and see if there's a noticeable difference then.

          > Is there a reason not to let the background cleaner re-queue itself?

Only if the background cleaner slows things down, but then it would be better to disable it altogether.

          I think I'll just try to make the new background cleaner behave the same way as the old one (or as close as possible) for now. If we find out later that the current behaviour is not optimal, it shouldn't take much work to change it.

          Knut Anders Hatlen added a comment -

          Now I have tested the buffer manager and background cleaner on a larger
          database. I used the test client from DERBY-1961 and turned up the
          table size so that the table took ~14GB and the index ~2GB. I ran the
          test on a dual-CPU AMD Opteron with 2 GB RAM, two SCSI disks (one for
          log and one for data) with the write cache disabled.

          With single-record select operations (random primary key lookups)
          there was no difference between the two buffer managers, and there was
          no difference between running with or without the background
          writer. Both observations were as expected (there should be no
          difference between the two buffer managers since the performance was
          disk bound and they use the same replacement algorithm, and the
          background cleaner shouldn't have any effect on read-only load). The
          tests were repeated with page cache size 1000 (4MB), 10000 (40MB) and
100000 (400MB), with 1 to 100 concurrent connections.

          Running with single-record update load (random primary key lookup +
          update of string column not part of the primary key), there wasn't any
          observable effect of the background cleaner on that load
either. However, when comparing the two buffer managers against each
          other, it seemed that Clock consistently had 3-5% higher throughput
          than ConcurrentCache when the number of concurrent connections
          exceeded 5-10. These results were seen both when the page cache size
          was 4MB and when it was 40MB. When it was 400MB, there was no
          observable difference.

          So for some reason, it seems like the old buffer manager works better
          when there's a combination of high eviction rate and many dirty pages
          (and the db working set is so large that I/O operations normally go to
          disk rather than FS buffer). I found that observation a little
          strange, since the performance for this kind of load should be almost
          exclusively dependent on the page replacement algorithm, which is
          supposed to be identical in the two buffer managers. The only thing I
          know is different, is what I mentioned in one of my previous comments:

          > 2) When the clock is rotated and the hand sweeps over an unkept, not
          > recently used and dirty object, the global synchronization is
          > dropped and the object is cleaned. After cleaning the object, the
          > global synchronization is re-obtained, but then some other thread
          > may have moved the clock hand past the cleaned object while the
          > first thread didn't hold the global synchronization lock. In that
          > case, the first thread has cleaned an object but is not allowed to
          > reuse its entry. Perhaps it's just a theoretical problem, but this
          > could mean that some threads have to write multiple dirty pages to
          > disk before they are allowed to insert a new one into the cache.
          >
          > Since the new buffer manager uses synchronization with finer
          > granularity, it should avoid this problem by keeping the
          > synchronization lock while the object is being cleaned. Then it
          > knows that the entry can be reused as soon as the object has been
          > cleaned.

          So it seems we are hitting the behaviour described above. Because the
          writes are likely to go to disk (not only to the file system cache), it
          is very likely that the clock hand is moved past the page being
          written, so that the cleaned page is not evicted, and the thread that
          cleaned it has to keep searching for an evictable page.

          To verify that this was the cause of the difference in performance, I
          changed ConcurrentCache/ClockPolicy so that it would always continue
          the search after it had cleaned a page (that is, the user thread would
          clean the page and wait until it had been cleaned, but it would not
          evict the page). With this change, the performance of ConcurrentCache
          matched Clock in all the above mentioned tests, with no observable
          difference. Intuitively, I would say that the change would make the
          user threads perform unnecessary work (cleaning pages that they
          wouldn't use, at least not until the clock hand has rotated another
          full round). However, it seems that there's some pipe-lining effect or
          something that makes it more efficient.

          I found this observation quite interesting. It's kind of like the user
          threads are working as background cleaners for each other. I'll do
          some more experimenting and see if I can trick the background cleaner
          into doing the same work. I was thinking that instead of forcing the
          user thread to wait for the page to be cleaned and then continue, it
          could delegate the cleaning to the background cleaner since they won't
          use the page anyway and shouldn't have to wait.

          Knut Anders Hatlen added a comment -

          I have experimented a little more with the mentioned update load on a
large (~16GB) database. What I tried was to change the rotateClock()
method in the new buffer manager so that when it found a dirty and
not-recently used page, it put the page in a queue which the
background writer would process, and went on to the next page without
          waiting for the previous page to be written. If the queue grew larger
          than 1/10 of the maximum cache size (because the background thread
          couldn't clean the pages fast enough to keep up with the user
          threads), the user thread would however clean the page itself instead
          of putting the page in the queue.
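
That hand-off can be sketched in a few lines (CacheEntry and clean() are hypothetical stand-ins; as noted in the next paragraph, the prototype hooks into rotateClock() and uses a plain thread rather than Derby's daemon thread):

    import java.util.concurrent.ArrayBlockingQueue;

    class QueueCleanerSketch {
        interface CacheEntry { void clean(); } // hypothetical stand-in

        private final ArrayBlockingQueue<CacheEntry> queue;

        QueueCleanerSketch(int maxCacheSize) {
            // bounded to 1/10 of the maximum cache size, as described above
            queue = new ArrayBlockingQueue<CacheEntry>(Math.max(1, maxCacheSize / 10));
            Thread cleaner = new Thread(new Runnable() {
                public void run() {
                    try {
                        while (true) {
                            queue.take().clean(); // block until a page arrives
                        }
                    } catch (InterruptedException ie) {
                        // terminate the cleaner thread
                    }
                }
            });
            cleaner.setDaemon(true);
            cleaner.start();
        }

        // called by rotateClock() for a dirty, not recently used page
        void handOff(CacheEntry dirty) {
            if (!queue.offer(dirty)) {
                dirty.clean(); // queue full: the user thread cleans it itself
            }
        }
    }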

          The attached patch (cleaner.diff) shows this approach. (Note that this
          patch is only for testing. It doesn't actually use Derby's daemon
          thread, but a separate thread that just sits and listens to an
          ArrayBlockingQueue.) I tested it with the update load and page cache
          size 1000, 10000 and 100000, and it seemed to have as good as or
          better throughput than the old buffer manager in all of these
          tests. (Graphs showing the results are contained in the cleaner.tar
          archive.)

          To me this sounds like a good way to use the background thread, as it
          offloads the user threads so that they don't need to wait for
          potentially slow I/O operations. One might of course say that it's not
          ideal, as it cleans pages right behind the clock hand instead of right
          in front of it. However, the pages that are cleaned have not been used
          during the last full round on the clock, so it is less likely that
          they will be used on the next round as well, and therefore they'll
          probably stay eligible for eviction. Also, if the background cleaner
          works on pages right in front of the hand, it is quite likely that if
          it finds a page to write, the user threads will catch up with it and
          pass it while it's performing the I/O operation, so that it
          effectively ends up cleaning behind the clock hand anyway.

          If there are no objections or other good ideas, I will try to use the
          approach in cleaner.diff and integrate it with Derby's own background
          thread.

          Jørgen Løland added a comment -

          Hi Knut,

          Thanks for providing these nice graphs; they make it much easier to understand the effects of the new buffer manager. The results are dramatic in many of the illustrated cases!

          Intuitively, I would say that your plan of using a more aggressive background cleaner sounds good since user threads are not using the pages they have cleaned (although I don't understand why not using that page is better). The graphs also clearly indicate that this approach is better than letting the user thread do the job.

One question: what happens if the clock rotates too fast and reaches a synchronized (due to I/O) page? Will it have to wait for the synchronization to be lifted, or can the clock jump to the next page somehow? Since the background cleaner is used to write most of the pages, it (the background cleaner) could potentially be far behind the clock if I understand this correctly. If the clock is unnecessarily blocked by I/O, it's game over.

          On a side note - why are performance tests in Derby performed with back-to-back transactions rather than using a more realistic (i.e., Poisson distributed) load? Poisson is normally used to simulate the transaction arrival rate from many independent clients [1]. In your case, I think the background cleaner could perform even better with this (more realistic) workload since it would be scheduled in the short periods with slightly fewer active transactions.

          [1] http://en.wikipedia.org/wiki/Poisson_process

          Knut Anders Hatlen added a comment -

          Hi Jørgen,

          Thanks for taking the time to comment on this issue.

          > One question: what happens if the clock rotates too fast and reaches a synchronized (due to I/O) page? Will it have to wait for the synchronization to be lifted, or can the clock jump to the next page somehow?

          It will have to wait, and that may quite likely have happened in the runs with a small page cache. It would indeed be interesting to see how the performance is affected if we allowed the clock to jump to the next page in that case. By the way, that's what the old buffer manager does, but for reasons mentioned in a previous comment (simplicity of implementation and less likelihood of blocking because of finer sync granularity) ConcurrentCache doesn't. Based on the results for update load on large db, I think it's a good idea to run some tests to see if that was a good design decision. It shouldn't be too hard to make that change.

          > why are performance tests in Derby performed with back-to-back transactions rather than using a more realistic (i.e., Poisson distributed) load?

          No particular reason, I think. And I think it's a very good idea to try with a Poisson distribution or similar, especially to test the background cleaner. If you know any open source test clients that could be used, I'd be more than happy to try them out. Otherwise, I may see if I can write one myself.

          Jørgen Løland added a comment -

Could this be used? http://commons.apache.org/sandbox/performance/

          Knut Anders Hatlen added a comment -

          > One question: what happens if the clock rotates too fast and reaches
          > a synchronized (due to I/O) page? Will it have to wait for the
          > synchronization to be lifted, or can the clock jump to the next page
          > somehow? Since the background cleaner is used to write most of the
          > pages, it (the background cleaner) could potentially be far behind
          > the clock if I understand this correctly. If the clock is
          > unnecessarily blocked by I/O it's game over

          I made a change so that instead of cleaning the page while holding the
          synchronization on the cache entry, the keep count was incremented (to
          ensure that the page wouldn't get evicted) and the synchronization was
          dropped. This didn't change anything, though. And thinking more about
          it, that makes sense, since if the clock rotates too fast, there won't
          be evictable pages and it's game over anyway...

          That said, I think it's a good idea to drop the synchronization while
          cleaning since it will increase the concurrency (although not by much,
          since the synchronization is much more fine-grained than in Clock). I
          think I'll keep it as it is for now and do a cleanup of it later.

          Another potential concurrency improvement for the replacement
          algorithm is to protect the clock with a
          java.util.concurrent.locks.ReadWriteLock instead of synchronization on
          the clock. Then we don't have to lock the clock structure exclusively
          unless the clock needs to grow or shrink.
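
A minimal sketch of that idea (hypothetical names; only the ArrayList-backed clock structure is as described earlier in this issue): rotating threads only read the structure and can share the read lock, while growing or shrinking takes the write lock.

    import java.util.ArrayList;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    class ClockLockSketch {
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        private final ArrayList<Object> clock = new ArrayList<Object>();

        Object holderAt(int pos) {
            lock.readLock().lock(); // shared: many rotating threads at once
            try {
                return clock.isEmpty() ? null : clock.get(pos % clock.size());
            } finally {
                lock.readLock().unlock();
            }
        }

        void grow(Object holder) {
            lock.writeLock().lock(); // exclusive: only when the clock resizes
            try {
                clock.add(holder);
            } finally {
                lock.writeLock().unlock();
            }
        }
    }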

          Knut Anders Hatlen added a comment -

          > Intuitively, I would say that your plan of using a more aggressive
          > background cleaner sounds good since user threads are not using the
          > pages they have cleaned (although I don't understand why not using
          > that page is better).

          I also have a hard time understanding why not using the page after
          writing it out is better, but I think it may be a result of a
          double-buffering effect together with the file system cache. When a
          page is not necessarily evicted after it has been cleaned (the
          behaviour of Clock in trunk), a user thread may clean multiple pages
          before it is able to evict a page. That could lead to more clustered
          writes and, since the writes go via the fs cache and not directly to
          the disk, it might increase the probability of the fs cache being able
          to clean more pages in one single disk operation.

          To test this, I reran the tests with direct I/O enabled on the data
          disk (which disables the fs cache). I still saw that all approaches
          which frequently ended up not evicting the pages they had cleaned had
          about the same performance as Clock in trunk. What had changed was
          that evicting the pages that had been cleaned, seemed to be slightly
          better than not evicting them.

          Since not evicting the pages is what most closely resembles the
          behaviour of the current buffer manager, and it sounds more likely
          that users have the fs cache enabled than disabled, I think I'll go
          for that solution. Also, the more aggressive background cleaner seems
to perform better than the old one, so I'll try to come up with a
          patch which combines those two approaches.

          Knut Anders Hatlen added a comment -

          Attaching a new patch (derby-2911-8.diff) which implements the
          background cleaner as discussed earlier.

          The patch adds a class called BackgroundCleaner. This class implements
          the Serviceable interface and subscribes to the cache manager's daemon
          service. When a user thread finds an object that needs to be cleaned,
          it gives it to BackgroundCleaner, which cleans the object in the
          daemon service's thread instead of the user thread. BackgroundCleaner
          keeps the objects it needs to clean in a queue with a maximum size of
          1/10 of the cache size. (There is no particular reason for choosing
          exactly 1/10, it just sounded not too big and not too small. Anyway,
          it's basically just an array pointing to objects already in the cache,
so its memory footprint will be small compared to the rest of the
          cache.) If the queue is full when the background cleaner receives a
          request from a user thread, the user thread will have to clean the
          object itself.

          I also incorporated Jørgen's suggestion about releasing
the synchronization lock on the entry while it is being cleaned. This also
          led to a small change in how the checkpoint would write the
          pages. Basically, when the checkpoint or the replacement algorithm
          cleans an object, it will increase the entry's keep count (without
          marking it as recently used) to prevent eviction, and release the
          synchronization lock on the entry. When the sync lock has been
          released, the object is written, and finally the sync lock is
          re-obtained so that the keep count can be decremented.
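
Expressed as a minimal sketch (the Entry methods below are hypothetical stand-ins for the real cache entry; exception handling omitted):

    class KeepWhileCleaningSketch {
        interface Entry {
            void lock(); void unlock(); // per-entry lock
            void keep(); void unkeep(); // raise/lower the keep count
            void writeObject();         // clean the cached object (slow I/O)
        }

        void clean(Entry entry) {
            entry.lock();
            try {
                entry.keep(); // prevent eviction, don't mark recently used
            } finally {
                entry.unlock();
            }
            try {
                entry.writeObject(); // no entry lock held during the write
            } finally {
                entry.lock();
                try {
                    entry.unkeep(); // entry may be evicted again
                } finally {
                    entry.unlock();
                }
            }
        }
    }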

          The javadoc comment for Clock.useDaemonService() says the following
          about multi-threading: "MT - synchronization provided by caller". The
          interface (CacheManager) does not say anything about this, but I added
          a similar comment in the interface javadoc, and used that guarantee to
          avoid the need for synchronization when accessing the reference
          assigned in ConcurrentCache.useDaemonService(). (Currently, this
          method is only used when the database is booted, so I believe the
          comment in Clock is correct.)

          Derbyall and suites.All ran cleanly on Solaris 10 and Sun JDK 6 when I
          manually enabled the new buffer manager in modules.properties. As for
          the previous patches, this patch does not affect any code used by
          default in Derby.

          Knut Anders Hatlen added a comment -

          > Could this be used?
          > http://commons.apache.org/sandbox/performance/

Thanks Jørgen, that looks very useful. It turned out that it didn't take too many lines of code to make the existing test generate Poisson distributed load, so I wrote a small test client myself. I'll come back with the code and the numbers later.

          Knut Anders Hatlen added a comment -

poisson_patch8.tar contains the code I used to run Poisson distributed single-record updates on a 16GB database. It also contains graphs showing the average response times with derby.storage.pageCacheSize 1000 and 10000, comparing trunk and ConcurrentCache with patch #8. As predicted by Jørgen, this test shows more clearly the advantages of the new background cleaner. It seems clear that the old buffer manager collapses (that is, its response times grow quickly) at a significantly lower number of requests per second with this load.

          Knut Anders Hatlen added a comment -

          Committed patch #8 with revision 596265.

Now the only thing that is not implemented is shrinking the clock when it has exceeded its maximum size (this happens if no evictable items are found by rotateClock()). The old buffer manager would delegate the shrinking to its daemon service from rotateClock() if it had a daemon service. If it didn't have a daemon service, it would do the work itself from release(). I think we should try the same approach for ConcurrentCache.

          Knut Anders Hatlen added a comment -

          Attaching a new patch (d2911-9) which implements shrinking of the
          clock when it has exceeded the maximum size. The shrinking is
          performed in the background cleaner if the cache has one, and in the
          user thread otherwise.

          Also, Clock has a method called trimToSize() which tries to remove as
          many invalid entries as possible (even if the cache is smaller than
          the maximum size). It is called from ageOut(), discard() and
          performWork(), but it doesn't do anything unless all of these
          conditions are satisfied:

          1) the method must have been called at least as many times as 1/8 of
          the number of elements in the cache

2) the cache must not contain fewer than 32 elements

          3) more than 25% of the elements in the cache must be invalid
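
Put as a single boolean check, the three conditions read roughly as follows (field names hypothetical):

    // trim only when all three conditions above hold
    boolean shouldTrim(int timesCalled, int size, int invalidCount) {
        return timesCalled >= size / 8      // 1) called often enough
                && size >= 32               // 2) cache is not too small
                && invalidCount * 4 > size; // 3) more than 25% invalid
    }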

          This doesn't happen very often (according to the code coverage report at
          http://people.apache.org/~fuzzylogic/codecoverage/529822/_files/21f.html#25
          it doesn't happen at all in our test runs), so I'm not sure it adds
          enough value to justify the extra complexity. However, to keep things
          as close to the old implementation as possible, I added such a method
          to the new buffer manager too. There is one small difference, though:
          The old buffer manager would first move all invalid elements to the
          end of the ArrayList, before the elements were removed from the
          list. While it is possibly more efficient, this operation is performed
          so infrequently that it's hardly worth the effort to optimize
          it. Also, the old buffer manager would lock the entire cache while
          doing this, so it knew that the elements were still invalid after it
          had moved them to the end of the list. The new buffer manager only
          locks one element at a time, so it doesn't know that the elements will
          stay invalid after they have been moved to the end of the
          list. Therefore, I chose to implement the method as a simple scan
which removes invalid entries as it finds them.

          I successfully ran derbyall and suites.All with the new buffer manager
          enabled in modules.properties.

          Knut Anders Hatlen added a comment -

          I think there is another issue similar to the one Jørgen commented on
          about holding the lock on an entry while writing it to disk.

          When an entry is fetched into the cache (either with create() or
          find()), the old buffer manager would release the lock on the cache
          before running createIdentity()/setIdentity(). I assumed it released
          the lock because it didn't want to block the entire cache while
performing what could be time-consuming operations. I have now found
          another reason for releasing the lock: When a new container is
          created, the call to createIdentity() will look at data in other
          containers and therefore reenter the container cache. If we hold the
          lock on the entry that is being created, we can possibly run into a
          deadlock when the container cache is reentered. Will try to come up
          with a fix.

          Knut Anders Hatlen added a comment -

          Committed d2911-9.diff with revision 601680.

          Knut Anders Hatlen added a comment -

          Attaching a new patch (d2911-10). This patch addresses the potential deadlock issue if user code (i.e., Cacheable.setIdentity() or Cacheable.createIdentity()) reenters the cache. It makes the callers of setIdentity()/createIdentity() release the ReentrantLock on the cache entry before calling one of those methods. Other threads that want to access the object in that entry therefore have to wait until the identity has been set, whereas threads scanning the cache (e.g., checkpoint or clock rotation) skip the entry instead of waiting since they just see an invalid/kept entry.
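
A rough sketch of that locking pattern (all names below are hypothetical stand-ins; the real code uses a per-entry ReentrantLock and Derby's Cacheable identity methods):

    class SetIdentitySketch {
        interface Entry {
            void lock(); void unlock(); // per-entry ReentrantLock
            void keep();                // scans skip kept, invalid entries
            void settle(Object item);   // install the object, wake waiters
        }
        interface Item { Object setIdentity(Object key); }

        void insert(Entry entry, Item free, Object key) {
            entry.lock();
            try {
                entry.keep();   // entry stays invalid but pinned
            } finally {
                entry.unlock(); // released before user code runs
            }
            // may reenter the cache without deadlocking on the entry lock:
            Object item = free.setIdentity(key);
            entry.lock();
            try {
                entry.settle(item); // entry becomes valid; waiters wake up
            } finally {
                entry.unlock();
            }
        }
    }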

          All the regression tests passed when I enabled ConcurrentCache in modules.properties.

          Knut Anders Hatlen added a comment -

          Committed d2911-10.diff with revision 603491.

          Knut Anders Hatlen added a comment -

          Attaching a new patch (d2911-11) which makes the new buffer manager use the initialSize parameter in CacheFactory.newCacheManager(). It only uses the parameter to specify the initial capacity of the ConcurrentHashMap in ConcurrentCache and the ArrayList in ClockPolicy. The interface actually says that we should create enough holder objects to hold that number of objects, but that's not what the old buffer manager does. I have logged DERBY-3275 for making the code and the comments match.
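
          In other words, the parameter is simply forwarded as an initial capacity (an illustrative sketch, not a verbatim excerpt from the patch):

              // Sketch: initialSize only pre-sizes the backing collections;
              // no holder objects are created up front.
              ConcurrentHashMap<Object, CacheEntry> cache =
                      new ConcurrentHashMap<Object, CacheEntry>(initialSize);
              ArrayList<Holder> clock = new ArrayList<Holder>(initialSize);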

          Knut Anders Hatlen added a comment -

          Committed d2911-11.diff with revision 604153.

          I think that means that all the functionality of the old buffer manager is also implemented in the new buffer manager.

          In an earlier comment, I mentioned that we might see more concurrency when the eviction frequency is high if we replace the synchronization lock on the ArrayList in ClockPolicy with a ReadWriteLock. I made a quick patch and ran a couple of tests, but I couldn't see any significant improvement, so I'll just leave it as it is for now. If it later turns out to be a bottleneck, it'll be very simple to change it.
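
          For the record, the experiment amounted to something like the following sketch (assuming the clock is an ArrayList currently guarded by synchronized blocks; the names are illustrative):

              // Sketch: a ReentrantReadWriteLock lets concurrent readers of the
              // clock proceed in parallel, while writers get exclusive access.
              private final java.util.concurrent.locks.ReentrantReadWriteLock
                      clockLock = new java.util.concurrent.locks.ReentrantReadWriteLock();

              Holder getHolder(int index) {
                  clockLock.readLock().lock();
                  try {
                      return clock.get(index);
                  } finally {
                      clockLock.readLock().unlock();
                  }
              }

              void addHolder(Holder h) {
                  clockLock.writeLock().lock();
                  try {
                      clock.add(h);
                  } finally {
                      clockLock.writeLock().unlock();
                  }
              }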

          I plan to rerun the tests I ran earlier on different hardware and with different page cache sizes to see if we still have the performance gains we had then, and also to see if there is some load/configuration where the performance is worse with the new buffer manager. I will post the results once I have them.

          I would appreciate it if someone would review the code. The new buffer manager does not share any code with the old buffer manager, and it is completely contained in these files in the impl/services/cache directory:

          ConcurrentCacheFactory.java - implementation of the CacheFactory interface; it contains only a single factory method that creates a ConcurrentCache instance

          ConcurrentCache.java - the CacheManager implementation. This class contains the code needed to insert, find, release and remove items in the cache, and it is built around a ConcurrentHashMap

          CacheEntry.java - wrapper/holder object for objects in the cache. This object contains some state for the entry (keep count, validity, etc.), and code which enables fine-grained synchronization (a ReentrantLock at the entry level instead of synchronization at the cache level as in the old buffer manager); a small sketch of this entry-level locking follows the list below

          ReplacementPolicy.java/ClockPolicy.java - interface and implementation of the replacement algorithm, i.e. the algorithm used to evict objects from the cache to make room for new objects when the cache is full

          BackgroundCleaner.java - a Serviceable that is used by the cache manager to perform asynchronous I/O or other background tasks
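
          To make the fine-grained synchronization concrete, here is a minimal sketch of the entry-level locking idea (the field and method names are hypothetical, not the actual CacheEntry source):

              // Sketch: each entry carries its own ReentrantLock, so threads
              // working on different entries never contend with each other.
              class Entry {
                  private final java.util.concurrent.locks.ReentrantLock lock =
                          new java.util.concurrent.locks.ReentrantLock();
                  private Object cacheable; // null until the identity is set
                  private int keepCount;    // number of threads keeping the entry

                  void keep() {
                      lock.lock();
                      try {
                          keepCount++;
                      } finally {
                          lock.unlock();
                      }
                  }

                  void release() {
                      lock.lock();
                      try {
                          keepCount--;
                      } finally {
                          lock.unlock();
                      }
                  }
              }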

          Knut Anders Hatlen added a comment -

          The attached patch merges the test clients used previously in this issue and places them under org.apache.derbyTesting.perf.clients. With the class Runner in that package, one can start each of the tests from the command line and configure which type of load generator (back-to-back or Poisson) to use.

          I am currently running tests which compare the performance of ConcurrentCache and Clock on one dual-CPU machine and on one 8-CPU machine. I'll post the results once I have them.

          Knut Anders Hatlen added a comment -

          I finally got around to looking at the results from the tests I had running. Sorry for the delay.

          Attached is a diff (perftest2.diff) with an updated version of the performance tests that I used (the load that was supposed to resemble the d2911perf class attached to this issue was changed so that it matched d2911perf more closely, which also simplified the code a bit), and a pdf (perftest.pdf) which summarizes the findings. Nothing new there, actually; the results were pretty consistent with the results that have been posted here earlier. The new buffer manager performs significantly better than the old one when there's no other contention point (like the log device or the root node in the B-tree) and there are multiple CPUs/cores, and a little better than or equal to the old buffer manager in cases where there is another contention point.

          Knut Anders Hatlen added a comment -

          Checked in the performance tests with revision 619404.

          Øystein Grøvlen added a comment -

          I have looked at the code introduced in this JIRA for a new cache
          manager, and think it looks very good. I have only found one small issue
          related to correctness (4e), but I have some minor comments/questions on
          the organization of the code:

          1. ConcurrentCache:

          a. What is the significant difference between removeEntry and
          evictEntry? At first, I thought it was that evictEntry set the
          Cacheable to null, but then I discovered that removeEntry does
          the same through its call to CacheEntry#free. So I guess the
          only difference is that removeEntry will include a call to
          ReplacementPolicy#Callback#free. I am not sure I understand why
          this should be avoided for evictEntry. If both methods are
          needed, the first sentences of their javadoc should be different
          in order to be able to make a distinction when reading the
          javadoc summary.

          b. To me, the organization of find and create methods for entries
          and cacheables are a bit confusing. It seems to me that it would
          be easier to understand if it were based on basic methods for
          get/find and create that just do what their names say, with the
          more advanced operations like findAndCreateIfNotFound built on
          top of them. Is this done for
          efficiency reasons? Especially, the findOrCreateObject method
          is a bit confusing since it uses a parameter to change behavior
          from findAndCreateIfNotFound to createAndFlagErrorIfFound.
          Would it be possible to create two separate methods here? At
          least, I think the javadoc for the create parameter needs to
          describe exactly what will occur.

          c. findOrCreateObject: The create parameter will determine whether
          setIdentity and not createIdentity will be called when an object
          is not found in the cache. Is this difference just an issue of
          whether createParameter has been provided, or is there something
          more that distinguishes these two cases?

          d. findCached/release: Comments state that one "don't need to call
          getEntry()", but I guess the point is that it would be harmful
          to call getEntry?

          2. CacheEntry:

          a. lockWhenIdentityIsSet: I do not feel the name really describes
          what the method is doing. lockAndBlockUntilIdentityIsSet would
          be more descriptive. Maybe you can come up with an even better
          (and shorter) name.

          3. ReplacementPolicy:

          a. The Callback returned by insertEntry seems currently not to be
          in use. In what situations may it be used?

          4. ClockPolicy:

          a. grow: Instead of requiring that the caller synchronize, why not
          synchronize within the method? (Like is done for several other
          methods, e.g., moveHand)

          b. rotateClock: Is the concept of a small cache (<20 entries)
          relevant for any of the caches currently used in Derby? It seems
          a bit strange to check 38 items when there are 19 entries, but
          only 4 items if there are 20 entries. But maybe it is not that
          relevant with respect to the cache sizes used in Derby?

          c. rotateClock: I think it would be good if some of the discussion
          in the JIRA about why it is not using the entries it has
          cleaned, would be reflected in comments.

          d. shrinkMe: javadoc for return is incomplete

          e. shrinkMe: I think there is an off-by-one error for the call on
          clock.get. If pos==size-1 when entering the synchronized block,
          one will do get(size) which I guess would give
          ArrayIndexOutOfBoundsException. Also, the first element of clock
          will never be inspected since index will never be 0.

          f. shrinkMe: The code to determine whether an item could be evicted
          is the same as for rotateClock. Could this code be refactored
          into a common method? (isEvictable() or something like that)

          g. trimMe: This method contains a lot of heuristics that I guess
          you have inherited from the old clock implementation. If you
          have any insight into why the particular values were chosen,
          it would be good if you could comment on that. (I notice that
          the criteria for being a small cache are not the same here as in
          rotateClock.)

          h. Maybe some of the numeric constants used for heuristics in this
          class should be defined as named constants?

          5. BackgroundCleaner:

          a. I am not sure the statement "used by ConcurrentCache so that it
          doesn't have to wait for clean operations to finish" covers the
          purpose. Clean operations on ConcurrentCache do not use the
          background cleaner. It is find operations on ConcurrentCache
          that will invoke the cleaner. In addition, the background
          cleaner does not help a given find operation; it just makes
          future operations more likely to find an item that can be reused.

          b. serviceImmediately: As far as I can tell this method is not
          relevant since it is not used by BasicDaemon. Maybe that should
          be indicated?

          Knut Anders Hatlen added a comment -

          Thanks for reviewing the code, Øystein! Those are very good comments and questions. I'll study them more in detail and try to come up with answers over the weekend.

          Knut Anders Hatlen added a comment -

          I've gone through the comments and questions. Please find my
          responses below. Regarding the correctness error (4e), I think the
          code is actually correct, but it could have been clearer.

          The new buffer manager isn't enabled in the build yet. Should the
          issues mentioned by Øystein be resolved first, or would it be OK to
          enable it first and do the reorganization and improvements later?

          Responses to Øystein's questions/comments:

          1. ConcurrentCache

          a) Difference between evictEntry() and removeEntry().

          I think you are right that the only significant difference is that
          removeEntry() calls free() whereas evictEntry() doesn't. Calling
          free() from evictEntry() wouldn't be correct in the current code,
          though. free() is called to notify the replacement policy that the
          Cacheable contained in the entry can be reused. When evictEntry() is
          called, we're either in the process of reusing the Cacheable (in which
          case we don't want to first mark it as reusable so that others can
          grab it, and then try to reobtain it and possibly reuse it), or we're
          shrinking the cache (in which case making it reusable would mean that
          the cache doesn't shrink).

          I'll see if I can find a clever way to merge the two methods, or at
          least improve the comments and perhaps the names of the methods.
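
          Illustrated as a sketch (hypothetical names, not the actual
          ConcurrentCache code), the distinction is roughly:

              // Sketch: removeEntry() hands the Cacheable back to the
              // replacement policy; evictEntry() deliberately does not,
              // because the caller is reusing it or shrinking the cache.
              private void removeEntry(Object key) {
                  CacheEntry entry = cache.remove(key);
                  entry.free(); // also notifies the replacement policy's callback
              }

              private void evictEntry(Object key) {
                  CacheEntry entry = cache.remove(key);
                  entry.setCacheable(null); // Cacheable stays with the caller
              }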

          b) Organization of find and create methods.

          I guess it is possible to split them into two or three basic methods
          and build the public methods on top of them. However, I'm not sure
          it's a good idea to remove the create flag. findOrCreateObject() is a
          fairly large method, and the check of the create flag is a very small
          part of it. If the create flag were removed, it would mean that we
          have to duplicate most of the logic in that method. But if we split
          the logic into more basic methods, it will hopefully be easier to
          understand how the create flag affects each of the basic methods.

          c) findOrCreateObject(): create vs createParameter

          create can be true even if createParameter is null, so I don't think
          create can be skipped. The distinction between setIdentity() and
          createIdentity() comes from the Cacheable interface and can't be
          changed without modifying the existing buffer manager and the store as
          well.

          d) findCached()/release(): get() vs getEntry()

          You're right, it is not harmful to call getEntry() in terms of
          correctness, but it's slightly more expensive. I will update the
          comments to make this clearer.

          2. CacheEntry

          a) Misleading name: lockWhenIdentityIsSet()

          Your suggestion lockAndBlockUntilIdentityIsSet() does match more
          closely the calls in the method (lock() + await()), but since await()
          will release the lock until it wakes up, I think the original name
          describes better how the locking and blocking is perceived from
          outside the method. Perhaps waitUntilIdentityIsSetAndLock() is a
          better name? Of course, better comments in the method would also help.
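
          For reference, the mechanics under discussion look roughly like this
          (a sketch using java.util.concurrent.locks, assuming identitySet is
          a Condition obtained from lock.newCondition(); the field names are
          hypothetical):

              // Sketch: await() atomically releases the ReentrantLock while
              // waiting and reacquires it before returning, so the caller only
              // ends up holding the lock once the identity has been set.
              lock.lock();
              while (cacheable == null) { // identity not set yet
                  identitySet.awaitUninterruptibly();
              }
              // the lock is held here, and the identity is set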

          3. ReplacementPolicy

          a) Return value from insertEntry() isn't used.

          The return value isn't needed since the replacement policy will link
          the CacheEntry and the Callback internally by calling
          CacheEntry.setCallback(). Will change the method so that it's void.

          4. ClockPolicy

          a) Why not synchronize inside grow()?

          One of the callers needs the synchronization for more than just the
          call to grow() and therefore needs the synchronized block anyway, so I
          felt it was more consistent if all (both) callers used an explicit
          synchronization block. (It also avoided double synchronization for
          that caller.) The other methods you mention are supposed to be atomic,
          and are not intended to be called from within a larger block
          synchronized on the same object.

          b) Handling of small cache sizes in rotateClock().

          I agree that this is a bit strange. The logic is copied from
          Clock.rotateClock(), by the way. I believe you are correct that it's
          not relevant for the (default) cache sizes currently used (none is
          smaller than 32). Perhaps it would be better to simplify it and just
          say itemsToCheck = Math.max(20, ...) or something (the number 20 is
          arbitrary).

          c) Add more comments about why cleaned entries are not reused.

          Will update the comments with more details from the JIRA discussion.

          d) Incomplete @return tag for shrinkMe().

          Thanks, fixed in revision 631225.

          e) Off-by-one error in shrinkMe().

          I don't think there is an off-by-one error, since we use the old value
          of pos, not the one that has been increased by one. I guess it's a bit
          confusing to use the postfix increment operator and store away the old
          value on the same line in the code. Perhaps it would be clearer if the
          code were reordered from

          index = pos++;
          h = clock.get(index);

          to

          h = clock.get(pos);
          index = pos;
          pos++;

          Or perhaps I misunderstood what you meant?

          f) Factor out common code for evicting entries in rotateClock() and
          shrinkMe().

          Will do that.

          g) Comment on the heuristics in trimMe().

          As you quite correctly guessed, the heuristics have been taken from
          the old clock implementation (Clock.trimToSize()), and I don't know
          why they were chosen.

          What strikes me about the heuristics is that they seem to do their
          best to prevent this method from ever doing anything (which can also
          be seen in the code coverage reports). Also, I don't see very much
          value in shrinking a cache that already is smaller than the max
          size. If someone needs it, the correct way would be to reduce the max
          size of the cache, I think. Perhaps it would be better just to remove
          it? Seems like it's much complexity for little added value.

          h) Naming constants used in heuristic

          Probably a good idea. Will do that.

          5. BackgroundCleaner

          a) Misleading class javadoc

          Will fix that comment.

          b) Comment that serviceImmediately() isn't used by BasicDaemon

          Could do that. Now, BackgroundCleaner doesn't know that it's being
          serviced by a BasicDaemon, so I'm not sure whether or not it's
          BackgroundCleaner's concern that the method is not being used by
          BasicDaemon. If another implementation of DaemonService is used
          (that's not controlled by the cache implementation), the returned
          value may become relevant.

          Knut Anders Hatlen added a comment -

          Here's the patch (d2911-enable.diff) that's needed if we want to enable the new buffer manager for Java version 1.5 and higher. It only changes a couple of lines in modules.properties to make sure the correct classes are loaded when the database is booted. It doesn't change anything for 1.4 or Java ME, so they'll still use the old buffer manager.

          Øystein Grøvlen added a comment -

          I agree that we should just go ahead and enable the buffer manager. The issues I raised can be addressed later.

          Knut Anders Hatlen added a comment -

          Thanks Øystein. I checked in d2911-enable.diff with revision 631930. I have tested it with derbyall and suites.All on 1.6, and with suites.All on 1.4.2 and 1.5 (Sun's JVMs on Solaris 10).

          Knut Anders Hatlen added a comment -

          I improved some comments in revision 635556 to address 1a and 1d.

          1b and 1c were addressed in revision 635183 as part of DERBY-3493.

          Knut Anders Hatlen added a comment -

          Improved some more comments in revision 635577. This should address 4c, 5a and 5b.

          Knut Anders Hatlen added a comment -

          Attaching d2911-12.diff which addresses Øystein's comments 3a, 4a (indirectly, since that code was removed), 4b and 4f. It makes ReplacementPolicy.insertEntry() void since the return value is not used, it simplifies the handling of small caches in ClockPolicy.rotateClock(), and it factors out common code in ClockPolicy.rotateClock() and ClockPolicy.shrinkMe(). This patch is not supposed to change the behaviour in any way.

          suites.All ran cleanly. I have also started derbyall and I will report back if it fails.

          Knut Anders Hatlen added a comment -

          All tests passed.

          Knut Anders Hatlen added a comment -

          Committed revision 636247.

          Knut Anders Hatlen added a comment -

          Attaching a patch (d2911-13) which addresses 2a and 4h. I have started the regression test suite.

          2a: Instead of inventing a new name for lockWhenIdentityIsSet(), I created a new method waitUntilIdentityIsSet() and let the callers first invoke lock() and then waitUntilIdentityIsSet(). I think that makes the code clearer.

          4h: I created named constants for the numeric constants in rotateClock() and shrinkMe(). I didn't make any changes to trimMe(), since I'm wondering if it's best just to remove it.

          After this patch, I think it's only 4g (explain the heuristics in ClockPolicy.trimMe()) that hasn't been addressed. I believe that trimMe() in reality is dead code (supported by the fact that its predecessor Clock.trimToSize() is always a no-op in the regression tests according to the test coverage reports), and a better and more reliable solution for the problem it tries to solve, is to reduce the cache size. Unless someone comes up with a situation where Clock.trimToSize() and ClockPolicy.trimMe() provide valuable functionality, I'm inclined to remove the latter.

          Knut Anders Hatlen added a comment -

          Committed revision 636670.

          Knut Anders Hatlen added a comment -

          Attaching two new patches for review:

          d2911-14.diff - rewrites the part of ClockPolicy.shrinkMe() that Øystein mentioned in his comment 4e. It is not supposed to change the behaviour, only make the relationship between the variables pos and index clearer.

          d2911-15.diff - removes the method ClockPolicy.trimMe() since it's only adding complexity and doesn't have any real value (I think).

          Derbyall and suites.All ran cleanly (except ManagementMBeanTest which has failed for some time in the nightly regression tests too).

          Knut Anders Hatlen added a comment -

          Committed d2911-14.diff with revision 642752.
          Committed d2911-15.diff with revision 642755.

          I think all comments have been addressed now, and I'm not planning any more patches on this issue. Some of the patches have only been committed to trunk (revisions 635577, 636247, 636670, 642752 and 642755). None of them needs to go to the 10.4 branch, but unless there are objections, I'll merge them to minimize the difference between trunk and the branch (which makes it easier to map stack traces from bug reports against 10.4 to the code in trunk).

          Kristian Waagan added a comment -

          I think it is a good idea to merge the changes to the 10.4 branch before the release candidate is produced.
          +1 from me.

          Knut Anders Hatlen added a comment -

          Merged the latest changes (rev. 635556, 635577, 636247, 636670, 642752, 642755) to 10.4 and committed revision 643327.

          Marking the issue as resolved (with fix version 10.4.0.0 since almost all the code went in before the 10.4 branch was cut).


            People

            • Assignee:
              Knut Anders Hatlen
              Reporter:
              Knut Anders Hatlen
            • Votes:
              1
              Watchers:
              1
