
Key: DERBY-3024
Type: Improvement
Status: Open
Priority: Minor
Assignee: Unassigned
Reporter: Knut Anders Hatlen
Votes: 0
Watchers: 0
Derby

Validation of shared plans hurts scalability

Created: 23/Aug/07 10:24 AM   Updated: 13/Oct/09 10:01 AM
Component/s: SQL
Affects Version/s: 10.4.1.3
Fix Version/s: None

Time Tracking:
Not Specified

File Attachments:
1. patch-1a.diff (3 kB), 2009-09-29 02:22 PM, Knut Anders Hatlen, licensed for inclusion in ASF works
2. Values.java (2 kB), 2007-08-23 10:26 AM, Knut Anders Hatlen, licensed for inclusion in ASF works

Image Attachments:
1. patch-1a.png (8 kB)
2. values1.png (5 kB)
Environment: Sun Java SE 6, Solaris 10, Sun Fire V880 (8 CPUs)

Bug behavior facts: Performance


Description
To investigate whether there was anything in the SQL execution layer that prevented scaling on a multi-CPU machine, I wrote a multi-threaded test which continuously executed "VALUES 1" using a PreparedStatement. I ran the test on a machine with 8 CPUs and expected the throughput to be proportional to the number of concurrent clients up to 8 clients (the same as the number of CPUs). However, the throughput only had a small increase from 1 to 2 clients, and adding more clients did not increase the throughput. Looking at the test in a profiler, it seems like the threads are spending a lot of time waiting to enter synchronization blocks in GenericPreparedStatement.upToDate() and BaseActivation.checkStatementValidity() (both of which are synchronized on a GenericPreparedStatement object).

I then changed the test slightly, appending a comment with a unique thread id to the "VALUES 1" statement. That means the threads still did the same work, but each thread got its own plan (GenericPreparedStatement object) since the statement cache didn't regard the SQL text strings as identical. When I made that change, the test scaled more or less perfectly up to 8 concurrent threads.
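
For readers without the attachment at hand, the two variants of the test can be sketched roughly as follows. This is not the attached Values.java; the class name, the helper names, and the exact form of the appended comment are assumptions made for illustration. The only point is that the per-thread SQL text differs, so the statement cache cannot hand out the same plan to every thread.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, not the attached test.
class ValuesSketch {

    /**
     * Build the SQL text for one client thread. With sharedPlan == false a
     * trailing comment carrying the thread id makes the text unique, so each
     * thread compiles (and later validates) its own plan.
     */
    static String sqlFor(int threadId, boolean sharedPlan) {
        return sharedPlan ? "VALUES 1" : "VALUES 1 -- thread " + threadId;
    }

    /**
     * Body of one client thread: prepare once, then execute repeatedly until
     * the deadline, counting completed executions in the shared counter.
     */
    static void runClient(Connection conn, String sql, long deadlineMillis,
                          AtomicLong counter) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            while (System.currentTimeMillis() < deadlineMillis) {
                try (ResultSet rs = ps.executeQuery()) {
                    rs.next(); // fetch the single row produced by VALUES 1
                }
                counter.incrementAndGet();
            }
        }
    }
}
```

Each client thread would open its own Connection and call runClient() with the text from sqlFor(); throughput is then the counter value divided by the run time.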

We should try to find a way to make the scalability the same regardless of whether or not the threads share the same plan.

Knut Anders Hatlen added a comment - 23/Aug/07 10:26 AM
Attaching the test and a graph showing the difference in performance between shared plans and separate plans on a machine with 8 CPUs.

Manish Khettry added a comment - 23/Aug/07 05:49 PM
That is very interesting. A couple of thoughts on this.

First, the point of sharing plans is to avoid doing potentially expensive compilation. By choosing a really simple query which is cheap both to compile and execute you are effectively measuring only the cost of sharing plans. If you had even a slightly more expensive query, I doubt you would see such a huge disparity between the two cases.

That having been said, the lack of any speedup is troubling. I ran the same query to see how many times the routines you mentioned (GenericPreparedStatement#upToDate and BaseActivation#checkStatementValidity) are executed. The first one is called *five* times per query and the second one *once*. I haven't looked at the code too closely but it does seem excessive and could be a starting point to investigate contention.
 
Also, there are two other routines GPS#finish and GPS#getActivation which synchronize on the GPS and are called once per statement so these routines add to the contention as well.


Knut Anders Hatlen added a comment - 24/Aug/07 12:50 PM
Thanks for investigating this, Manish!

I agree that the test is not representative of a real-world application, but that wasn't my aim when I wrote it. I just wanted to see if there was any basic part of the SQL execution layer that would be a bottleneck on a multi-CPU machine. VALUES 1 seemed to be a good choice since it avoids accesses to the buffer manager, which is a known multi-CPU bottleneck. I think of it more like looking at a small part of Derby through a magnifying glass or a microscope. :)

When I run the test, I only see three calls to GPS.upToDate(), one call to BA.checkStatementValidity(), and none to GPS.finish() and GPS.getActivation(). You didn't by any chance use a Statement instead of a PreparedStatement?

I'm not sure I quite understand how the interaction with upToDate() works. If upToDate() returns true, we know (because of the synchronization) that at some point after we called upToDate() and before it returned, the compiled plan was up to date. However, the synchronization doesn't guarantee that the plan is up to date the moment after the method has returned, does it? How do we know the plan is still valid then? Is it because of this uncertainty that we keep calling upToDate() multiple times during execution?

Daniel John Debrunner added a comment - 27/Oct/07 12:06 AM
GPS#getActivation & GPS#finish will not be called per execution (except when using a Statement).

The upToDate() check interacts with the table locking of any DDL that led to the invalidation.

When a table T is modified via DDL there is an exclusive lock held on T.
This lock is obtained and then plans dependent on that table are modified.

Thus if a statement has obtained an intent lock on T and it is valid (upToDate()) then it can complete its execution knowing that no DDL can proceed and invalidate it since it holds an intent table lock that will block any DDL's exclusive lock.
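
The ordering argument above can be illustrated with a toy model. This is not Derby's lock manager: a ReentrantReadWriteLock stands in for the table lock (read lock for the intent lock, write lock for the DDL's exclusive lock), and the class and method names are hypothetical. Under those assumptions, a single validity check made after the lock is held is enough for the rest of the execution.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Toy model only; names and the use of a read/write lock are assumptions.
class TablePlanSketch {
    private final ReentrantReadWriteLock tableLock = new ReentrantReadWriteLock();
    private volatile boolean valid = true;

    /** Stand-in for DDL on T: take the exclusive lock first, then invalidate the plan. */
    void runDdl() {
        tableLock.writeLock().lock();
        try {
            valid = false;
        } finally {
            tableLock.writeLock().unlock();
        }
    }

    /**
     * Stand-in for executing a plan that reads T: take the shared lock, check
     * validity once, then do the work. No DDL can invalidate the plan while
     * the shared lock is held, so no re-check is needed during the work.
     * Returns false if the plan was already invalid (a caller would recompile).
     */
    boolean execute(Runnable work) {
        tableLock.readLock().lock();
        try {
            if (!valid) {
                return false;
            }
            work.run();
            return true;
        } finally {
            tableLock.readLock().unlock();
        }
    }
}
```

A statement such as VALUES 1 takes no table lock at all, so this model says nothing about how often such a plan needs to be checked.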

So ideally a plan will check that it's up to date once all of its table locks are obtained; in Derby this is not centralized. Some DBMSs, as part of their compilation, set up a list of table intent locks and obtain them at the start of execution. In Derby this is handled by calling checkStatementValidity() in *each* open of a ResultSet (possibly regardless of whether it obtains a table lock or not).

Ideally this would be in one place, maybe after the open of the top level (language) ResultSet and thus executed once per-plan. I'm not sure though if the top-level open is guaranteed to open all the tables that the plan requires.

There's room for improvement here, not least by writing up & understanding all the interactions.


 

Knut Anders Hatlen added a comment - 29/Sep/09 02:22 PM
I ran the Values1 test on a Sun Fire T2000 with 32 virtual processors (running
Solaris 10 and Java version 1.6.0_15) and noticed that there was a simple
change in BaseActivation.checkStatementValidity() that improved the situation
somewhat. As mentioned in the previous comments, there's a synchronized block
in checkStatementValidity() where a lot of time is spent waiting:

    synchronized (preStmt) {
        if ((gc == preStmt.getActivationClass()) && preStmt.upToDate())
            return;
    }

If the (gc == preStmt.getActivationClass()) check is moved inside
preStmt.upToDate(), which is also synchronized on preStmt, we avoid a double
synchronization. This appears to take some of the pressure off the monitor and
allows the Values1 test to scale better. The preStmt monitor is still very hot,
though, so the performance still breaks down when too many threads are added,
but the patched code handles more threads than trunk before it breaks down.
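
The shape of the change can be sketched with simplified stand-in classes. These are not the Derby classes and not the contents of patch-1a.diff; PlanSketch, ActivationSketch, and the two-argument upToDate(Object) are names assumed here for illustration. The "before" variant enters the plan's monitor twice (once for the block, once reentrantly for upToDate()), while the "after" variant enters it once.

```java
// Simplified stand-ins; all names here are hypothetical.
class PlanSketch {
    private final Object activationClass;
    private boolean valid = true;

    PlanSketch(Object activationClass) {
        this.activationClass = activationClass;
    }

    Object getActivationClass() {
        return activationClass;
    }

    /** Original style of check: synchronized on the plan. */
    synchronized boolean upToDate() {
        return valid;
    }

    /** Combined check: class identity and validity under one monitor entry. */
    synchronized boolean upToDate(Object gc) {
        return gc == activationClass && valid;
    }

    synchronized void invalidate() {
        valid = false;
    }
}

class ActivationSketch {
    /** Before: explicit block plus a nested synchronized call on the same object. */
    static boolean checkBefore(PlanSketch preStmt, Object gc) {
        synchronized (preStmt) {
            return gc == preStmt.getActivationClass() && preStmt.upToDate();
        }
    }

    /** After: the caller no longer synchronizes; the plan does it internally. */
    static boolean checkAfter(PlanSketch preStmt, Object gc) {
        return preStmt.upToDate(gc);
    }
}
```

Both variants return the same answer for the same inputs; the difference is only in how many times the monitor is entered and in who knows about the locking.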

The attached patch and graph (patch-1a.diff and patch-1a.png) show the change
and its effect on the scalability. Whereas trunk maxes out on 5 threads and
305K tx/s, the patched version maxes out on 7 threads and 520K tx/s. After both
trunk and the patched version have collapsed because of too many threads, the
patched version seems to stabilize on a level 30% higher than trunk.

For comparison, the graph also shows the results for trunk with separate plans
for each thread. Its throughput grows steadily for each thread added until the
number of threads reaches the number of virtual processors (32), which is still
far better than with shared plans, so it's clear that the patch is not a full
solution to this issue. It doesn't do anything with the underlying problem,
which is that upToDate() is called way too frequently during execution, but it
may be a good first step to remove the overhead of shared plans.

One may perhaps expect the JVM to be able to eliminate double synchronization,
so that such a change should not be necessary. Anyhow, I think the change would
make sense even without any performance benefit, as it hides some of
GenericPreparedStatement's internal synchronization details from users of the
PreparedStatement interface.

Knut Anders Hatlen added a comment - 13/Oct/09 10:01 AM
Committed patch-1a.diff to trunk with revision 824657.