Derby
  DERBY-3024

Validation of shared plans hurts scalability

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 10.4.1.3
    • Fix Version/s: 10.6.1.0
    • Component/s: SQL
    • Environment:
      Sun Java SE 6, Solaris 10, Sun Fire V880 (8 CPUs)
    • Urgency:
      Normal
    • Bug behavior facts:
      Performance

      Description

      To investigate whether there was anything in the SQL execution layer that prevented scaling on a multi-CPU machine, I wrote a multi-threaded test which continuously executed "VALUES 1" using a PreparedStatement. I ran the test on a machine with 8 CPUs and expected the throughput to be proportional to the number of concurrent clients up to 8 clients (the same as the number of CPUs). However, the throughput only had a small increase from 1 to 2 clients, and adding more clients did not increase the throughput. Looking at the test in a profiler, it seems like the threads are spending a lot of time waiting to enter synchronization blocks in GenericPreparedStatement.upToDate() and BaseActivation.checkStatementValidity() (both of which are synchronized on the GenericPreparedStatement object).

      I then changed the test slightly, appending a comment with a unique thread id to the "VALUES 1" statement. That means the threads still did the same work, but each thread got its own plan (GenericPreparedStatement object) since the statement cache didn't regard the SQL text strings as identical. When I made that change, the test scaled more or less perfectly up to 8 concurrent threads.

      We should try to find a way to make the scalability the same regardless of whether or not the threads share the same plan.
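
      The bottleneck described above is essentially monitor contention on a single shared object, and it can be sketched outside Derby. The following is a minimal stand-in, assuming a hypothetical Plan class in place of GenericPreparedStatement; its synchronized execute() plays the role of the validity checks, and the per-thread variant corresponds to the unique-comment trick that defeats the statement cache:

```java
public class SharedPlanContention {
    // Stand-in for a compiled plan; every execution enters its monitor,
    // just as checkStatementValidity() synchronizes on the GPS object.
    static class Plan {
        long executions;
        synchronized void execute() { executions++; }
    }

    // Runs `threads` workers for `perThread` executions each. With a single
    // shared Plan all threads queue on one monitor; with one Plan per thread
    // (the "unique comment" variant) there is no contention at all.
    static long run(int threads, int perThread, boolean shared)
            throws InterruptedException {
        Plan sharedPlan = new Plan();
        Plan[] plans = new Plan[threads];
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            Plan p = shared ? sharedPlan : (plans[i] = new Plan());
            workers[i] = new Thread(() -> {
                for (int n = 0; n < perThread; n++) p.execute();
            });
            workers[i].start();
        }
        long total = 0;
        for (int i = 0; i < threads; i++) {
            workers[i].join();                      // join() also publishes the counts
            total += shared ? 0 : plans[i].executions;
        }
        return shared ? sharedPlan.executions : total;
    }
}
```

      Both variants do the same total work; only the number of monitors differs, which is why the profiler shows the shared-plan threads parked on monitor entry rather than doing anything extra.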

      1. values1.png
        5 kB
        Knut Anders Hatlen
      2. Values.java
        2 kB
        Knut Anders Hatlen
      3. patch-2a.png
        5 kB
        Knut Anders Hatlen
      4. patch-2a.diff
        1.0 kB
        Knut Anders Hatlen
      5. patch-1a.png
        8 kB
        Knut Anders Hatlen
      6. patch-1a.diff
        3 kB
        Knut Anders Hatlen

        Activity

        Knut Anders Hatlen added a comment -

        Attaching the test and a graph showing the difference in performance between shared plans and separate plans on a machine with 8 CPUs.

        Manish Khettry added a comment -

        That is very interesting. A couple of thoughts on this.

        First, the point of sharing plans is to avoid doing potentially expensive compilation. By choosing a really simple query which is cheap both to compile and execute you are effectively measuring only the cost of sharing plans. If you had even a slightly more expensive query, I doubt you would see such a huge disparity between the two cases.

        That having been said, the lack of any speedup is troubling. I ran the same query to see how many times the routines you mentioned (GenericPreparedStatement#upToDate and BaseActivation#checkStatementValidity) are executed. The first one is called five times per query and the second one once. I haven't looked at the code too closely, but it does seem excessive and could be a starting point to investigate contention.

        Also, there are two other routines, GPS#finish and GPS#getActivation, which synchronize on the GPS and are called once per statement, so these routines add to the contention as well.

        Knut Anders Hatlen added a comment -

        Thanks for investigating this, Manish!

        I agree that the test is not representative of a real-world application, but that wasn't my aim when I wrote it. I just wanted to see if there was any basic part of the SQL execution layer that would be a bottleneck on a multi-CPU machine. VALUES 1 seemed to be a good choice since it avoids accesses to the buffer manager, which is a known multi-CPU bottleneck. I think of it more like looking at a small part of Derby through a magnifying glass or a microscope.

        When I run the test, I only see three calls to GPS.upToDate(), one call to BA.checkStatementValidity(), and none to GPS.finish() and GPS.getActivation(). You didn't by any chance use a Statement instead of a PreparedStatement?

        I'm not sure I quite understand how the interaction with upToDate() works. If upToDate() returns true, we know (because of the synchronization) that at some point after we called upToDate() and before it returned, the compiled plan was up to date. However, the synchronization doesn't guarantee that the plan is up to date the moment after the method has returned, does it? How do we know the plan is still valid then? Is it because of this uncertainty that we keep calling upToDate() multiple times during execution?

        Daniel John Debrunner added a comment -

        GPS#getActivation & GPS#finish will not be called per execution (except when using a Statement).

        The upToDate() check interacts with the table locking of any DDL that leads to the invalidation.

        When a table T is modified via DDL there is an exclusive lock held on T.
        This lock is obtained and then plans dependent on that table are modified.

        Thus if a statement has obtained an intent lock on T and it is valid (upToDate()) then it can complete its execution knowing that no DDL can proceed and invalidate it since it holds an intent table lock that will block any DDL's exclusive lock.

        So ideally a plan will check that it's up to date once all of its table locks are obtained; in Derby this is not centralized. Some DBMSs, as part of their compilation, set up a list of table intent locks and obtain them at the start of execution. In Derby this is handled by calling checkStatementValidity() in each open of a ResultSet (possibly regardless of whether it obtains a table lock or not).

        Ideally this would be in one place, maybe after the open of the top level (language) ResultSet, and thus executed once per plan. I'm not sure though if the top-level open is guaranteed to open all the tables that the plan requires.

        There's room for improvement here, not least by writing up & understanding all the interactions.
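
        The protocol described above can be modeled with a read-write lock standing in for the lock manager. This is a rough sketch with hypothetical names, not Derby's actual lock manager API: the executor's intent lock maps to the shared side, DDL's exclusive table lock to the write side, and invalidation happens only while the write lock is held, so a plan observed valid under the shared lock stays valid for the rest of that execution:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ValidityProtocol {
    private final ReentrantReadWriteLock tableLock = new ReentrantReadWriteLock();
    private boolean planValid = true;
    int recompiles;

    // DDL path: obtain the exclusive table lock, then invalidate
    // the plans that depend on the table.
    public void runDdl() {
        tableLock.writeLock().lock();
        try {
            planValid = false;
        } finally {
            tableLock.writeLock().unlock();
        }
    }

    // Execution path: the intent (shared) lock blocks any concurrent DDL,
    // so once the plan is observed valid here, no invalidation can sneak
    // in before the statement finishes.
    public boolean execute() {
        tableLock.readLock().lock();
        try {
            if (!planValid) {        // stale: recompile under the same lock
                recompiles++;
                planValid = true;
            }
            return planValid;        // cannot be flipped while we hold the read lock
        } finally {
            tableLock.readLock().unlock();
        }
    }
}
```

        In this model, checking validity more than once per execution buys nothing: the shared lock already excludes the only path that can invalidate the plan.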

        Knut Anders Hatlen added a comment -

        I ran the Values1 test on a Sun Fire T2000 with 32 virtual processors (running
        Solaris 10 and Java version 1.6.0_15) and noticed that there was a simple
        change in BaseActivation.checkStatementValidity() that improved the situation
        somewhat. As mentioned in the previous comments, there's a synchronized block
        in checkStatementValidity() where a lot of time is spent waiting:

        synchronized (preStmt) {
            if ((gc == preStmt.getActivationClass()) && preStmt.upToDate())
                return;
        }

        If the (gc == preStmt.getActivationClass()) check is moved inside
        preStmt.upToDate(), which is also synchronized on preStmt, we avoid a double
        synchronization. This appears to take some of the pressure off the monitor and
        allows the Values1 test to scale better. The preStmt monitor is still very hot,
        though, so the performance still breaks down when too many threads are added,
        but it can handle more threads than trunk before it breaks down.
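
        The shape of this change can be sketched with stub classes. This is a simplified model with hypothetical names; the real signatures in BaseActivation and GenericPreparedStatement differ:

```java
public class Patch1aSketch {
    static class GeneratedClass {}

    static class PreparedStmt {
        private final GeneratedClass activationClass = new GeneratedClass();
        private boolean valid = true;

        synchronized GeneratedClass getActivationClass() { return activationClass; }

        // Before the patch, callers did:
        //   synchronized (preStmt) {
        //       if (gc == preStmt.getActivationClass() && preStmt.upToDate()) return;
        //   }
        // acquiring the monitor for the outer block and again inside each call.
        synchronized boolean upToDate() { return valid; }

        // After the patch, the class comparison lives inside the method that is
        // already synchronized, so the caller needs no outer synchronized block
        // and each validity check takes the monitor only once.
        synchronized boolean upToDate(GeneratedClass gc) {
            return gc == activationClass && valid;
        }

        synchronized void invalidate() { valid = false; }
    }
}
```

        Moving the check also keeps the monitor's identity private to the prepared statement, which is the encapsulation benefit mentioned below.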

        The attached patch and graph (patch-1a.diff and patch-1a.png) show the change
        and its effect on the scalability. Whereas trunk maxes out on 5 threads and
        305K tx/s, the patched version maxes out on 7 threads and 520K tx/s. After both
        trunk and the patched version have collapsed because of too many threads, the
        patched version seems to stabilize on a level 30% higher than trunk.

        For comparison, the graph also shows the results for trunk with separate plans
        for each thread. Its throughput grows steadily for each thread added until the
        number of threads reaches the number of virtual processors (32), which is still
        far better than with shared plans, so it's clear that the patch is not a full
        solution to this issue. It doesn't do anything about the underlying problem,
        which is that upToDate() is called way too frequently during execution, but it
        may be a good first step toward removing the overhead of shared plans.

        One may perhaps expect the JVM to be able to eliminate double synchronization,
        so that such a change should not be necessary. Anyhow, I think the change would
        make sense even without any performance benefit, as it hides some of
        GenericPreparedStatement's internal synchronization details from users of the
        PreparedStatement interface.

        Knut Anders Hatlen added a comment -

        Committed patch-1a.diff to trunk with revision 824657.

        Knut Anders Hatlen added a comment -

        EmbedStatement.executeStatement() calls rePrepare() on the execution plan before trying to execute it. Later, in the same method, the plan is executed by calling GenericPreparedStatement.execute(). One of the first things GPS.execute() does is to call rePrepare().

        Although calling rePrepare() on an already prepared and valid plan is basically a no-op, it does involve a check of statement validity and therefore contributes to the problem reported in this issue. It seems to me that the first call to rePrepare() is unnecessary, since the statement will be re-prepared if needed right before execution anyway.

        The attached patch (2a) removes the call to rePrepare() in EmbedStatement.executeStatement(). It also moves the retrieval of compile-time warnings to after execute() has been called, since re-preparation will now happen in execute().

        All the regression tests ran cleanly with the patch. I also reran the Values test on a machine similar to the one used for testing patch 1a. The attached graph patch-2a.png shows the performance compared with trunk (which includes the 1a patch) and 10.5.3.0 (which contains none of the patches). There's still a long way to go before the scalability is similar to the one we get when the threads don't share the same execution plan, but the patch does seem to improve the situation somewhat.
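
        The redundancy removed by patch 2a can be sketched with a call counter. These are hypothetical stub classes mirroring the ones discussed, not Derby's real code:

```java
public class RePrepareSketch {
    static class Plan {
        int validityChecks;
        boolean valid = true;

        // rePrepare() on a valid plan is nearly a no-op, but it still has to
        // check validity, which synchronizes on the shared plan object.
        void rePrepare() {
            validityChecks++;
            if (!valid) valid = true;   // recompile if needed
        }

        // execute() re-prepares right before running, which is what makes the
        // earlier, separate rePrepare() call redundant.
        void execute() {
            rePrepare();
            // ... run the plan ...
        }
    }

    // Before patch 2a: executeStatement() did both calls per execution.
    static int executeBefore(Plan p) {
        p.rePrepare();
        p.execute();
        return p.validityChecks;
    }

    // After patch 2a: only execute(), which re-prepares itself if needed.
    static int executeAfter(Plan p) {
        p.execute();
        return p.validityChecks;
    }
}
```

        Halving the validity checks per execution directly halves the trips through the contended monitor on the hot path.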

        Dag H. Wanvik added a comment -

        Patch 2a looks safe to me, +1. Good catch!

        Knut Anders Hatlen added a comment -

        Thanks, Dag! Committed revision 902147.

        Mamta A. Satoor added a comment -

        Hi Knut, can this JIRA be closed? It looks like there were two commits made for it.

        Knut Anders Hatlen added a comment -

        I think the problem described in this issue is not completely fixed yet. However, since some fixes have been committed, and it's not clear what more needs to be done, I'm closing the issue. If someone later comes up with an idea on how to improve this code, that work could be tracked in a separate issue.


          People

          • Assignee:
            Knut Anders Hatlen
            Reporter:
            Knut Anders Hatlen
          • Votes:
            0
            Watchers:
            2
