To investigate whether there was anything in the SQL execution layer that prevented scaling on a multi-CPU machine, I wrote a multi-threaded test which continuously executed "VALUES 1" using a PreparedStatement. I ran the test on a machine with 8 CPUs and expected the throughput to be proportional to the number of concurrent clients up to 8 clients (the same as the number of CPUs). However, the throughput only had a small increase from 1 to 2 clients, and adding more clients did not increase the throughput. Looking at the test in a profiler, it seems like the threads are spending a lot of time waiting to enter synchronization blocks in GenericPreparedStatement.upToDate() and BaseActivation.checkStatementValidity() (both of which are synchronized on the a GenericPreparedStatement object).
I then changed the test slightly, appending a comment with a unique thread id to the "VALUES 1" statement. That means the threads still did the same work, but each thread got its own plan (GenericPreparedStatement object) since the statement cache didn't regard the SQL text strings as identical. When I made that change, the test scaled more or less perfectly up to 8 concurrent threads.
We should try to find a way to make the scalability the same regardless of whether or not the threads share the same plan.