During profiling I found that the Tapestry framework spends a lot of time working with HashMap.
With the following patch, time per request decreased by 2.2 ms (4.6% of overall time). Measurements were taken with Apache Benchmark (ab) on a real application after a warm-up phase.
The idea behind the patch is to get rid of double (or even triple) lookups inside HashMap wherever a single lookup is enough.
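To illustrate the kind of change meant here, the sketch below contrasts a containsKey-then-get pattern with a single get plus a null check. It is not code from the patch: SingleLookupSketch, CountingMap, doubleLookup and singleLookup are hypothetical names made up for this example, and the single-lookup form assumes the map never stores null values.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; none of these names come from Tapestry.
class SingleLookupSketch {

    /** HashMap that counts read lookups so the two styles can be compared. */
    static class CountingMap<K, V> extends HashMap<K, V> {
        int lookups = 0;

        @Override
        public boolean containsKey(Object key) {
            lookups++;
            return super.containsKey(key);
        }

        @Override
        public V get(Object key) {
            lookups++;
            return super.get(key);
        }
    }

    /** Two hash lookups on a hit: containsKey first, then get. */
    static String doubleLookup(Map<String, String> map, String key, String fallback) {
        if (map.containsKey(key)) {
            return map.get(key);
        }
        return fallback;
    }

    /** One hash lookup; only equivalent when the map holds no null values. */
    static String singleLookup(Map<String, String> map, String key, String fallback) {
        String value = map.get(key);
        return value != null ? value : fallback;
    }

    public static void main(String[] args) {
        CountingMap<String, String> map = new CountingMap<>();
        map.put("page", "Index");

        doubleLookup(map, "page", "none");
        System.out.println("double lookup cost: " + map.lookups);

        map.lookups = 0;
        singleLookup(map, "page", "none");
        System.out.println("single lookup cost: " + map.lookups);
    }
}
```

If null values are possible, the two forms are no longer interchangeable, so each call site has to be checked before it is converted.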
This patch also halves the number of ThreadLocal.get calls by moving the PerThreadServiceCreator functionality into PerthreadManager.
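The sketch below shows one way such a saving can arise. It is a simplified model, not the actual Tapestry classes: PerThreadSketch, layeredGetOrCreate and mergedGetOrCreate are hypothetical names. The assumption is that a creator layered on top of separate get and put operations resolves the thread-local map once per operation, whereas a merged get-or-create resolves it only once.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical model; not the real PerthreadManager or PerThreadServiceCreator.
class PerThreadSketch {

    // Demo-only counter of ThreadLocal.get calls (not thread-safe).
    static int threadLocalGets = 0;

    private static final ThreadLocal<Map<Object, Object>> HOLDER =
            ThreadLocal.withInitial(HashMap::new);

    private static Map<Object, Object> perThreadMap() {
        threadLocalGets++;
        return HOLDER.get();
    }

    // Manager-style primitives: each one resolves the thread-local map itself.
    static Object get(Object key) {
        return perThreadMap().get(key);
    }

    static void put(Object key, Object value) {
        perThreadMap().put(key, value);
    }

    /** Layered creator: get, then put on a miss, so two ThreadLocal.get calls. */
    static Object layeredGetOrCreate(Object key, Supplier<Object> factory) {
        Object value = get(key);
        if (value == null) {
            value = factory.get();
            put(key, value);
        }
        return value;
    }

    /** Merged version: the map is resolved once and reused for both steps. */
    static Object mergedGetOrCreate(Object key, Supplier<Object> factory) {
        Map<Object, Object> map = perThreadMap();
        Object value = map.get(key);
        if (value == null) {
            value = factory.get();
            map.put(key, value);
        }
        return value;
    }

    public static void main(String[] args) {
        layeredGetOrCreate("serviceA", Object::new);
        System.out.println("layered ThreadLocal.get calls: " + threadLocalGets);

        threadLocalGets = 0;
        mergedGetOrCreate("serviceB", Object::new);
        System.out.println("merged ThreadLocal.get calls: " + threadLocalGets);
    }
}
```

In this model the saving applies to the first access of a key on a given thread; where exactly the real patch saves calls depends on how PerthreadManager is invoked.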