Some clients have recently reported apparent hangs in their applications. In all cases, the symptoms were the same:
- All sessions appear to be hung in LWLockAcquire or LWLockRelease, specifically in s_lock
- There is a high number of concurrent sessions (close to 100)
- The system is not actually hung; processing normally resumes after some time, once all sessions have completed their locking work
The PostgreSQL developer community has found several performance issues under high concurrency (more than 32 sessions) in the spin-lock mechanism that HAWQ inherited. This was ultimately corrected in PostgreSQL 9.5 by replacing the spin-lock mechanism, a change that appears to provide a significant boost to query performance.
The full fix is in commit ab5194e6f617a9a9e7aadb3dd1cee948a42d0755.
Alternatively, a one-line commit to s_lock.c could help address this and would be easy to cherry-pick: b03d196be055450c7260749f17347c2d066b4254