Bug 6444 - tok_touch_all update forces full table scan, kills performance.
Summary: tok_touch_all update forces full table scan, kills performance.
Status: RESOLVED WONTFIX
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Plugins
Version: 3.3.1
Hardware: PC Linux
Importance: P2 major
Target Milestone: 3.4.0
Assignee: SpamAssassin Developer Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-06-03 14:29 UTC by Bradley Kieser
Modified: 2014-04-09 18:23 UTC
CC: 5 users



Attachment Type Modified Status Actions Submitter/CLA Status
Amended PgSQL.pm file with the optimised tok_touch_all text/plain None Bradley Kieser [NoCLA]
Here is just the diff from the current version, for ease of review. patch None Mark Martinec [HasCLA]
optimized patch patch None Mark Martinec [HasCLA]
take 2 (wrong sub name) patch None Mark Martinec [HasCLA]
diagram showing elapsed time spent in bayes, vs. time of the day image/png None Mark Martinec [HasCLA]
diagram showing elapsed time spent in tok_touch_all, vs. time of the day image/png None Mark Martinec [HasCLA]
diagram: Time spent in Mail::SpamAssassin::BayesStore::PgSQL::tok_touch_all image/png None Mark Martinec [HasCLA]

Description Bradley Kieser 2010-06-03 14:29:04 UTC
The current implementation of the atime column update in tok_touch_all uses an "IN" statement for the token field. Given that the bayes_token table has (id, token) as its primary key, using the IN statement in this way degrades performance exponentially.

It is also the reason why so many people have been experiencing the serious deadlocking issues reported across so many web sites and forums.

This can easily be re-worked to use the strong (id, token) primary key by moving the loop up a level and converting the WHERE clause into id = ? AND token = ?


I have done this and will upload the amended file as an attachment.

The performance boost is huge and it gets rid of the deadlocking problem that is plaguing so many people.
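The two approaches can be contrasted in a minimal, hypothetical sketch. Python's stdlib sqlite3 module stands in for the real Perl DBI / PostgreSQL code, and the table shape follows the bayes_token schema quoted later in this bug; the exact SQL SpamAssassin emits may differ.

```python
import sqlite3

# Toy stand-in for the bayes_token table (PostgreSQL in the real setup);
# sqlite3 is used here only to make the sketch runnable.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE bayes_token (
    id INTEGER NOT NULL,
    token BLOB NOT NULL,
    atime INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (id, token))""")
con.executemany("INSERT INTO bayes_token VALUES (?, ?, ?)",
                [(2, bytes([i]), 100) for i in range(10)])

tokens = [bytes([i]) for i in range(3)]
new_atime = 200

# Original approach (per this bug report): one UPDATE with an IN list over
# the token column alone; the planner may fail to match it against the
# (id, token) primary key, forcing a scan on a large table.
placeholders = ",".join("?" for _ in tokens)
con.execute(
    f"UPDATE bayes_token SET atime=? WHERE id=? "
    f"AND token IN ({placeholders}) AND atime < ?",
    [new_atime, 2, *tokens, new_atime])

# Proposed rework: loop at a higher level and update one row at a time,
# so each UPDATE hits the composite primary key exactly.
for tok in tokens:
    con.execute(
        "UPDATE bayes_token SET atime=? WHERE id=? AND token=? AND atime < ?",
        (new_atime, 2, tok, new_atime))
con.commit()
```

Both variants touch the same three rows here; the difference the bug describes only shows up in the PostgreSQL planner on a large table.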
Comment 1 Bradley Kieser 2010-06-03 14:43:44 UTC
Created attachment 4768 [details]
Amended PgSQL.pm file with the optimised tok_touch_all

This file has the amended tok_touch_all function with the optimised update of column atime (in a loop rather than using the IN operator).
Comment 2 Mark Martinec 2010-06-03 14:51:17 UTC
Created attachment 4769 [details]
Here is just the diff from the current version, for ease of review.
Comment 3 Kevin A. McGrail 2010-06-03 17:15:36 UTC
+1 on logic and patch though I don't use postgres.  Bradley, can you confirm the new patch diffed from the current version solves the issue and works for you, please?
Comment 4 Mark Martinec 2010-06-03 18:35:12 UTC
> Bradley, can you confirm the new patch diffed from the
> current version solves the issue and works for you, please?

No need to, it _is_ the same thing, the diff is just another
representation of the proposed replacement module.

I haven't tried it yet, but I have great expectations: I tried
twice in the past to switch bayes from MySQL to PostgreSQL
(different version), and every time it turned out that the
'IN' operator yields very poor performance the way it is
used by SA. I hope to be able to try this new approach soon.
Other experience and review is most welcome.
Comment 5 Bradley Kieser 2010-06-03 20:19:07 UTC
(In reply to comment #3)
> +1 on logic and patch though I don't use postgres.  Bradley, can you confirm
> the new patch diffed from the current version solves the issue and works for
> you, please?

I can certainly confirm that these changes work! We tested them and then promoted them into production. I would hope that others will test it too, but I looked in the Postgres activity monitor, saw the correct SQL statements in there, and the difference in load was absolutely enormous!

The problem with the "IN" operator was that with a larger database and a high volume of email, performance degradation was exponential. Now we are unable to measure it, because the load has been reduced so much that we don't see any degradation at all.
Comment 6 Bradley Kieser 2010-06-03 20:23:12 UTC
(In reply to comment #4)
> > Bradley, can you confirm the new patch diffed from the
> > current version solves the issue and works for you, please?
> 
> No need to, it _is_ the same thing, the diff is just another
> representation of the proposed replacement module.
> 
> I haven't tried it yet, but I have great expectations: I tried
> twice in the past to switch bayes from MySQL to PostgreSQL
> (different version), and every time it turned out that the
> 'IN' operator yields very poor performance the way it is
> used by SA. I hope to be able to try this new approach soon.
> Other experience and review is most welcome.

Hi Mark,

This patch will certainly address that issue for you. There are a few more areas that I will be looking at next: the awl table inserts frequently kick out a duplicate key error (minor), and running sa-learn can kill the system too; again, it looks like there is a nasty "IN" clause being used there. For us, if we run sa-learn at the moment, the exim processes quickly stack up (we use exim) in what look like lock waits.

I haven't fully investigated that yet, but thought it best to mention it here, because I would hate for you to put the patch in only to be killed by sa-learn!
Comment 7 Mark Martinec 2010-06-10 12:58:24 UTC
Created attachment 4773 [details]
optimized patch

I did some benchmarking on a smallish database (started a few days ago;
thanks to the enhancement in Bug 6447 it has only accumulated 5000 tokens
so far), and Bradley's patch doesn't fare too well. It turns out it
assembles the SQL clause for every token, and it unnecessarily updates
newest_token_age once for each token. The sort is also probably redundant.

I factored out the invariant operations from Bradley's proposal, which
resulted in the attached patch - and it became about 10 times faster
for a message that needed to update 150 tokens. The patch also adds
tok_get_all and tok_touch_all timing measurements to the timing report.

Probably because of the small set of tokens in my database, the original
code did even a little better than my patched code, although I believe
that the difference can turn the other way around for a large database,
as reported.

Here are times in milliseconds for a tok_touch_all() which needed
to update 150 tokens each time (several runs):

original tok_touch_all:
  tok_touch_all: 33
  tok_touch_all: 16
  tok_touch_all:  7
  tok_touch_all:  7
  tok_touch_all: 21
  tok_touch_all: 19
  tok_touch_all: 12
  tok_touch_all:  6
  tok_touch_all: 12
  tok_touch_all:  6
  tok_touch_all: 29

new(Mark) tok_touch_all:
  tok_touch_all: 35
  tok_touch_all: 40
  tok_touch_all: 68
  tok_touch_all: 42
  tok_touch_all: 33
  tok_touch_all: 39
  tok_touch_all: 48
  tok_touch_all: 33

new(Bradley) tok_touch_all:
  tok_touch_all: 413
  tok_touch_all: 330
  tok_touch_all: 253
  tok_touch_all: 525
  tok_touch_all: 579
  tok_touch_all: 248
  tok_touch_all: 329
  tok_touch_all: 753
Comment 8 Mark Martinec 2010-06-10 13:07:30 UTC
Created attachment 4774 [details]
take 2 (wrong sub name)
Comment 9 Mark Martinec 2010-06-10 13:09:31 UTC
trunk:
  Bug 6444: tok_touch_all update forces full table scan, kills performance
Sending lib/Mail/SpamAssassin/BayesStore/PgSQL.pm
Sending lib/Mail/SpamAssassin/Plugin/Bayes.pm
Committed revision 953397.
Comment 10 Mark Martinec 2010-06-10 13:24:12 UTC
Not to forget: the proposal in Bug 4400 (PostgreSQL schema maintains an
unnecessary additional index) is also very much worth applying! To summarize:

  -- Drop the secondary index
  drop index bayes_token_idx1;
  -- Drop the old primary key
  alter table bayes_token drop constraint bayes_token_pkey;
  -- Add the new primary key
  alter table bayes_token add constraint bayes_token_pkey primary key (token, id);

And Bug 6447 (new feature for bayes autolearning: learn-on-error)
helps curtail the growth of the bayes_token table.
Comment 11 Bradley Kieser 2010-06-10 17:01:18 UTC
(In reply to comment #10)
> Not to forget, the proposal in Bug 4400 (PostgreSQL schema maintains
> unnecessary
> additional index) is also very much worth it!  To summarize it:
> 
>   -- Drop the secondary index
>   drop index bayes_token_idx1;
>   -- Drop the old primary key
>   alter table bayes_token drop constraint bayes_token_pkey;
>   -- Add the new primary key
>   alter table bayes_token add constraint bayes_token_pkey primary key (token,
> id);
> 
> And the Bug 6447 (new feature to bayes autolearning: learn-on-error)
> helps to curtail the growth of a table bayes_tokens.


Mark,

I see from your comments that you are getting bad results from my patch, but that is not surprising: I forgot to mention that you need to create an index as well.

create index bayes_token_idx1 on bayes_token(token);

That index is the key to the speed of my patch.

Yes, it does build the SQL string for each update, but that is exactly what it is meant to do, in order to avoid the deadlocks. So it does do the updates one by one.

Can you please re-run the tests with the missing index added? This is what I have in my database:

\d bayes_token
            Table "public.bayes_token"
   Column   |  Type   |         Modifiers          
------------+---------+----------------------------
 id         | integer | not null default 0
 token      | bytea   | not null default ''::bytea
 spam_count | integer | not null default 0
 ham_count  | integer | not null default 0
 atime      | integer | not null default 0
Indexes:
    "bayes_token_pkey" PRIMARY KEY, btree (id, token)
    "bayes_token_idx1" btree (token)
Comment 12 Bradley Kieser 2010-06-10 17:03:51 UTC
(In reply to comment #7)
> Created an attachment (id=4773) [details]
> optimized patch
> 
> I did some benchmarking on a smallish database (started it few days ago,
> thanks to enhancement in Bug 6447 it only accumulated 5000 tokens so far),
> and the Bradley's patch doesn't fare too well. Turns out it assembles the
> SQL clause for every token, and it unnecessarily updates newest_token_age
> once for each token. Also the sort is probably redundant.
> 
> I factored out the invariant operations from Bradley's proposal, which
> resulted in the attached patch - and it became about 10 times faster
> for a message that needed to update 150 tokens. The patch also adds
> tok_get_all and tok_touch_all timing measurements to the timing report.
> 
> Probably because of the small set of tokens in my database, the original
> code did even a little bit better than my patched code, although I believe
> that the difference can turn the other way around as reported for a large
> database.
> 
> Here are times in milliseconds for a tok_touch_all() which needed
> to update 150 tokens each time (several runs):
> 
> original tok_touch_all:
>   tok_touch_all: 33
>   tok_touch_all: 16
>   tok_touch_all:  7
>   tok_touch_all:  7
>   tok_touch_all: 21
>   tok_touch_all: 19
>   tok_touch_all: 12
>   tok_touch_all:  6
>   tok_touch_all: 12
>   tok_touch_all:  6
>   tok_touch_all: 29
> 
> new(Mark) tok_touch_all:
>   tok_touch_all: 35
>   tok_touch_all: 40
>   tok_touch_all: 68
>   tok_touch_all: 42
>   tok_touch_all: 33
>   tok_touch_all: 39
>   tok_touch_all: 48
>   tok_touch_all: 33
> 
> new(Bradley) tok_touch_all:
>   tok_touch_all: 413
>   tok_touch_all: 330
>   tok_touch_all: 253
>   tok_touch_all: 525
>   tok_touch_all: 579
>   tok_touch_all: 248
>   tok_touch_all: 329
>   tok_touch_all: 753


Please see my comment below.

I forgot, when I submitted the patch, to include the crucial index that is needed as well:

    create index bayes_token_idx1 on bayes_token(token);
Comment 13 Mark Martinec 2010-06-10 17:35:25 UTC
> I forgot, when I submitted the patch, to include the crucial index that is
> needed as well: 
>     create index bayes_token_idx1 on bayes_token(token);

I had this index in my tests; it was as specified in the docs
(sql/bayes_pg.sql).

Actually I tried both ways: with bayes_token_idx1 and a primary
key on (id,token), and then also without the bayes_token_idx1 and
a primary key on (token,id), as suggested in Bug 4400.

It made no difference on tok_touch_all, but the Bug 4400 suggestion
made a little improvement in tok_get_all.

Both the tok_touch_all and tok_get_all times were as reported by the
timing log (added by my patch). They can be seen in the debug log
with area 'timing', e.g.: spamassassin -D timing -t <test.msg
or added as a header field to a message:
  add_header all Timing _TIMING_

Here are my complete results (comment 7 just summarized them).
The "old index" implies bayes_token_idx1 as per SA docs,
the "new index" implies the Bug 4400 suggestion. The message
was always the same, but the $atime was faked to always use a
current time, so that tokens were always considered new and
had to be updated.


dbg: bayes: tok_get_all: token count: 3454
dbg: bayes: tok_touch_all: rows=151

old index, original tok_touch_all:
  tok_get_all: 153, tok_touch_all: 33
  tok_get_all: 127, tok_touch_all: 16
  tok_get_all: 115, tok_touch_all:  7
  tok_get_all: 112, tok_touch_all:  7

old index, new(Mark) tok_touch_all:
  tok_get_all:  97, tok_touch_all: 35
  tok_get_all: 124, tok_touch_all: 40
  tok_get_all: 147, tok_touch_all: 68
  tok_get_all: 141, tok_touch_all: 42

old index, new(Bradley) tok_touch_all:
  tok_get_all: 103, tok_touch_all: 413
  tok_get_all: 110, tok_touch_all: 330
  tok_get_all:  96, tok_touch_all: 253
  tok_get_all: 113, tok_touch_all: 525

new index, original tok_touch_all:
  tok_get_all: 197, tok_touch_all: 21
  tok_get_all:  99, tok_touch_all: 19
  tok_get_all: 117, tok_touch_all: 12
  tok_get_all:  97, tok_touch_all:  6
  tok_get_all: 116, tok_touch_all: 12
  tok_get_all: 165, tok_touch_all:  6
  tok_get_all:  98, tok_touch_all: 29

new index, new(Mark) tok_touch_all:
  tok_get_all:  95, tok_touch_all: 33
  tok_get_all:  95, tok_touch_all: 39
  tok_get_all:  96, tok_touch_all: 48
  tok_get_all: 103, tok_touch_all: 33

new index, new(Bradley) tok_touch_all:
  tok_get_all: 111, tok_touch_all: 579
  tok_get_all: 103, tok_touch_all: 248
  tok_get_all: 112, tok_touch_all: 329
  tok_get_all: 101, tok_touch_all: 753
Comment 14 Mark Martinec 2010-06-10 17:44:23 UTC
> Yes, it does create the string on each for the update, but that is exactly
> what it is meant to do. This is to avoid the deadlocks. So it does do the
> update one by one.

A statement handle can be prepared fully in advance; doing so should
not have any effect on locking. Also, updating bayes_vars in each
iteration is just a waste.

It can be argued whether 'UPDATE bayes_token SET atime=...' for all
tokens should be run as a single transaction (as in my patch), or as
individual transactions (as your code does). I'm not sure about locking,
but I can tell that the whole update ran much slower if done with
one-transaction-per-token.
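The single-transaction variant can be sketched hypothetically. Python's stdlib sqlite3 module stands in for the real Perl DBI / PostgreSQL code, and the 150-token workload mirrors the benchmark above; the actual prepare/commit calls in the patch are Perl DBI's.

```python
import sqlite3

# Autocommit mode so transactions can be managed explicitly below.
con = sqlite3.connect(":memory:", isolation_level=None)
con.execute("""CREATE TABLE bayes_token (
    id INTEGER NOT NULL,
    token BLOB NOT NULL,
    atime INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (id, token))""")
con.executemany("INSERT INTO bayes_token VALUES (2, ?, 100)",
                [(bytes([i]),) for i in range(150)])

tokens = [bytes([i]) for i in range(150)]

# Single transaction around the whole loop (as in Mark's patch): the
# statement text is invariant, so the driver can prepare/cache it once,
# and all 150 updates commit together.
con.execute("BEGIN")
for tok in tokens:
    con.execute(
        "UPDATE bayes_token SET atime=? WHERE id=? AND token=? AND atime < ?",
        (200, 2, tok, 200))
con.execute("COMMIT")

# One-transaction-per-token (as in the original proposal) would instead
# put BEGIN/COMMIT inside the loop: more commit overhead, which is why
# it measured slower, though it may shrink the window for lock conflicts.
```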
Comment 15 Mark Martinec 2010-06-15 16:02:01 UTC
(In reply to comment #7)
> I did some benchmarking on a smallish database (started it few days ago,
> thanks to enhancement in Bug 6447 it only accumulated 5000 tokens so far),

Again my mistake (as in Bug 4400), that was 5000 messages, not tokens.

Today (a few days later) our database has grown to 12,000 messages and
220,000 tokens, and I have analyzed our timing reports, while switching
from the suggested tok_touch_all (my patch) back to the original
tok_touch_all at noon. The PostgreSQL server is 8.3.11 (from ports,
running under FreeBSD).

Tried with both indexing schemes (Bug 4400) - the indexing scheme makes
hardly any difference.

I can conclude that for our setup THE ORIGINAL tok_touch_all() works
significantly faster! I will attach two diagrams to demonstrate it.
For about half of the real-traffic messages the saving is about 30 ms.
Compared to the total time spent in bayes (let alone compared to the
total SA processing) this is not too drastic, but is quite noticeable
nevertheless.

I wonder whether the PostgreSQL version caused this difference between
our case and Bradley's. Which version was used there?
Comment 16 Mark Martinec 2010-06-15 16:09:47 UTC
Created attachment 4775 [details]
diagram showing elapsed time spent in bayes, vs. time of the day

tok_touch_all() was the one suggested here (my patch) and was running
in the morning (before 12h on the x axis). At noon I switched to the
original tok_touch_all(). The diagram shows a drop in bayes elapsed
times by about 30 ms. The tri-modality does not originate from SQL,
but from the number of tokens in a message (i.e. normalizing elapsed
time by tokenization time removes the modality).
Comment 17 Mark Martinec 2010-06-15 16:13:46 UTC
Created attachment 4776 [details]
diagram showing elapsed time spent in tok_touch_all, vs. time of the day

This diagram is from the same period as the previous one, but shows
only one component of bayes elapsed time: only the time spent
in tok_touch_all. This again shows about 30 ms drop in elapsed time
for about half of the messages, i.e. the drop in the total bayes
times is entirely attributable to a drop in tok_touch_all.
Comment 18 Mark Martinec 2010-06-15 16:19:14 UTC
Bradley, did you have a chance to try my proposed patch instead of
yours? Considering that it doesn't use the IN operator in SQL, it should
behave much like yours, while factoring out invariant code.

If it doesn't work too well in your case due to transactional locking,
you can try moving the calls to begin_work() and commit() to within
the 'for' loop.
Comment 19 Clayton Keller 2011-02-17 11:11:17 UTC
It looks like this bug has been sitting for a while. I was considering moving Bayes and possibly AWL to PostgreSQL but ran across this and bug 4400. 

Are the possible performance issues serious enough to keep this out of production?

With that, is there any additional testing I may be able to assist with? Unfortunately, outside of production I have somewhat small datasets to work with which may not help with the performance benchmarking needed.
Comment 20 Mark Martinec 2011-02-17 14:37:01 UTC
> It looks like this bug has been sitting for a while. I was considering moving
> Bayes and possibly AWL to PostgreSQL but ran across this and bug 4400. 
> Are the possible performance issues that great to not put this into production?

The current code in trunk (to become 3.4.0) has my last patch incorporated
(comment 9). It implements what Bradley Kieser proposed, but factors out
some invariant code. So, token times are updated one at a time (not using
an IN operator).

In effect, this is in production now (for those of us running the trunk code).
The old routine (sub tok_touch_all_old) is still there in this module
(Mail::SpamAssassin::BayesStore::PgSQL), so it is possible to test each,
just by renaming tok_touch_all to something like tok_touch_all_6444OFF
and tok_touch_all_old to tok_touch_all.

Actually this is what I'm doing here, i.e. using the tok_touch_all_old.
It is faster here for some reason. I will attach a diagram to illustrate it.

Using PostgreSQL 8.3.14 on FreeBSD,
with 230,000 records in the bayes_token table:

\d bayes_token

            Table "public.bayes_token"
   Column   |  Type   |         Modifiers          
------------+---------+----------------------------
 id         | integer | not null default 0
 token      | bytea   | not null default ''::bytea
 spam_count | integer | not null default 0
 ham_count  | integer | not null default 0
 atime      | integer | not null default 0

Indexes:
  "bayes_token_pkey" PRIMARY KEY, btree (id, token)
  "bayes_token_idx1" btree (token)

The feature:
  bayes_auto_learn_on_error 1
is of great help here; it keeps the growth of the number of tokens
in the database manageable.

The attached diagram shows the b_tok_touch_all elapsed time
for the last couple of hours on our production mailer
(as measured by SpamAssassin and logged by amavisd-new in its
TIMING-SA syslog entries, or by enabling a timing debug in spamd).

At 18h I switched code back to what is in trunk now (i.e. uses
tok_touch_all since 18h, and was using tok_touch_all_old earlier).
It clearly shows a substantial increase in elapsed time, although
in absolute terms the change is still small (perhaps 20 ms)
compared to the total elapsed time in SpamAssassin.

> With that, is there any additional testing I may be able to assist with?
> Unfortunately, outside of production I have somewhat small datasets to work
> with which may not help with the performance benchmarking needed.

It would certainly help to get an independent measurement report.

If it turns out that different sites (PostgreSQL versions?) see
one or the other code variant as faster, it should be quite easy
to add a configuration setting to choose one or the other.
Comment 21 Mark Martinec 2011-02-17 14:38:39 UTC
Created attachment 4845 [details]
diagram: Time spent in Mail::SpamAssassin::BayesStore::PgSQL::tok_touch_all
Comment 22 Mark Martinec 2011-03-22 21:34:47 UTC
trunk:
  Bug 6444: tok_touch_all update forces full table scan, kills performance.
  - Reverting to original tok_touch_all, which consistently runs
  slightly faster with PostgreSQL 8.3.14 than the proposed alternative.
  This change is just renaming tok_touch_all to tok_touch_all_6444
  and tok_touch_all_old to tok_touch_all, keeping both in place
  in case we change our mind later based on some future benchmarks.
Sending lib/Mail/SpamAssassin/BayesStore/PgSQL.pm
Committed revision 1084451.
Comment 23 Kevin A. McGrail 2011-10-29 05:28:37 UTC
Based on results, recommend closing as wontfix.  Mark?
Comment 24 Mark Martinec 2011-10-29 23:30:23 UTC
> Based on results, recommend closing as wontfix.  Mark?

Fine with me, at least for now (3.4.0).

(The 'wontfix' in this case means that the old routine continues
to be in use as the faster variant (for me), at least for 3.4.0, but
the newly proposed one (with refinements) is there and available for
testing (it just needs to be renamed), so the code has indeed been changed.)

As both variants are now in the code, if anyone feels like
benchmarking each, perhaps with different versions of PostgreSQL,
or perhaps with a very large database, then this PR can be
re-opened for further consideration.
Comment 25 Henrik Krohns 2014-02-19 19:22:17 UTC
*** Bug 7014 has been marked as a duplicate of this bug. ***
Comment 26 peter gervai 2014-04-05 09:01:14 UTC
Well, I was sent here (based on the duplicate at the bottom), and yes, this is still a problem. An additional problem is that the suggested code to be renamed uses a nonexistent SQL function:

2014-04-05 11:00:07 CEST ERROR:  function touch_tokens(integer, unknown, integer) does not exist at character 8

I have tried to find this function but google didn't quite help. Is tok_touch_all_old supposed to work?
Comment 27 peter gervai 2014-04-05 09:47:03 UTC
Sorry, it seems only the latest release contains the patch and I didn't find it in the svn right away. 3.4.x contains the patch.

Still I see a few deadlocks but not that much:

2014-04-05 11:40:55 CEST STATEMENT:  INSERT INTO awl username,email,ip,count,totscore) VALUES  ($1,$2,$3,$4,$5)
2014-04-05 11:41:02 CEST ERROR:  deadlock detected
2014-04-05 11:41:02 CEST DETAIL:  Process 17365 waits for ShareLock on transaction 9747303; blocked by process 17359.
        Process 17359 waits for ShareLock on transaction 9747354; blocked by process 17365.
        Process 17365: UPDATE bayes_token SET atime=$1 WHERE id=$2 AND token=$3 AND atime < $4
        Process 17359: select put_tokens(2, E'{"\\\\000\\\\063\\\\352\\\\012\\\\152","\\\\000\\\\172\\\\344\\\\002\\\\153","\\\\001\\\\004\\\\253\\\\235\\\\161","\\\\001\\\\0....
Comment 28 Kevin A. McGrail 2014-04-07 12:42:43 UTC
(In reply to peter gervai from comment #27)
> Sorry, it seems only the latest release contains the patch and I didn't find
> it in the svn right away. 3.4.x contains the patch.
> 
> Still I see a few deadlocks but not that much:
> 
> 2014-04-05 11:40:55 CEST STATEMENT:  INSERT INTO awl
> username,email,ip,count,totscore) VALUES  ($1,$2,$3,$4,$5)
> 2014-04-05 11:41:02 CEST ERROR:  deadlock detected
> 2014-04-05 11:41:02 CEST DETAIL:  Process 17365 waits for ShareLock on
> transaction 9747303; blocked by process 17359.
>         Process 17359 waits for ShareLock on transaction 9747354; blocked by
> process 17365.
>         Process 17365: UPDATE bayes_token SET atime=$1 WHERE id=$2 AND
> token=$3 AND atime < $4
>         Process 17359: select put_tokens(2,
> E'{"\\\\000\\\\063\\\\352\\\\012\\\\152",
> "\\\\000\\\\172\\\\344\\\\002\\\\153","\\\\001\\\\004\\\\253\\\\235\\\\161",
> "\\\\001\\\\0....

Recommended patches? Because to me this seems like a slam dunk for InnoDB or a Redis backend?
Comment 29 peter gervai 2014-04-09 18:23:53 UTC
(In reply to Kevin A. McGrail from comment #28)

> Recommended patches 

For me to be able to provide you with a patch would require a complete analysis of the SQL code, and most probably rewriting large chunks of it, including the PGSQL parts.

> because to me this seems like a slamdunk for InnoDB or a Redis Backup?

It's not a really hard thought exercise to realise: if I wanted to use an inferior system just because there's poor coding in the parts in question, then I wouldn't still be here, right?

I haven't worked with this code, and after a *very* brief and ignorant look I can only guess that locking wasn't really thought through, since the implicit locking seems to conflict on a busy database. I still have to look through it, since I cannot fix it just by poking transactions or explicit locking here and there; it requires time, and for me much more than for those familiar with the code, but eventually I'll reach that point. Giving advice with bad intent doesn't really help.

(As a side note, it seems to me that put_tokens (SQL) is always involved in the deadlocks, probably because it locks large chunks within a transaction, but I may be wrong.)