SA Bugzilla – Bug 3790
spamd fails to remove bayes.lock file?
Last modified: 2020-10-22 09:45:28 UTC
This report refers to 3.0 rc3, but I didn't find any mention of a potential fix since then. On at least two occasions now I've invoked sa-learn explicitly on a message that passed spamd, while spamd was still actively processing other messages. In this instance I would expect sa-learn to wait until spamd finished, then lock the database and proceed. Instead, sa-learn goes into a loop calling lstat() on ~/.spamassassin/bayes.lock. The file remains in the directory even after all spamd processing has completed. Only after I manually remove the file does sa-learn recover and proceed. This has never happened if I wait until all the preforked spamds are quiescent before attempting to run sa-learn, so I suspect that there's some other interaction between spamd and sa-learn that causes spamd to fail to unlink its lock file, but I can't imagine what.
are you sure that the lock file isn't valid because spamd is trying to do something with the db? the lock may hang around for a while, especially if there's an expire going on.
moving this bug out of 3.1.0 queue until there's some kind of test case for it...
there's been no movement on this ticket in ages so I'm going to close. if it's still an issue w/ a recent (3.1/etc) SA version, please let us know and we can reopen.
This still happens with 3.0.6, though more rarely. When the lock file is left benhind, "lsof +D ~/.spamassassin" confirms that nothing is accessing any file in that directory, except the process waiting for the lock to be released (which is holding open the temp file that will eventually be renamed to the lock file). I'm content with this remaining closed until I have a chance to install 3.1.3 and check it again.
I've just had this happen again, with schaefer[651] spamassassin -V SpamAssassin version 3.1.3 running on Perl version 5.8.5 Here's my processlist: schaefer[576] ps ax | grep spam 28133 ? Ss 0:01 /usr/bin/spamd -d -c -m5 -L -r /var/run/spamd.pid 28136 ? S 0:08 spamd child 28137 ? S 0:00 spamd child 32632 pts/1 S 0:01 /usr/bin/perl -T -w /usr/bin/sa-learn --spam --single 32638 pts/1 S+ 0:00 grep spam The sa-learn, 32632, is stuck waiting for a lock file to be removed: schaefer[575] ls -al ~/.spamassassin total 7216 drwxrwxr-x 2 schaefer schaefer 4096 Jul 5 00:07 ./ drwx------ 64 schaefer schaefer 4096 Jul 5 00:01 ../ -rw------- 1 schaefer schaefer 315392 Jul 5 00:01 auto-whitelist -rw------- 1 schaefer schaefer 13512 Jul 5 00:01 bayes_journal -rw-rw-rw- 1 schaefer schaefer 29 Jul 5 00:01 bayes.lock -rw-rw-rw- 1 schaefer schaefer 348 Jul 5 00:08 bayes.lock.torch.brasslantern.com.32632 -rw------- 1 schaefer schaefer 327680 Jul 5 00:01 bayes_seen -rw------- 1 schaefer schaefer 10145792 Jul 5 00:01 bayes_toks -rw------- 1 schaefer schaefer 1813 Nov 28 2004 user_prefs However, no process holds bayes.lock; only sa-learn is accessing that directory: schaefer[573] lsof +D ~/.spamassassin COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME sa-learn 32632 schaefer 3r REG 8,2 10145792 7896984 /home/schaefer/.spamassassin/bayes_toks sa-learn 32632 schaefer 4r REG 8,2 327680 7896985 /home/schaefer/.spamassassin/bayes_seen sa-learn 32632 schaefer 5w REG 8,2 290 7896696 /home/schaefer/.spamassassin/bayes.lock.torch.brasslantern.com.32632 The sa-learn will spin forever until I manually remove bayes.lock, after which it completes successfully.
Correction: sa-learn does not spin forever; eventually it prints this: Learned tokens from 1 message(s) (1 message(s) examined) bayes: cannot open bayes databases /home/schaefer/.spamassassin/bayes_* R/W: lock failed: File exists Even though it says it didn't open the database R/W, subsequently re-running it on the same message says it didn't learn anything new, so something was written somewhere. And the bayes_journal file is gone, though unfortunately I forgot to check that before re-running sa-learn so I don't know which pass sync'd it.
In my case - SHORTCIRCUIT plugin is involved. When spamd processing message with shortcircuit rule and flushing bayes_jornal (when max size reached) - there is no unlock event in debug log. SpamAssassin version 3.3.1 running on Perl version 5.10.0
Am also suffering from this problem.If I run lsof +D /.spamassassin no process is running that uses lock files that are still present. Bug causes following error message in maillog: Nov 9 18:23:23 mail spamd[484]: bayes: cannot open bayes databases ~/.spamassassin/bayes_* R/W: lock failed: File exists On a regular basis. Am using spamassassin v3.3.1 on centos v5.7
(In reply to comment #8) > Am also suffering from this problem.If I run lsof +D /.spamassassin no process > is running that uses lock files that are still present. Bug causes following > error message in maillog: > > Nov 9 18:23:23 mail spamd[484]: bayes: cannot open bayes databases > ~/.spamassassin/bayes_* R/W: lock failed: File exists > > On a regular basis. > > Am using spamassassin v3.3.1 on centos v5.7 Btw would like to add that switching to mysql or postgres bayes will solve problem.
This problem seems to persist in version 3.4.0, to the extent that the service can be stopped entirely and the lock file remains until removed manually.
As an example, was running spamd in the foreground in debug mode, now shut down. The user local lock file remains (and has been this way for the last half hour, whilst the daemon was still running and processing mail, and now that it and its children have closed down): -rw-------. 1 raj raj 97008 Oct 6 12:27 /home/raj/.spamassassin/bayes_journal -rw-------. 1 raj raj 36 Nov 7 15:50 /home/raj/.spamassassin/bayes.lock -rw-------. 1 raj raj 20893696 Nov 7 15:50 /home/raj/.spamassassin/bayes_seen -rw-------. 1 raj raj 5328896 Nov 7 15:50 /home/raj/.spamassassin/bayes_toks
What OS ? Are you using Selinux or something similar? Have you checked permissions? What bayes backend are you using? What kind of hardware resources are available? Does it happen during expiration?
No usable test case provided in almost 4 years. Cannot reproduce the problem with 3.4.1, no point leaving this open. Anyone reopening this should answer AXB's questions and provide the full spamd command line.
(In reply to Bill Cole from comment #13) > No usable test case provided in almost 4 years. Cannot reproduce the problem > with 3.4.1, no point leaving this open. > > Anyone reopening this should answer AXB's questions and provide the full > spamd command line. >What OS ? Fedora 29 4.19.12-301.fc29 >Are you using Selinux or something similar? no >Have you checked permissions? yes, but notice that some permissions of the lock files below are different from each other >What bayes backend are you using? out of the box, i.e., not MySQL >What kind of hardware resources are available? ProLiant DL180 G6 48GB RAM CPU(s): 8 Thread(s) per core: 1 Core(s) per socket: 4 Socket(s): 2 >Does it happen during expiration? Just noticed over 1300 of these in 48 hours; they survived reboots as well as upgrade from Fedora 28. spamd[13844]: bayes: cannot open bayes databases /home/spamd/bayes_* R/W: lock failed: File exists I just noticed all of these: -rw-rw-rw- 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.10027 -rw-rw-rw- 1 spamd spamd 196 Jan 16 13:23 bayes.lock.ourdomain.10298 -rw-rw-rw- 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.10341 -rw-rw-rw- 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.10368 -rw-rw-rw- 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.10542 -rw-rw-rw- 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.10543 -rw-rw-rw- 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.10944 -rw-rw-rw- 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.10972 -rw-rw-rw- 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.11311 -rw-rw-rw- 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.12492 -rw-rw-rw- 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.12630 -rw-rw-rw- 1 spamd spamd 112 Jan 16 13:23 bayes.lock.ourdomain.13736 -rw-rw-rw- 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.14680 -rw-rw-rw- 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.14706 -rwx------ 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.1502 -rw-rw-rw- 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.15184 -rw-rw-rw- 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.15255 -rw-rw-rw- 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.15927 -rw-rw-rw- 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.15928 -rw-rw-rw- 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.16021 -rwx------ 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.16388 -rw-rw-rw- 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.16653 -rw-rw-rw- 1 spamd spamd 28 Feb 13 2018 bayes.lock.ourdomain.18307 -rwx------ 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.18373 -rw-rw-rw- 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.18425 -rw-rw-rw- 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.18659 -rw-rw-rw- 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.18740 -rw-rw-rw- 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.19267 -rw-rw-rw- 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.19317 -rw-rw-rw- 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.21271 -rw-rw-rw- 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.21297 -rw-rw-rw- 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.21695 -rw-rw-rw- 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.2217 -rwx------ 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.22418 -rw-rw-rw- 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.22462 -rw-rw-rw- 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.24441 -rw-rw-rw- 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.24915 -rw-rw-rw- 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.2528 -rw-rw-rw- 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.2530 -rw-rw-rw- 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.2531 -rwx------ 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.25495 -rw-rw-rw- 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.25752 -rw-rw-rw- 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.26171 -rw-rw-rw- 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.26874 -rwx------ 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.28108 -rw-rw-rw- 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.28135 -rwx------ 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.28245 -rw-rw-rw- 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.28951 -rw-rw-rw- 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.29077 -rwx------ 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.29430 -rw-rw-rw- 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.29702 -rw-rw-rw- 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.30062 -rw-rw-rw- 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.30505 -rw-rw-rw- 1 spamd spamd 0 Feb 12 2018 bayes.lock.ourdomain.31253 -rw-rw-rw- 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.3661 -rw-rw-rw- 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.5113 -rw-rw-rw- 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.5661 -rw-rw-rw- 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.6417 -rw-rw-rw- 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.6628 -rw-rw-rw- 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.7585 -rw-rw-rw- 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.762 -rw-rw-rw- 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.7868 -rw-rw-rw- 1 spamd spamd 0 Feb 13 2018 bayes.lock.ourdomain.9929
Forgot to add command line options to spamd: sa-milt 13819 0.0 0.0 161544 3848 ? Ssl 13:23 0:00 /usr/sbin/spamass-milter -p /var/run/spamass-milter/spamass-milter.sock root 16493 0.4 0.2 230452 115676 ? Ss 13:32 0:04 /usr/bin/perl -T -w /usr/bin/spamd -c -m5 -H -u spamd -x --ipv4 --virtual-config-dir=/home/spamd
I'm hitting this behaviour if I enable the shortcircuit plugin, which is the default with SpamHaus recommended configs from https://github.com/spamhaus/spamassassin-dqs The exact error message is: bayes: cannot open bayes databases /%PATH%/.spamassassin/bayes_* R/W: lock failed: Interrupted system call Here's the process and fd situation at the time: # ps axfuww|grep spam root 11325 0.0 0.5 189900 102632 ? SNs Oct13 0:16 /usr/bin/perl -T -w /usr/sbin/spamd --create-prefs --max-children 6 --helper-home-dir --socketpath=/tmp/spamd.sock --vpopmail --username=vpopmail --syslog=local5 -d --pidfile=/var/run/spamd.pid vpopmail 1877 0.7 0.7 248604 157164 ? SN 18:01 1:01 \_ spamd child vpopmail 2142 0.2 0.6 227440 139140 ? SN 20:12 0:02 \_ spamd child # ls -la /proc/1877/fd ... lrwx------ 1 root root 64 Oct 14 19:22 12 -> /home/vpopmail/sa-bayes-db/home/vpopmail/domains/0/colcrock.user/eeveem/.spamassassin/bayes.mutex # cat /proc/1877/fdinfo/12 ... lock: 1: FLOCK ADVISORY WRITE 1877 fd:11:89801696 0 EOF And here's spamd debug log: Oct 14 14:51:49 serverX spamd[1248]: spamd: accept() on fd 5 Oct 14 14:51:49 serverX spamd[1248]: spamd: got connection over /tmp/spamd.sock Oct 14 14:51:49 serverX spamd[1248]: bayes: learner_new self=Mail::SpamAssassin::Plugin::Bayes=HASH(0x14a666070c30), bayes_store_module=Mail::SpamAssassin::BayesStore::DBM Oct 14 14:51:49 serverX spamd[1248]: bayes: learner_new: got store=Mail::SpamAssassin::BayesStore::DBM=HASH(0x14a66aef5028) Oct 14 14:51:49 serverX spamd[1248]: bayes: tie-ing to DB file R/O /home/vpopmail/sa-bayes-db/home/vpopmail/domains/wahine.user/nick/.spamassassin/bayes_toks Oct 14 14:51:49 serverX spamd[1248]: bayes: tie-ing to DB file R/O /home/vpopmail/sa-bayes-db/home/vpopmail/domains/wahine.user/nick/.spamassassin/bayes_seen Oct 14 14:51:49 serverX spamd[1248]: bayes: found bayes db version 3 Oct 14 14:51:49 serverX spamd[1248]: bayes: DB journal sync: last sync: 1602689761 Oct 14 14:51:49 serverX spamd[1248]: bayes: opportunistic call found journal sync due Oct 14 14:51:49 serverX spamd[1248]: bayes: bayes journal sync starting Oct 14 14:51:49 serverX spamd[1248]: locker: mode is 384 Oct 14 14:51:49 serverX spamd[1248]: locker: safe_lock: created /home/vpopmail/sa-bayes-db/home/vpopmail/domains/wahine.user/nick/.spamassassin/bayes.mutex Oct 14 14:51:49 serverX spamd[1248]: locker: safe_lock: trying to get lock on /home/vpopmail/sa-bayes-db/home/vpopmail/domains/wahine.user/nick/.spamassassin/bayes with 10 timeout Oct 14 14:51:49 serverX spamd[1248]: locker: safe_lock: link to /home/vpopmail/sa-bayes-db/home/vpopmail/domains/wahine.user/nick/.spamassassin/bayes.mutex: link ok Oct 14 14:51:49 serverX spamd[1248]: bayes: tie-ing to DB file R/W /home/vpopmail/sa-bayes-db/home/vpopmail/domains/wahine.user/nick/.spamassassin/bayes_toks Oct 14 14:51:49 serverX spamd[1248]: bayes: tie-ing to DB file R/W /home/vpopmail/sa-bayes-db/home/vpopmail/domains/wahine.user/nick/.spamassassin/bayes_seen Oct 14 14:51:49 serverX spamd[1248]: bayes: found bayes db version 3 Oct 14 14:51:49 serverX spamd[1248]: locker: refresh_lock: refresh /home/vpopmail/sa-bayes-db/home/vpopmail/domains/wahine.user/nick/.spamassassin/bayes.mutex Oct 14 14:51:49 serverX spamd[1248]: locker: refresh_lock: refresh /home/vpopmail/sa-bayes-db/home/vpopmail/domains/wahine.user/nick/.spamassassin/bayes.mutex Oct 14 14:51:49 serverX spamd[1248]: locker: refresh_lock: refresh /home/vpopmail/sa-bayes-db/home/vpopmail/domains/wahine.user/nick/.spamassassin/bayes.mutex Oct 14 14:51:49 serverX spamd[1248]: locker: refresh_lock: refresh /home/vpopmail/sa-bayes-db/home/vpopmail/domains/wahine.user/nick/.spamassassin/bayes.mutex Oct 14 14:51:49 serverX spamd[1248]: bayes: synced databases from journal in 0 seconds: 3596 unique entries (4296 total entries) Oct 14 14:51:49 serverX spamd[1248]: bayes: bayes journal sync completed Oct 14 14:51:49 serverX spamd[1248]: spamd: running as uid 502 Oct 14 14:51:49 serverX spamd[1248]: spamd: checking message <0.0.0.41.1D6A259278E2C40.2E5EA1@mail.trkcgfd.work> for nick@wahine.user:502 # here should be the unlocking; see bellow Oct 14 14:51:49 serverX spamd[1248]: shortcircuit: s/c spam due to RCVD_IN_ZEN_LASTEXTERNAL, using score of 100 Oct 14 14:51:49 serverX spamd[1248]: spamd: identified spam (124.9/5.0) for nick@wahine.user:502 in 0.3 seconds, 7051 bytes. Oct 14 14:51:49 serverX spamd[1248]: spamd: result: Y 124 - RCVD_IN_PSBL,RCVD_IN_ZEN_LASTEXTERNAL,SHORTCIRCUIT,SH_DBL_HEADERS,URIBL_CSS,URIBL_CSS_A,URIBL_DBL_SPAM,URIBL_ZRD scantime=0.3,size=7051,user=nick@wahine.user,uid=502,required_score=5.0,rhost=localhost,raddr=127.0.0.1,rport=/tmp/spamd.sock,mid=<0.0.0.41.1D6A259278E2C40.2E5EA1@mail.trkcgfd.work>,autolearn=disabled,shortcircuit=spam Disabling Mail::SpamAssassin::Plugin::Shortcircuit in sh.pre resolves the problem. Without shortcircuit things look like this: Oct 14 15:02:52 serverX spamd[1248]: spamd: running as uid 502 Oct 14 15:02:52 serverX spamd[1248]: spamd: checking message <1113.927824369.202010141902431022559.0033804522@e.rh.com> for nick@wahine.user:502 Oct 14 15:02:52 serverX spamd[1248]: bayes: tie-ing to DB file R/O /home/vpopmail/sa-bayes-db/home/vpopmail/domains/wahine.user/nick/.spamassassin/bayes_toks Oct 14 15:02:53 serverX spamd[1248]: bayes: tie-ing to DB file R/O /home/vpopmail/sa-bayes-db/home/vpopmail/domains/wahine.user/nick/.spamassassin/bayes_seen Oct 14 15:02:53 serverX spamd[1248]: bayes: found bayes db version 3 Oct 14 15:02:53 serverX spamd[1248]: bayes: DB journal sync: last sync: 1602702172 Oct 14 15:02:53 serverX spamd[1248]: bayes: corpus size: nspam = 23980, nham = 34868 Oct 14 15:02:53 serverX spamd[1248]: bayes: tokenized body: 461 tokens Oct 14 15:02:53 serverX spamd[1248]: bayes: tokenized uri: 929 tokens Oct 14 15:02:53 serverX spamd[1248]: bayes: tokenized invisible: 323 tokens Oct 14 15:02:53 serverX spamd[1248]: bayes: tokenized header: 111 tokens Oct 14 15:02:53 serverX spamd[1248]: bayes: score = 2.77555756156289e-16 Oct 14 15:02:53 serverX spamd[1248]: bayes: DB journal sync: last sync: 1602702172 Oct 14 15:02:53 serverX spamd[1248]: bayes: DB expiry: tokens in DB: 134983, Expiry max size: 150000, Oldest atime: 1599401606, Newest atime: 1602702125, Last expire: 1602158934, Current time: 1602702173 Oct 14 15:02:53 serverX spamd[1248]: bayes: untie-ing Oct 14 15:02:53 serverX spamd[1248]: bayes: files locked, now unlocking lock Oct 14 15:02:53 serverX spamd[1248]: locker: safe_unlock: unlocked /home/vpopmail/sa-bayes-db/home/vpopmail/domains/wahine.user/nick/.spamassassin/bayes.mutex Oct 14 15:02:56 serverX spamd[1248]: spamd: clean message (2.5/5.0) for nick@wahine.user:502 in 3.5 seconds, 21580 bytes. Oct 14 15:02:56 serverX spamd[1248]: spamd: result: . 2 - BAYES_00,DCC_CHECK,DIGEST_MULTIPLE,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HTML_FONT_LOW_CONTRAST,HTML_MESSAGE,PYZOR_CHECK,SPF_HELO_NONE scantime=3.5,size=21580,user=nick@wahine.user,uid=502,required_score=5.0,rhost=localhost,raddr=127.0.0.1,rport=/tmp/spamd.sock,mid=<1113.927824369.202010141902431022559.0033804522@e.rh.com>,bayes=0.000000,autolearn=no autolearn_force=no,shortcircuit=no
It looks like this problem is triggered when doing shortcircuit & journal flush. I.e.: * no shortcircuit + journal flush = ok * shortcircuit + no journal flush = ok * shortcircuit + journal flush = causes the bug, but no error message * any + journal flush = we see the bug, because the lock file is held opened