Details
- Bug
- Status: Closed
- Major
- Resolution: Fixed
- 0.9
- None
- None
- Windows, couch 0.9, erlang R12B 5.6.5
- Regular Contributors Level (Easy to Medium)
Description
On Windows, attempts to delete a database occasionally fail with an error. As a result, 10-30% of the test suite fails on Windows. Retrying the failed tests usually makes them pass on subsequent attempts. Run individually, the tests still fail roughly 10% of the time.
The log output shown is:
[debug] [<0.18650.6>] httpd 500 error response:
{"error":"error","reason":"eacces"}
[info] [<0.18650.6>] 127.0.0.1 - - 'DELETE' /test_suite_db/ 500
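Since retrying usually succeeds, the failure looks transient. The following is only a minimal Python sketch of that observation, not a fix and not CouchDB code: it retries a file delete while the OS reports EACCES. The helper name `remove_with_retry` and its `attempts`, `delay`, and `remove` parameters are hypothetical, chosen for illustration.

```python
import errno
import os
import time


def remove_with_retry(path, attempts=5, delay=0.05, remove=os.remove):
    """Try to delete `path`, retrying while the OS reports EACCES.

    Hypothetical helper: `attempts`, `delay`, and the injectable `remove`
    are illustration-only names, not part of CouchDB or its test suite.
    Returns the number of tries the delete took.
    """
    for attempt in range(1, attempts + 1):
        try:
            remove(path)
            return attempt
        except OSError as exc:
            # Only EACCES is treated as transient here. Any other error,
            # or running out of attempts, is re-raised to the caller.
            if exc.errno != errno.EACCES or attempt == attempts:
                raise
            time.sleep(delay)
```

A retry like this would only hide the symptom; the underlying question is which process still holds the file open when the delete is issued.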
A slightly snipped transcript from IRC:
(2:58:32 PM) markh: I see a number of INFO logs "Shutting down view group server, monitored db is closing." directly before the error. I was guessing the file may be unlink'd before one of those workers actually closes its handle?
(2:58:54 PM) alisdair: yeah, it's probably a race condition
(2:59:13 PM) alisdair: where the delete is tried before the fd is let go
(2:59:26 PM) alisdair: the reader fd that is
(2:59:32 PM) markh: yeah
...
(3:11:47 PM) alisdair: i can't find an obvious deadlock
(3:12:18 PM) alisdair: couch_server:delete explicitly waits for the db process to exit
(3:12:23 PM) alisdair: before deleting it
(3:15:15 PM) alisdair: i think i found the problem
(3:15:23 PM) alisdair: but i need a windows machine to confirm
(3:15:30 PM) alisdair: i'll look into it tomorrow
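The ordering problem suspected in the transcript can be sketched as follows. This is a Python illustration under stated assumptions, not the CouchDB implementation: threads stand in for the Erlang processes that hold a reader fd, and the names `hold_open` and `delete_after_readers_exit` are hypothetical. The point is that the delete is issued only after every handle holder has exited, which is the ordering Windows requires.

```python
import os
import threading


def hold_open(path, release):
    """Stand-in for a worker that keeps a read handle until told to stop."""
    with open(path, "rb"):
        release.wait()  # the handle stays open until `release` is set


def delete_after_readers_exit(path, readers, timeout=5.0):
    """Wait for every reader thread to finish, then unlink the file.

    Hypothetical helper for illustration. If a reader is still alive after
    `timeout` seconds, the file is left in place and TimeoutError is raised
    instead of attempting a delete that Windows would reject.
    """
    for reader in readers:
        reader.join(timeout)
        if reader.is_alive():
            raise TimeoutError("a reader still holds a handle on %s" % path)
    os.remove(path)
```

Waiting only for the main db process, as `couch_server:delete` is described as doing, would not be enough under this model if some other worker still has the file open.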