CouchDB / COUCHDB-326

Occasional {"error":"error","reason":"eacces"} errors deleting a database on Windows

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9
    • Fix Version/s: 1.0
    • Component/s: None
    • Labels:
      None
    • Environment:

      Windows, couch 0.9, erlang R12B 5.6.5

    • Skill Level:
      Regular Contributors Level (Easy to Medium)

      Description

      On Windows, attempts to delete a database occasionally fail with an error. As a result, 10-30% of the test suite fails on Windows. Retrying the failed tests usually makes them pass on subsequent attempts. Running the tests individually causes them to fail roughly 10% of the time.

      The log output shown is:

      [debug] [<0.18650.6>] httpd 500 error response:
      {"error":"error","reason":"eacces"}
      [info] [<0.18650.6>] 127.0.0.1 - - 'DELETE' /test_suite_db/ 500
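Since the description notes that failed deletes usually succeed on a later attempt, a client-side retry is one possible stopgap. The following is a minimal sketch under that assumption, not the fix for the underlying race; `delete_status` and `delete_with_retry` are hypothetical names, and treating every 500 response as the transient eacces case is an assumption.

```shell
#!/bin/sh
# Sketch only: both function names are hypothetical, not part of CouchDB.

# Print only the HTTP status code of a DELETE request to URL.
delete_status() {
    curl --silent --output /dev/null --write-out '%{http_code}' -X DELETE "$1"
}

# delete_with_retry URL [ATTEMPTS] [DELAY_SECONDS]
# Retries while the server answers 500 (assumed to be the transient eacces case).
# Prints the last status code; returns 0 on a 2xx response, 1 otherwise.
delete_with_retry() {
    url=$1
    attempts=${2:-5}
    delay=${3:-1}
    n=1
    while :; do
        status=$(delete_status "$url")
        case "$status" in
            2??) echo "$status"; return 0 ;;
            500) ;;                              # fall through and maybe retry
            *)   echo "$status"; return 1 ;;     # e.g. 404: retrying will not help
        esac
        if [ "$n" -ge "$attempts" ]; then
            break
        fi
        n=$((n + 1))
        sleep "$delay"
    done
    echo "$status"
    return 1
}
```

For example, `delete_with_retry http://127.0.0.1:5984/test_suite_db 5 1` would try up to five times, one second apart; the host and port here are assumptions.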

      A slightly snipped transcript from IRC:

      (2:58:32 PM) markh: I see a number of INFO logs "Shutting down view group server, monitored db is closing." directly before the error. I was guessing the file may be unlink'd before one of those workers actually closes its handle?
      (2:58:54 PM) alisdair: yeah, it's probably a race condition
      (2:59:13 PM) alisdair: where the delete is tried before the fd is let go
      (2:59:26 PM) alisdair: the reader fd that is
      (2:59:32 PM) markh: yeah
      ...
      (3:11:47 PM) alisdair: i can't find an obvious deadlock
      (3:12:18 PM) alisdair: couch_server:delete explicitly waits for the db process to exit
      (3:12:23 PM) alisdair: before deleting it
      (3:15:15 PM) alisdair: i think i found the problem
      (3:15:23 PM) alisdair: but i need a windows machine to confirm
      (3:15:30 PM) alisdair: i'll look into it tomorrow

        Activity

        Randall Leeds made changes -
        Status Open [ 1 ] Closed [ 6 ]
        Fix Version/s 1.0 [ 12313209 ]
        Resolution Fixed [ 1 ]
        Randall Leeds added a comment -

        According to dch, this hasn't been reproducible since 1.0.

        James Howe added a comment -

        Anti-Virus

        Paul Joseph Davis added a comment -

        @Dave

        Why would an audio/visual program be interfering with CouchDB's data directories?

        Dave Cottlehuber added a comment -

        I believe this has been resolved for quite a while. I've not seen any errors in the test suite for this since 1.0, and cannot reproduce it on current 1.1.1.

        Running these in parallel shells on CouchDB 1.1.1 produces 0 errors, and leaves behind no dbs, although a fair bit of laptop heat:

        dave@akai /tmp % while (( i++ < 5000 )) { curl --silent -X PUT http://172.16.40.128:5984/db$i ; }

        dave@akai /tmp % while (( i++ < 5000 )) { curl --silent -X DELETE http://172.16.40.128:5984/db$i ; }

        I would strongly advise anybody reporting this issue to make sure their AV programs skip var/lib/couch/, just in case.
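The loops above rely on reading curl's output to spot errors. A variant that counts failing responses might look like the sketch below. It differs from the original test in that it creates and deletes each database sequentially rather than from two parallel shells, and `COUCH_URL`, `http_status` and `stress_delete` are hypothetical names with an assumed default address.

```shell
#!/bin/sh
# Sketch only: COUCH_URL, http_status and stress_delete are hypothetical names.
COUCH_URL="${COUCH_URL:-http://127.0.0.1:5984}"

# Print only the HTTP status code of a request (METHOD URL).
http_status() {
    curl --silent --output /dev/null --write-out '%{http_code}' -X "$1" "$2"
}

# Create and then delete COUNT databases one after another.
# Prints the number of non-2xx responses on stdout and logs each one to stderr.
stress_delete() {
    count=$1
    failures=0
    i=1
    while [ "$i" -le "$count" ]; do
        for method in PUT DELETE; do
            status=$(http_status "$method" "$COUCH_URL/db$i")
            case "$status" in
                2??) ;;
                *)
                    failures=$((failures + 1))
                    echo "$method db$i -> $status" >&2
                    ;;
            esac
        done
        i=$((i + 1))
    done
    echo "$failures"
}
```

Calling `stress_delete 5000` would cover the same range of database names as the loops above.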

        Paul Joseph Davis made changes -
        Skill Level Regular Contributors Level (Easy to Medium)
        Kenneth LeFebvre made changes -
        Comment [ This may not be a permanent solution, but here's a copy of the change I made to make all the unit tests pass.

        On my workstation, the sleep wasn't enough to consistently succeed for me, and I didn't want to increase the sleep across the board.
        ]
        Kenneth LeFebvre made changes -
        Attachment delete-database.patch [ 12449595 ]
        Kenneth LeFebvre made changes -
        Field Original Value New Value
        Attachment delete-database.patch [ 12449595 ]
        Chris McKee added a comment -

        The issue is quite well described here: http://osdir.com/ml/couchdb-user/2009-10/msg00185.html
        Eric Desgranges added a comment -

        I'm running into the same issue with CouchDB 0.11 on Windows 7. It definitely looks like a race condition when deleting either databases or documents.

        michael h added a comment -

        This issue still exists in the Windows binary installer, version 0.11.0b897093.

        Mark Hammond added a comment -

        Just got some clarification from damien:

        (10:33:56 AM) damienkatz: it waits to get the killed message, but the point isn't to wait for an orderly shutdown, just to clear the message.
        (10:34:16 AM) markh: right - so that receive does do something important...
        (10:34:44 AM) damienkatz: yes, if it didn't, the couch_server process would die.
        (10:35:08 AM) damienkatz: as it would get the message later and not know where it came from, so it assumes something bad happened.

        alisdair sullivan added a comment -

        exit(Pid, kill),
        receive
            {'EXIT', Pid, close} -> ok
        end,

        To achieve a clean shutdown of the db and its child processes, you need to send it a signal other than kill and give it a chance to shut down and clean up its processes. The receive here doesn't do anything, as a killed process sends the 'EXIT' msg immediately upon being killed.
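The ordering described in this comment (a softer signal first, time to clean up, a forced kill only as a last resort) is not specific to Erlang. Purely as an analogy, here is the same pattern for operating-system processes in shell. It is not CouchDB code and says nothing about how the fix was implemented; `stop_gracefully` and its default timeout are hypothetical.

```shell
#!/bin/sh
# Analogy only: OS processes and POSIX signals, not Erlang processes.

# stop_gracefully PID [TIMEOUT_SECONDS]
# Sends TERM, polls once per second until the process is gone, and sends KILL
# only if the timeout expires. Returns 0 if the process exited on its own
# (or was already gone), 1 if it had to be killed.
stop_gracefully() {
    pid=$1
    timeout=${2:-5}
    kill -TERM "$pid" 2>/dev/null || return 0   # nothing to stop
    waited=0
    while kill -0 "$pid" 2>/dev/null; do        # kill -0 only tests for existence
        if [ "$waited" -ge "$timeout" ]; then
            kill -KILL "$pid" 2>/dev/null       # last resort: no cleanup possible
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

The point of the wait loop is the same as in the comment: whatever holds resources such as open file handles gets a chance to release them before anything that depends on their release, like deleting the file, is attempted.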

        Mark Hammond created issue -

          People

          • Assignee:
            Unassigned
            Reporter:
            Mark Hammond
          • Votes:
            2
            Watchers:
            4
