CouchDB / COUCHDB-1003

deleting db file is asynchronous & file rename in couch_file:delete

    Details

    • Type: Question
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Database Core
    • Labels:
      None

      Description

      I wonder why we spawn the file deletion when we delete a database. On slow I/O machines it introduces latency. I don't see any reason to make this deletion asynchronous.

      About couch_file:delete: we first rename the file before deleting it. Why are we doing that? Is this for Windows?

        Issue Links

          Activity

          Benoit Chesneau created issue -
          Robert Newson added a comment -

          Deleting a large database can take time, especially on block-oriented filesystems (ext2/3/4, for example).

          The rename takes the same time regardless of size, the deletion happens asynchronously. On startup, we check the .delete directory for any incomplete delete calls.

          I think it's good behavior and we shouldn't change it. I also think we should revert the sleep you added today to replication.js as we don't understand the real cause of the failure on your machine (it passes here without the wait).

          Benoit Chesneau added a comment - - edited

          Any link about it? I was expecting that filesystems already do this job.
          Same time compared to what?

          > the deletion happens asynchronously. On startup, we check the .delete

          I didn't ask for a change, but for the reasons. I don't think it's good not to
          check for deletion of a db file. It could introduce potential file
          leaking, which I don't really like.

          > I also think we should revert the sleep you added today to
          replication.js as we don't understand the real cause of the failure on
          your machine (it passes here without the wait).
          The cause is latency in delete/rename on slow I/O. This isn't only on
          my machine, btw, but on every standard machine. I'm -1 on reverting this
          fix right now since it solved the failure on all machines tested here. Note:
          this isn't the only test where we apply such a fix. Feel free to reopen the ticket anyway.

          Robert Newson added a comment -

          I mean that any rename operation takes the same time as any other (the amount of work required is the same).

          It's trivial to verify that deleting a large file takes time (as every block needs to be marked as free), just make a large file and then delete it.
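          The experiment Robert suggests is easy to reproduce. A rough sketch (in Python for illustration; the file size is arbitrary, and measured times will of course vary by filesystem and hardware):

          ```python
          import os
          import time

          def time_delete(path, size_mb=64):
              """Create a file of roughly size_mb megabytes, then time its removal."""
              chunk = b"\0" * (1024 * 1024)
              with open(path, "wb") as f:
                  for _ in range(size_mb):
                      f.write(chunk)
                  f.flush()
                  os.fsync(f.fileno())  # make sure the blocks actually hit disk
              start = time.perf_counter()
              os.remove(path)           # every allocated block must be freed
              return time.perf_counter() - start
          ```

          On a block-oriented filesystem, the elapsed time grows with the file size, while a rename of the same file stays roughly constant.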

          It's true that other tests have sleeps and waits in them, but since replication.js was working for you until very recently without the sleep, I would rather find and treat the cause than the symptom.

          Benoit Chesneau added a comment -

          What I say is that we should wait for complete deletion, in my opinion, whatever the time it takes. Delete isn't a common operation and it should be handled like one. More generally, spawning an operation without checking that it finishes isn't so good...

          I'm not sure it is recent; I already had such errors. Also, please reopen the bug if you think it's needed. This one wasn't related.

          Robert Newson added a comment -

          We currently do this, synchronously, on a delete db call:

          1) close the file descriptors
          2) remove the db from the internal ets tables (lru, etc.)
          3) rename the file to a different directory and name
          4) spawn a fire-and-forget process to delete the file

          On startup, couchdb deletes everything in the .deleted directory.
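          The four-step sequence plus the startup sweep can be modeled like this (a minimal Python sketch, not CouchDB's actual Erlang code; the class and directory names are illustrative):

          ```python
          import os
          import threading
          import uuid

          class DbServer:
              """Toy model of the delete-db sequence described above."""

              def __init__(self, root):
                  self.root = root
                  self.staging = os.path.join(root, ".delete")
                  os.makedirs(self.staging, exist_ok=True)
                  self.open_dbs = {}  # name -> file object; stand-in for the ets/lru tables
                  # On startup, finish any deletes interrupted by a crash or restart.
                  for leftover in os.listdir(self.staging):
                      os.remove(os.path.join(self.staging, leftover))

              def delete_db(self, name):
                  # 1) close the file descriptor(s)...
                  f = self.open_dbs.pop(name, None)  # 2) ...and drop the internal table entry
                  if f is not None:
                      f.close()
                  # 3) rename into the staging directory: constant time, size-independent
                  staged = os.path.join(self.staging, uuid.uuid4().hex + ".couch")
                  os.rename(os.path.join(self.root, name + ".couch"), staged)
                  # 4) fire-and-forget: the potentially slow unlink runs in the background
                  threading.Thread(target=os.remove, args=(staged,), daemon=True).start()
          ```

          Only steps 1-3 block the caller; the rename is atomic on POSIX filesystems, so after it returns the db name is immediately free for reuse while the slow unlink proceeds in the background.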

          Spawning a process without waiting for it to complete is perfectly fine if there's nothing to be gained from waiting, as is the case here.

          I don't see what would be achieved by blocking for the deletion like we used to.

          Randall Leeds added a comment -

          In particular, a synchronous delete would block updates to the new file after compaction completes its swap. See COUCHDB-780 for where I originally proposed the current async behavior.

          Benoit Chesneau added a comment -

          Thanks for the link to issue 780, I couldn't find it. So the decision was to leave a note for .delete files and leave them around if something happens. I don't like that so much. But aside from that, while async delete is good enough for compaction (and needed right now), I don't see why we use it for removing a db.

          To be clear, we are sending a 200 status in response to a db DELETE while we aren't sure the db was deleted. rename != delete: data could still be on the disk, or could simply not be deleted for any reason. Relying on an external script or a restart of couchdb to handle the final delete isn't so good; it's another thing to monitor and check. In my opinion, as long as we don't wait for a full delete (which could be needed in a scenario where dbs are encrypted) it could be interesting to:

          • send a 202 status instead of 200
          • have a task that collects deletions and eventually logs and cleans them if needed

          What do you think?
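          The 202 proposal amounts to a small change in the response handler. A hypothetical sketch (Python for illustration; handler and parameter names are invented, and CouchDB actually returns 200 here):

          ```python
          import json

          def handle_delete_db(db_name, rename_and_spawn_delete):
              """Hypothetical DELETE /{db} handler. After this returns, the file
              has only been renamed into the staging directory; the unlink is
              still pending, so 202 Accepted (request accepted, processing not
              complete) is arguably more honest than 200 OK."""
              rename_and_spawn_delete(db_name)  # staged for deletion, not yet gone
              body = json.dumps({"ok": True})
              return 202, {"Content-Type": "application/json"}, body
          ```

          Per RFC 2616 (and later RFC 7231), 202 is exactly the status for a request that has been accepted but whose processing has not been completed.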
          Filipe Manana added a comment -

          "Note:
          this isn't the only test where we do apply such fix. Feel free to reopen the ticket anyway. "

          Benoît, what other tests do that? The only ones that do a wait loop I can remember, like replicator_db.js, do it to wait for something to happen and not to fix an issue like this one.

          And as for the 2nd file descriptor, it's ref-counted like the updater file descriptor. I don't think it's the cause of this.

          Benoit Chesneau added a comment -

          Hmm, are you speaking about COUCHDB-1002? Yes, maybe we can remove some. Though I experience a lot of latency with Firefox since post-1.0.

          About this ticket, I really think we should change the http status and make something to track file deletion.

          Robert Newson added a comment -

          This behavior is intentional and desired.

          If there's a better behavior, we should discuss it on @dev or IRC and then raise tickets focused on making those changes. For example, we could discuss whether 202 is a more appropriate status code and then file a ticket to change that.

          Robert Newson made changes -
          Field Original Value New Value
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Won't Fix [ 2 ]
          Benoit Chesneau made changes -
          Link This issue is superseded by COUCHDB-1020 [ COUCHDB-1020 ]
          Benoit Chesneau made changes -
          Link This issue is superseded by COUCHDB-1019 [ COUCHDB-1019 ]

            People

            • Assignee:
              Unassigned
              Reporter:
              Benoit Chesneau
            • Votes:
              0
              Watchers:
              1
