Good review Paul, thanks!
Let me quote and reply inline to each of your points:
> This patch looks like its a mishmash of a couple patches that are applied slightly differently. Not sure if that's a branch thing or what. But for instance, there's a lot of id_btree(Db) -> Db#db.couch type changes.
These are deliberate. Currently we are maintaining an Fd/couch_file pair for reads and writes in a bit of a clumsy way in the #btree.fd record and the couch_file module and do a switcheroo for writes vs. reads. Now all of that is moved into couch_file and couch_db doesn't have to concern itself with the details of efficient file access.
> There's also a switch between couch_file:sync(Filepath) and couch_file:sync(Fd). I'm not sure if that's on purpose or not because we switched to Filepath on purpose. I see it as being possible but I don't see a reference to it.
We used to pass in the filepath to allow the fsync() to be async so it wouldn't block any readers or writers. That behaviour is now obsolete as couch_file will now pass the fsync() request to it's writer child process.
> I'm really not a fan of exposing the inner workings of couch_file to every client as can be seen by the massive number of flush calls that this creates. I find it quite likely that we'll eventually end up missing one of these flush calls and causing a nasty bug because the error would look like data hadn't been written. I'd want to see things rewritten so that the usage of couch_file doesn't change. Easiest solution I see would be to automatically flush buffers if a read request comes in for something that's not on disk.
Good point, API-wise this is a bit leaky and we should definitely look into making this better, but I don't think this blocks the bulk of the improvements that this patch introduces. I'm happy to open a new ticket about this. Good starting points are calling flush() inside couch_file 1) after writing a header and 2) if a read fails with
(fwiw, Damien tried the latter before but removed it again, the details elude me).
> I'm pretty sure we talked about this at one point, but can someone describe the differences between this and Erlang's delayed_write mode for files?
The difference is that delayed_write does buffering whereas we just want to keep writing to the file all the time. delayed_write would not write until the buffer is full or the timeout is hit. In addition, we wouldn't be able to tell for delayed_commits=false writes when data hit the file reliably.
> couch_file becomes much more complicated with this patch. I'd prefer to see it simplified if at all possible. I'm not entirely certain how it'd look but I might start by making a couch_file_reader and couch_file_writer gen_servers instead of having bare processes in couch_file.
The whole point of this patch is to make couch_file smarter and move the reader/writer abstraction further down the stack and reap the associated performance benefits. Unless we have an alternate patch, we can't really compare this.
> I have to say that this patch scares me a bit. If we make the switch to something like this then we're opening up a whole new type of failure scenario where file "writes" appear to succeed but then end up failing after the caller has moved on. While there are certainly cases where I can see this being a fine tradeoff (view updaters, compactors, etc) it worries me a bit for the main database file. The fact is that I can't (yet) reasonably consider all of the possible new failure modes and convince myself that things are correct. I've already seen some super weird reactions to things like running out of disk space, it seems like not knowing about such an error until you've "written" a megabyte of data seems awkward.
This doesn't change the error scenarios. We already rely on monitoring to tell the request process a couch_file write didn't work:
https://github.com/fdmanana/couchdb/blob/master/src/couchdb/couch_db.erl#L819 ff. (825 in particular)