Right now, we're not performing so well with large files.
When the client sends a large file to the server, it generates svndiff data
by comparing the file against either a text-base or an empty file. On a 1GB
file, this takes a long time; after the data is fully generated, it's *then*
marshalled over the network to the server. (And once the commit succeeds, the
server then deltifies the *old* version of the file, causing the client to time
out! But that's a separate problem, see issue 1573.)
There are a number of problems here:
* First, we're talking about a classic tradeoff between CPU and network
optimization. Lots of CPU-time to discover a possibly small commit diff, with
the reward of a tiny network transmission... or near-zero-CPU-time to just
shove the whole fulltext over the network, which is a huge transmission. Which
will it be? Cmpilato points out that this is exactly the sort of choice CVS
users face when they pass the optional -zN flag... perhaps we should grow a
similar option.
* Second, we're taking a performance hit from neon's API, which is still a
"pull" interface. That is, neon insists on pulling data from the caller when
performing a PUT request. As a result, we're first spewing all the svndiff
data into a tmpfile, *then* invoking the PUT request. It would be much faster
if we could start the PUT request and then "push" svndiff data over the network
as we generate it. Maybe it's worth adding this feature to neon?
* As an aside: note that there are at least two different ways to "shove
fulltext" over the network. One method involves adding an apply_text() function
to the editor API, to complement our apply_txdelta(). (This is what svn_fs.h
has currently.) Another method is to just rapidly fabricate delta windows
consisting of nothing but 'add' (new data) instructions, skipping vdelta
comparison entirely.