Details
Description
Our repo has 31k files, 120 MB in total. A checkout takes 2 minutes on Linux and 37 minutes on Windows (on an only slightly slower machine). While checking out on Windows (Windows 7 x64, svn 1.6.12, NTFS), svn uses 50% CPU (that is, one of the two cores). When a large file is being downloaded, the load drops. This clearly showed me that the problem is per-file overhead (rather than throughput, a bandwidth limit, some kind of conversion [we even ran a test to check whether the CR -> CR/LF conversion takes too much time], etc.).

I then fired up ProcMon from SysInternals. Here are some bottlenecks I've found:

1. Any time an "entries" file is read, I see the following sequence: Open, Read 80 bytes, Close, Open, Read 80 bytes, Close, Open, Read whole file, Close. What is the reason behind this?

2. I've also found that the same "entries" file is read several times (in the above way) consecutively, without any writes to that file and without any other operations between the reads. So: O, R80, C, O, R80, C, O, Rall, C, O, R80, C, O, R80, C, O, Rall, C, etc.

3. In some directories I see a loop. Svn tries to create a file "tempfile.tmp" and gets a NAME COLLISION result. "tempfile.2.tmp" is tried next, with the same result. And so on, sometimes going as far as "tempfile.340.tmp". It seems some DeleteFile call is missing for the temporaries. But why not use the GetTempFileName function anyway?

4. When a large file is being checked out, I see the following sequence:
   Write 4k from offs 0,
   Write 4k from offs 4k,
   Write 4k from offs 8k,
   Read 16k from offs 16k, <- Why?
   Write 4k from offs 12k,
   Write 4k from offs 16k,
   etc.
   It also shows that either the TCP packet size is set to 4096 bytes, or the file buffer size in svn is set to this silly small value.

5. It seems that the "entries" files in the whole directory tree are checked for each repository file. Say file "root/dirA/dirB/fileC" is processed, and both dirA and dirB have already been created. svn checks "root/entries", "root/dirA/entries", "root/dirA/dirB/entries", deals with the file, then checks (reads!) "root/dirA/dirB/entries" again, then "root/dirA/entries" and "root/entries".

All in all: most of the lost time is spent on the "entries" files. I'm willing to try any tests or test versions sent to me.

PS: The OS field of the issue form should now include Windows 7 as well...
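
To put a rough number on point 3, here is a minimal Python sketch of the probing pattern seen in the trace. It is not Subversion's code. It assumes, as guessed above, that leftover temporaries are never deleted, and the function names (`probe_name`, `create_by_probing`, `count_probe_attempts`) are made up for illustration. Each new temporary has to walk past every leftover name before it succeeds, so the number of create attempts grows quadratically.

```python
import os
import tempfile


def probe_name(n):
    # Naming pattern taken from the trace: tempfile.tmp, tempfile.2.tmp, ...
    return "tempfile.tmp" if n == 1 else f"tempfile.{n}.tmp"


def create_by_probing(directory):
    """Create a temp file by trying names in order; return (path, attempts)."""
    n = 1
    attempts = 0
    while True:
        attempts += 1
        path = os.path.join(directory, probe_name(n))
        try:
            # O_EXCL makes the create fail if the name already exists,
            # which plays the role of the NAME COLLISION result.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            n += 1
            continue
        os.close(fd)
        return path, attempts


def count_probe_attempts(leftover_files):
    """Total create attempts if none of the temporaries is ever deleted."""
    with tempfile.TemporaryDirectory() as directory:
        total = 0
        for _ in range(leftover_files):
            _, attempts = create_by_probing(directory)
            total += attempts
        return total


if __name__ == "__main__":
    # The k-th file needs k attempts, so the total is 1 + 2 + ... + n.
    print(count_probe_attempts(340))
```

Under that assumption, reaching "tempfile.340.tmp" in one directory costs 340 * 341 / 2 = 57,970 create attempts in total. A generator of unique names (GetTempFileName on Windows; `tempfile.mkstemp` is the rough Python counterpart) would normally need one attempt per file.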
Original issue reported by yogurt2