FSFS allows PLAIN, i.e. non-deltified non-compressed, representations to store the content length as 0 as it matches the on-disk size. Up to and including 1.8.x, there is no restriction on the representation type for that omission.
In practice, however, it can be difficult to decide whether a 0 value represents an omission or an actually empty file: A self-deltified empty file has a length of 0 but an 4 byte on-disk size. When representations are read, their header tells us whether it is indeed a PLAIN or a DELTA representation and that is enough to resolve any ambiguity.
The problem occurs when we omit the length value for file contents and call svn_fs_file_length() on it. FSFS will report the length as 0 and that causes e.g. 'svnadmin dump' to write broken dump files where the skipped / empty contents does not match the checksum.
Up to 1.7.x, we used this omission rule only for hash data, i.e. props and directories, never for file contents. Thus, there is no problem with these Subversion releases. Starting 1.9.0 and up, the "structure" document explicitly restricts the omission to property reps. Furthermore, 1.9+ will not omit length values at all.
3rd party implementations like SVNkit may have produced instances of omitted length values for file contents, though. We need to handle those correctly and extend the API implementation accordingly.
Moreover, 1.8.x generalized the rep sharing mechanism. If a file contents happened to match a property representation, e.g. "END\n", it would now use the property representation. The latter is PLAIN by default and stores a 0 length value in the rep cache. Hence, the file contents rep will also report a 0 length.
Reproduction sketch (requires 1.8.x):
- Create repo. Keep rep sharing on and prop deltification off.
- Add empty file, set prop on file and commit.
- Remove prop on file and commit.
- Set file contents to "END\n" and commit.
- run 'svn ls -v' on the parent folder => file length is shown as 0
The following things need to be fixed:
- Don't omit the length value - even for properties.
That prevents new instances caused by incoming data. (already fixed in 1.9; fix for 1.8 still needed.)
- Update the "structure" document with info when the omission is safe.
- Compare the size and length values of the rep returned by the rep cache with the data of the new rep. Only replace new with old if those match. The prevents new instances caused by the rep cache.
- Fix svn_fs_fs__file_length() to return 0 lengths only for files that are known empty. This will be the actual bug fix.