Joshua asked what random file generation means, as per this sentence from the design doc:
2. Randomly chooses a file name. File names are enumerated, so choosing a file means choosing its sequence number, which defines the entire file path.
I mean by this that we have a static enumeration of files. We choose a random number, and then calculate a full path for the corresponding file using that number.
The static enumeration is like a heap structure. We have an array of files f0, f1, f2, ... and a root r. The root's children are the files f0 and f1 and the two directories d0 and d1. The children of d0 are the files f2, f3 and the directories d2, d3. The children of d1 are the files f4, f5 and the directories d4, d5. And so on. This gives 2 files per directory.
We can generalize it to p files per directory for a fixed p. Then the root's children are the p files f0,...,f(p-1) and the p directories d0,...,d(p-1), and so on. Importantly, if you have a file fz, then its parent is always the directory dz', where z' = z/p - 1 (integer division). If z' is negative, the parent is the root. The same formula gives the parent of a directory dz.
I don't want to use long numbers for file names. So within a directory, its child files are named file_i and its sub-directories are named dir_i for i = 0,...,p-1.
Then, given a number z, the path of file fz is calculated recursively. The file name of fz is file_(z%p). Its parent is the directory dz', where z' = z/p - 1, and the name of dz' is dir_(z'%p). We keep going up the tree while the indexes are non-negative (d0 is still a directory; a negative index means we have reached the root).
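To make the calculation concrete, here is a minimal sketch of it in Java. The class name FilePathSketch and the method pathOf are made up for illustration, and I am assuming the root is "/" and that paths are joined with "/". This is not the actual FileNameGenerator code.

```java
public final class FilePathSketch {

    // Hypothetical helper: builds the path of file f_z with p files
    // (and p sub-directories) per directory. Root is assumed to be "/".
    static String pathOf(long z, int p) {
        if (z < 0 || p < 1) {
            throw new IllegalArgumentException("need z >= 0 and p >= 1");
        }
        // The file name depends only on the position within its directory.
        StringBuilder path = new StringBuilder("/file_" + (z % p));
        // Index of the parent directory; a negative value means the root.
        long parent = z / p - 1;
        while (parent >= 0) {
            // Prepend the directory name, then move one level up.
            path.insert(0, "/dir_" + (parent % p));
            parent = parent / p - 1;
        }
        return path.toString();
    }

    public static void main(String[] args) {
        // Print the first few paths for p = 2.
        for (long z = 0; z < 9; z++) {
            System.out.println("f" + z + " -> " + pathOf(z, 2));
        }
    }
}
```

With p = 2 this maps f1 to /file_1, f2 to /dir_0/file_0, f5 to /dir_1/file_1 and f6 to /dir_0/dir_0/file_0, which matches the enumeration described above.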
In the test we choose a random z and build a path out of it. If the operation is create, we create a file with this path. In HDFS all missing directories along the path are created automatically. If fz already exists, the create fails.
For read we do the same, but the operation fails if the file does not exist.
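The create and read behaviour can be modelled roughly as follows. This is only a sketch: an in-memory set stands in for the HDFS namespace, no HDFS API is called, and the names RandomFileOpSketch, create, read and pathOf are hypothetical.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public final class RandomFileOpSketch {
    // Stand-in for the namespace: the set of file paths that exist.
    private final Set<String> existing = new HashSet<>();
    private final int p; // files (and sub-directories) per directory

    RandomFileOpSketch(int p) {
        this.p = p;
    }

    // Same path calculation as described above; root assumed to be "/".
    static String pathOf(long z, int p) {
        StringBuilder path = new StringBuilder("/file_" + (z % p));
        for (long d = z / p - 1; d >= 0; d = d / p - 1) {
            path.insert(0, "/dir_" + (d % p));
        }
        return path.toString();
    }

    // Create succeeds only if the file does not exist yet.
    boolean create(long z) {
        return existing.add(pathOf(z, p));
    }

    // Read succeeds only if the file exists.
    boolean read(long z) {
        return existing.contains(pathOf(z, p));
    }

    public static void main(String[] args) {
        RandomFileOpSketch ns = new RandomFileOpSketch(2);
        Random rnd = new Random();
        for (int i = 0; i < 5; i++) {
            long z = rnd.nextInt(16); // random sequence number
            String op = rnd.nextBoolean() ? "create" : "read";
            boolean ok = op.equals("create") ? ns.create(z) : ns.read(z);
            System.out.println(op + " " + pathOf(z, 2) + " -> " + (ok ? "ok" : "failed"));
        }
    }
}
```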
A similar approach is used in the class FileNameGenerator.