Yes, you could create a different kind of test that doesn't hold out any data in order to measure this reach figure. I don't think it's worth a separate test class just for that. The whole test framework is only valid insofar as you run it on enough data, with enough left for training, that the result reflects how the full system behaves. So I think running on the training data alone is as valid as anything else.
Regarding the "2@" prefs heuristic: it's not really a question of the recommender deciding not to recommend. It will always recommend as many items as it can, up to the number you ask for. But if the test is based on very little data to begin with, the result isn't meaningful. If I'm computing precision@5 and the user has only 4 prefs, what can I do? I can't even mark all 4 as "relevant", since that would leave no training data. And even if I did, there would be no way to achieve 100% precision, because there are only 4 relevant items. I (arbitrarily) picked 2·@ as the minimum (10 here, when @=5), since you can then select 5 of the 10 as relevant and still have as many left over for training.
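The split described above can be sketched roughly like this. This is an illustrative sketch, not the actual evaluator code; the function name and its interface are made up for the example:

```python
import random

def split_for_ir_eval(user_prefs, at, seed=None):
    """Hypothetical sketch of the 2*at minimum described above.

    Hold out `at` of one user's preferred items as "relevant" and keep
    the rest for training, skipping users with fewer than 2*at prefs so
    that both the relevant set and the training set are non-trivial.
    """
    if len(user_prefs) < 2 * at:
        return None  # too little data for a meaningful precision@at
    rng = random.Random(seed)
    prefs = list(user_prefs)
    rng.shuffle(prefs)
    relevant = set(prefs[:at])  # held-out items deemed "relevant"
    training = prefs[at:]       # everything else stays available for training
    return relevant, training

# A user with only 4 prefs is skipped when @=5; one with 10 prefs
# yields 5 relevant items and 5 training items.
print(split_for_ir_eval(["a", "b", "c", "d"], at=5))  # None
rel, train = split_for_ir_eval(list("abcdefghij"), at=5, seed=1)
print(len(rel), len(train))  # 5 5
```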
You would not want to drop a user's result just because the recommender returned only 3 items in a test @5. That's a perfectly valid result to include (given the condition in the preceding paragraph). You can still determine how many of those 3 are relevant, and how many of the relevant items appear among those 3.
Precision and recall are not the same in general. If the number of items deemed relevant equals "@", then precision will equal recall, yes. And that is usually true for data with ratings, the way this class works: it simply chooses some "@" of the items as relevant, since there is no basis for calling one more relevant than another. Choosing that many is also somewhat arbitrary; it can't be 0, and it can't be all of the user's items (or there would be no training data for the user under test), so "@" looked like a nice round number.
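To make the two points above concrete, here is a minimal per-user precision@N / recall@N computation (the function is my own illustration, not the evaluator's API). When exactly @ items are deemed relevant and @ items are recommended, both denominators are @, so the two metrics coincide; and a recommender that returns fewer than @ items still produces a valid result:

```python
def precision_recall_at(recommended, relevant, at):
    """Precision@at and recall@at for one user (illustrative sketch).

    recommended: ranked list actually returned (may be shorter than `at`)
    relevant:    set of held-out items deemed relevant
    """
    top = recommended[:at]
    hits = sum(1 for item in top if item in relevant)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

relevant = {"a", "b", "c", "d", "e"}  # exactly @=5 items deemed relevant

# 5 recommended, 5 relevant: both denominators are 5, so precision == recall.
print(precision_recall_at(["a", "b", "x", "c", "y"], relevant, at=5))  # (0.6, 0.6)

# Only 3 items returned in a test @5: still a valid result, but now the
# denominators differ (precision over 3 returned, recall over 5 relevant).
print(precision_recall_at(["a", "x", "b"], relevant, at=5))
```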