That's what I planned at the start, but I decided to leave WriteLineDoc intact because it is general; that is, it is not aware of the unique structure of Wikipedia data, where some of the pages represent categories.
I think you misunderstood me, or I wasn't clear enough. WriteLineDoc would not change; EnwikiContentSource would. Someone who wants to create a line file over all Wikipedia pages would put something like content.source=EnwikiContentSource and enwiki.source.exclude.categories=false in the .alg file; someone who wants to leave the category pages out would set enwiki.source.exclude.categories=true. WriteLineDocTask would still write whatever DocData the source returns.
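To make the proposal concrete, here is a minimal sketch of how a source could honor such a flag. It is not the actual EnwikiContentSource code: the class and method names are hypothetical, the property is the one proposed above (it does not exist yet), the default of false is an assumption, and detecting category pages by the "Category:" title prefix is also an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Hypothetical sketch, not the benchmark's EnwikiContentSource.
public class ExcludeCategoriesSketch {

    // Property name as proposed in this thread; it is not an existing option.
    public static final String EXCLUDE_PROP = "enwiki.source.exclude.categories";

    // Assumption: a category page is recognized by its "Category:" title prefix.
    public static boolean isCategoryTitle(String title) {
        return title.startsWith("Category:");
    }

    // Returns the titles the source would emit under the given configuration.
    public static List<String> emittedTitles(List<String> titles, Properties config) {
        // Assumed default: categories are kept unless explicitly excluded.
        boolean exclude = Boolean.parseBoolean(config.getProperty(EXCLUDE_PROP, "false"));
        List<String> out = new ArrayList<>();
        for (String title : titles) {
            if (exclude && isCategoryTitle(title)) {
                continue; // skip category pages when the flag is set
            }
            out.add(title);
        }
        return out;
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty(EXCLUDE_PROP, "true");
        List<String> titles = Arrays.asList("Apple", "Category:Fruit", "Pear");
        System.out.println(emittedTitles(titles, config));
    }
}
```

With the filtering done in the source, the line-writing task needs no knowledge of categories at all.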
EnwikiContentSource will return either DocData or CategoryDocData, or a single object EnwikiDocData with an extra boolean isCategory. WriteLineDoc will still read just the DocData fields it knows about. WriteEnwikiLineDoc will write the DocData to the relevant file, per isCategory.
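The single-object variant could look roughly like the sketch below. Everything in it is a simplified stand-in: DocData here has only a title and a body, the tab-separated line layout is an assumption, and the method names are hypothetical rather than the real WriteLineDocTask API. The point is only that the general writer ignores isCategory while the Wikipedia-aware writer uses it to pick the target file.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical sketch; the classes mirror the names in this thread but are
// simplified stand-ins, not the benchmark's real classes.
public class EnwikiLineDocSketch {

    // Stand-in for DocData with only the fields needed for the example.
    public static class DocData {
        public final String title;
        public final String body;

        public DocData(String title, String body) {
            this.title = title;
            this.body = body;
        }
    }

    // DocData plus the extra boolean discussed above.
    public static class EnwikiDocData extends DocData {
        public final boolean isCategory;

        public EnwikiDocData(String title, String body, boolean isCategory) {
            super(title, body);
            this.isCategory = isCategory;
        }
    }

    // General writer: one tab-separated line per document (assumed layout).
    // It reads only the DocData fields and knows nothing about categories.
    public static void writeLine(Writer out, DocData doc) throws IOException {
        out.write(doc.title + "\t" + doc.body + "\n");
    }

    // Wikipedia-aware writer: routes each document to one of two outputs.
    public static void writeEnwikiLine(Writer pages, Writer categories, EnwikiDocData doc)
            throws IOException {
        writeLine(doc.isCategory ? categories : pages, doc);
    }

    public static void main(String[] args) throws IOException {
        StringWriter pages = new StringWriter();
        StringWriter categories = new StringWriter();
        writeEnwikiLine(pages, categories, new EnwikiDocData("Apple", "a fruit", false));
        writeEnwikiLine(pages, categories, new EnwikiDocData("Category:Fruit", "Apple Pear", true));
        System.out.print("pages file:\n" + pages);
        System.out.print("categories file:\n" + categories);
    }
}
```

Because EnwikiDocData extends DocData, it can still be passed to the general writer unchanged, which is what keeps WriteLineDoc untouched in this design.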
Actually, I am after the two files.
I know. I'm not proposing anything different, just discussing how the code could be designed to achieve that and, as a bonus, let someone exclude the category pages from regular benchmarks.