This gets tricky now, if I understand correctly. Consider the following sequence (written with the current behaviour in mind, where source_tag is appended as the last field):
1. A = load 'input' using PigStorage('\t', '-tagsource');
2. B = FOREACH A GENERATE (int)$0 as col1, (long)$1 as col2, (chararray)$2 as source_tag;
3. C = GROUP B BY source_tag;
n. STORE N INTO 'intermediate' using PigStorage('\t', '-schema');
Now, the user wants to read 'intermediate' using its stored schema, and also know the (new) source path(s).
-- Mentioning -schema is not required here; it is included just for clarity.
A = load 'intermediate' using PigStorage('\t', '-schema -tagsource');
There would be a conflict in auto-loading 'source_tag' in the above case. I think decoupling 'schema' from 'tagsource' would be a nice alternative, since the input path is not part of the "real data": it is a derived field that could be treated differently from the actual data contained in the input files. That way, the user can always expect the right schema for the first n-1 columns, with the nth column being the source_tag, whose schema does not really need to be auto-loaded. This is similar to how it would work if one had extended PigStorage to implement source tagging.
I completely agree with you on the pain of appending source_tag to the end; it's less predictable than at the start. However, maintenance would get complicated when users want to switch between using and not using source tagging. It would be great to minimize changes to positional references in production jobs: such changes are error-prone and might require a large number of script edits if fields are not referenced via an alias.
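To make the repositioning concern concrete, here is a rough sketch of the same projection under the three setups. It assumes the two-column 'input' from the sequence above; the alias names (A0/B0 and so on) are made up for illustration, and where the tag actually lands depends on which approach is adopted.

```pig
-- Baseline: no source tagging, fields referenced by position.
A0 = LOAD 'input' USING PigStorage('\t');
B0 = FOREACH A0 GENERATE (int)$0 AS col1, (long)$1 AS col2;

-- Assumed 'append' behaviour: the tag becomes the last field ($2),
-- so the existing $0 and $1 references stay as they are.
A1 = LOAD 'input' USING PigStorage('\t', '-tagsource');
B1 = FOREACH A1 GENERATE (int)$0 AS col1, (long)$1 AS col2, (chararray)$2 AS source_tag;

-- Assumed 'prepend' behaviour: the tag becomes the first field ($0),
-- so every existing positional reference shifts by one.
A2 = LOAD 'input' USING PigStorage('\t', '-tagsource');
B2 = FOREACH A2 GENERATE (chararray)$0 AS source_tag, (int)$1 AS col1, (long)$2 AS col2;
```

With 'append', switching tagging on or off only adds or removes the last reference. With 'prepend', every positional reference in the script has to be revisited.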
Lastly, I am leaning towards the 'append' approach, but I am fine with either one. We just need to make sure this feature is easy to use and adopt.