On thinking more about this, I think we should change the test for input3_limit.q to make it deterministic. Instead of
INSERT OVERWRITE TABLE T2 SELECT a.key, a.value from T1 a LIMIT 20;
We should have
INSERT OVERWRITE TABLE T2 SELECT * FROM (SELECT * FROM T1 DISTRIBUTE BY key SORT BY key, value) T LIMIT 20;
SELECT * FROM T2;
That would ensure that the rows we get are the top 20 rows by key and value, regardless of the order in which the input is read.
The order in which these files are processed is controlled by Hadoop, so I don't think we have much control over whether kv1.txt or kv2.txt gets processed first.
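
To illustrate why the bare LIMIT is order-dependent while the sorted version is not, here is a rough Python sketch. The row data is made up (hypothetical stand-ins, not the real contents of kv1.txt and kv2.txt), and the sort is modelled as a single global sort, which simplifies what DISTRIBUTE BY / SORT BY actually does across reducers.

```python
# Hypothetical stand-ins for the two input files; NOT the real kv1.txt/kv2.txt rows.
kv1 = [(f"{k:03d}", f"val_{k:03d}") for k in range(0, 60, 2)]
kv2 = [(f"{k:03d}", f"val_{k:03d}") for k in range(1, 60, 2)]


def limit_only(files, n=20):
    """Model of `SELECT ... LIMIT n`: take the first n rows in arrival order."""
    rows = [row for f in files for row in f]
    return rows[:n]


def sort_then_limit(files, n=20):
    """Model of sorting by (key, value) before LIMIT n (simplified to a global sort)."""
    rows = [row for f in files for row in f]
    return sorted(rows)[:n]


# The two possible processing orders of the input files.
order_a = [kv1, kv2]
order_b = [kv2, kv1]

# Compare the results under both orders for each query shape.
print("LIMIT only, same result:", limit_only(order_a) == limit_only(order_b))
print("sort + LIMIT, same result:", sort_then_limit(order_a) == sort_then_limit(order_b))
```

With the bare LIMIT the result depends on which file arrives first; once the rows are sorted before the LIMIT, the file order no longer matters.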