Navis, I am no expert on the MapJoinProcessor. The following is what I see; I will need to spend more time on this.
Maybe from my comments you can see the issue.
1. The Plan generated at genPlan is:
TS (the scan for b) has 2 child operators: [RS, RS]
These are for the joins for each of the SubQuery expressions:
from src a
where b.value = a.value and a.key > '9'
b.key not in ( select key from src s1 where s1.key > '2')
The plan looks complex because the handling of 'not in' requires the null check. This issue will occur even if the second insert uses an 'in' subquery predicate; it will be easier to follow with such an example.
2. With set hive.auto.convert.join=false;
The second RS gets converted to a FileSink. You can observe this from the explain output. A subsequent Stage reads this intermediate output to perform the processing for the 2nd SubQuery.
3. With set hive.auto.convert.join=true;
When it comes to CommonJoinResolver, the TS has children [RS, FS], i.e. the 2nd ReduceSink has been converted to a FileSink.
MapJoinProcessor:genMapJoinLocalWork (line 145) assumes that a TableScanOp can only have 1 child.
The fix may be to ignore any FileSink operators that are children of a TableScan. Another test to add is a multi insert on 3 tables.
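
To make the suggested fix concrete, here is a rough sketch of the idea on a toy operator tree. It does not use the real Hive classes: Op, Kind, joinChildren and singleJoinChild are all hypothetical names invented for illustration, and the actual change in genMapJoinLocalWork may need to look quite different.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-ins for an operator tree; these are NOT the real Hive classes.
public class TableScanChildFilter {

    enum Kind { TABLE_SCAN, REDUCE_SINK, FILE_SINK }

    static final class Op {
        final Kind kind;
        final String name;
        final List<Op> children = new ArrayList<>();

        Op(Kind kind, String name) {
            this.kind = kind;
            this.name = name;
        }

        Op addChild(Op child) {
            children.add(child);
            return this;
        }
    }

    // Returns the children of a table scan, skipping FileSink children
    // (in the scenario above, the FS that replaced the 2nd ReduceSink).
    static List<Op> joinChildren(Op tableScan) {
        List<Op> result = new ArrayList<>();
        for (Op child : tableScan.children) {
            if (child.kind != Kind.FILE_SINK) {
                result.add(child);
            }
        }
        return result;
    }

    // Hypothetical replacement for the "a table scan has exactly 1 child"
    // assumption: require exactly 1 child once FileSinks are ignored.
    static Op singleJoinChild(Op tableScan) {
        List<Op> candidates = joinChildren(tableScan);
        if (candidates.size() != 1) {
            throw new IllegalStateException(
                "expected 1 non-FileSink child, found " + candidates.size());
        }
        return candidates.get(0);
    }

    public static void main(String[] args) {
        // TS with children [RS, FS], as seen at CommonJoinResolver.
        Op ts = new Op(Kind.TABLE_SCAN, "TS")
            .addChild(new Op(Kind.REDUCE_SINK, "RS"))
            .addChild(new Op(Kind.FILE_SINK, "FS"));
        System.out.println(singleJoinChild(ts).name); // prints RS
    }
}
```

With children [RS, FS] the check passes and returns the RS; with [RS, RS] it still fails, which is the behaviour I would expect a multi insert test on 3 tables to exercise.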