Haven't looked at the code yet - I'm commenting based on the design doc.
Comment on section 2.2.1 in the design doc.
One thing to be wary of the NOT-IN to LOJ with null check transformation is that it is only valid if the subquery's output is known to be non-nullable.
Example, consider two tables:
Now consider this uncorrelated NOT-IN subquery:
select * from T1 where T1.x not in (select y from T2)
Should produce (if I remember my SQL semantics correctly), an empty result i.e. 0 rows.
This is because "not in" is equivalent to <> ALL. If the inner subquery produces a null, then its comparison with anything always produces null. Therefore, <> ALL check fails for every x from T1. Therefore, the result is empty.
According to section 2.2.1, the transformation to LOJ with null check will look like this:
select * from T1 Left Outer Join (select y from T2) sq1 on (T1.x=sq1.y) where sq1.y is null
The LOJ will produce rows like this:
Applying the null filter after the LOJ produces two rows. This is an incorrect result. If this case is already covered in your implementation, please ignore the comment.