Details
-
Improvement
-
Status: Open
-
Minor
-
Resolution: Unresolved
-
None
-
None
Description
As stated in the docs, rewriting an inner join and filtering nulls from inputs can be a big performance gain: http://pig.apache.org/docs/r0.14.0/perf.html#nulls
We would like to add an optimizer rule which detects inner joins, and filters nulls in all inputs:
A = filter A by t is not null;
B = filter B by x is not null;
C = join A by t, B by x;
see also: http://stackoverflow.com/questions/32088389/is-the-pig-optimizer-filtering-nulls-before-joining
Attachments
Attachments
Issue Links
- links to