Pig / PIG-4662

New optimizer rule: filter nulls before inner joins


Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • None
    • 0.18.0
    • None

    Description

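      As stated in the docs, filtering nulls out of the inputs of an inner join before the join runs can be a big performance gain, because records with null keys can never match and are otherwise carried through the join only to be dropped: http://pig.apache.org/docs/r0.14.0/perf.html#nulls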

      We would like to add an optimizer rule that detects inner joins and filters out null join keys in all inputs before the join, so that a join of A and B is planned as if it had been written:
      A = filter A by t is not null;
      B = filter B by x is not null;
      C = join A by t, B by x;
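
      The equivalence this rule relies on can be sketched in Python. This is only a toy model of inner-join semantics, not Pig's optimizer code; the helpers `inner_join` and `filter_not_null` and the sample rows are hypothetical and exist just for illustration.

```python
# Toy model (an assumption, not Pig internals): relations are lists of dicts,
# and a null key is represented by None. As in Pig, a null key never matches
# in an inner join, not even another null.

def inner_join(left, right, left_key, right_key):
    """Hash-join two relations; rows with a None key produce no output."""
    index = {}
    for row in right:
        key = row[right_key]
        if key is not None:
            index.setdefault(key, []).append(row)
    joined = []
    for lrow in left:
        key = lrow[left_key]
        if key is None:
            continue  # null keys are dropped by the join itself
        for rrow in index.get(key, []):
            joined.append({**lrow, **rrow})
    return joined


def filter_not_null(rows, key):
    """Equivalent of `filter rows by key is not null`."""
    return [row for row in rows if row[key] is not None]


A = [{"t": 1, "a": "x"}, {"t": None, "a": "y"}, {"t": 2, "a": "z"}]
B = [{"x": 1, "b": "p"}, {"x": None, "b": "q"}, {"x": 3, "b": "r"}]

# Join as written by the user.
plain = inner_join(A, B, "t", "x")

# Join as the proposed rule would rewrite it: filter nulls first.
rewritten = inner_join(filter_not_null(A, "t"), filter_not_null(B, "x"), "t", "x")

# Same result, but the rewritten plan feeds fewer rows into the join.
assert plain == rewritten
```

      Because the result is unchanged, the rewrite is safe for inner joins only; outer joins keep null-keyed rows from the outer side, so the same filter would change their output.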

      See also: http://stackoverflow.com/questions/32088389/is-the-pig-optimizer-filtering-nulls-before-joining

      Attachments

        1. PIG-4662-1.patch (17 kB, Satish Saley)

            People

              Assignee: Satish Saley (satishsaley)
              Reporter: Ido Hadanny (ihadanny)
              Votes: 1
              Watchers: 3
