Uploaded image for project: 'Pig'
  1. Pig
  2. PIG-2722

UDF FilterFunc in expression using OR right hand side gets ignored

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Won't Fix
    • 0.8.1
    • None
    • None
    • None
    • pig-0.8.1, hadoop-0.20.2 from Clouderas distribution cdh3u3 on Kubuntu 12.04 64Bit.

    Description

      The following pig script does not produce the expected output:

      register adition.jar
      
      a = LOAD 'TestCONTAINS-testFilteringCluster-input.txt' AS (id:int, grp:int, additional:int, referer:chararray);
      b = FILTER a BY com.adition.pig.filtering.string.CONTAINS(referer, 'obama') OR com.adition.pig.filtering.string.CONTAINS(referer, 'praesident');
      
      EXPLAIN b;
      
      dump b;
      

      TestCONTAINS-testFilteringCluster-input.txt contains

      1  23 42 http://www.google.com/url&url=http%3A%2F%2Fwww.example.com%2Fmypage.htm&q=flowers
      2  123   42 http://www.google.com/url&url=http%3A%2F%2Fwww.zeit.de%2Findex.php&q=towers
      3  223   142   http://www.google.com/url&url=http%3A%2F%2Fwww.nix-wie-weg.de&q=mallorca
      4  323   242   http://www.google.com/url&url=http%3A%2F%2Fwww.tagesschau.de&q=obama
      5  423   342   http://www.google.com/url&url=http%3A%2F%2Fwww.bild.de&q=obama
      6  523   442   http://www.google.com/url&url=http%3A%2F%2Fwww.example.com%2Fmypage.htm&q=praesident
      

      The adition.jar has been built against the cloudera cdh3u3 distribution
      and contains the filter function CONTAINS, see here http://pastebin.com/Uwje7v1V .

      The output can be seen here http://pastebin.com/yXY17mXx . Essentially what is happening is that the right hand side of the OR in the FILTER expression is beeing ignored, resulting in the script returning just two lines

      (4,323,242,http://www.google.com/url&url=http%3A%2F%2Fwww.tagesschau.de&q=obama)
      (5,423,342,http://www.google.com/url&url=http%3A%2F%2Fwww.bild.de&q=obama)
      

      instead of three lines

      (4,323,242,http://www.google.com/url&url=http%3A%2F%2Fwww.tagesschau.de&q=obama)
      (5,423,342,http://www.google.com/url&url=http%3A%2F%2Fwww.bild.de&q=obama)
      (6,523,442,http://www.google.com/url&url=http%3A%2F%2Fwww.example.com%2Fmypage.htm&q=praesident)
      

      Running the script with pig 0.11.0 yields correct results http://pastebin.com/Cr5CkHui

      See also the diskussion on the pig-user mailinglist
      http://www.mail-archive.com/user%40pig.apache.org/msg05278.html

      Attachments

        Activity

          People

            Unassigned Unassigned
            johannesch Johannes Schwenk
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: