Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Won't Fix
-
0.8.1
-
None
-
None
-
None
-
pig-0.8.1, hadoop-0.20.2 from Clouderas distribution cdh3u3 on Kubuntu 12.04 64Bit.
Description
The following pig script does not produce the expected output:
register adition.jar a = LOAD 'TestCONTAINS-testFilteringCluster-input.txt' AS (id:int, grp:int, additional:int, referer:chararray); b = FILTER a BY com.adition.pig.filtering.string.CONTAINS(referer, 'obama') OR com.adition.pig.filtering.string.CONTAINS(referer, 'praesident'); EXPLAIN b; dump b;
TestCONTAINS-testFilteringCluster-input.txt contains
1 23 42 http://www.google.com/url&url=http%3A%2F%2Fwww.example.com%2Fmypage.htm&q=flowers 2 123 42 http://www.google.com/url&url=http%3A%2F%2Fwww.zeit.de%2Findex.php&q=towers 3 223 142 http://www.google.com/url&url=http%3A%2F%2Fwww.nix-wie-weg.de&q=mallorca 4 323 242 http://www.google.com/url&url=http%3A%2F%2Fwww.tagesschau.de&q=obama 5 423 342 http://www.google.com/url&url=http%3A%2F%2Fwww.bild.de&q=obama 6 523 442 http://www.google.com/url&url=http%3A%2F%2Fwww.example.com%2Fmypage.htm&q=praesident
The adition.jar has been built against the cloudera cdh3u3 distribution
and contains the filter function CONTAINS, see here http://pastebin.com/Uwje7v1V .
The output can be seen here http://pastebin.com/yXY17mXx . Essentially what is happening is that the right hand side of the OR in the FILTER expression is beeing ignored, resulting in the script returning just two lines
(4,323,242,http://www.google.com/url&url=http%3A%2F%2Fwww.tagesschau.de&q=obama) (5,423,342,http://www.google.com/url&url=http%3A%2F%2Fwww.bild.de&q=obama)
instead of three lines
(4,323,242,http://www.google.com/url&url=http%3A%2F%2Fwww.tagesschau.de&q=obama) (5,423,342,http://www.google.com/url&url=http%3A%2F%2Fwww.bild.de&q=obama) (6,523,442,http://www.google.com/url&url=http%3A%2F%2Fwww.example.com%2Fmypage.htm&q=praesident)
Running the script with pig 0.11.0 yields correct results http://pastebin.com/Cr5CkHui
See also the diskussion on the pig-user mailinglist
http://www.mail-archive.com/user%40pig.apache.org/msg05278.html