Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
None
-
None
-
None
-
None
Description
Example script:
A = load 'db.tableA' using org.apache.hive.hcatalog.pig.HCatLoader(); Aproj = foreach A generate browser, bckt, type, ip, yuid; B = load 'db.tableB' using org.apache.hive.hcatalog.pig.HCatLoader(); Bproj = foreach B generate browser, name, age; C = join Aproj by browser, Bproj by browser; D = foreach C generate Bproj::browser, bckt, ip, name, age; store D into '/user/bob/testjoin2table' using PigStorage();
When HCatLoader loads more than one table and sets the column ids to prune, it is setting in job conf, the required ids of the latest table loaded and applying it to other table(s) too, giving wrong results for joins etc.