Pig / PIG-3927

Pig HCatLoader does not set 'hive.io.file.readcolumn.ids' properly during partition pushdown

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      Example script:

      A = load 'db.tableA' using org.apache.hive.hcatalog.pig.HCatLoader();
      
      Aproj = foreach A generate browser, bckt, type, ip, yuid;
      
      B = load 'db.tableB' using org.apache.hive.hcatalog.pig.HCatLoader();
      
      Bproj = foreach B generate browser, name, age;
      
      C = join Aproj by browser, Bproj by browser;
      
      D = foreach C generate Bproj::browser, bckt, ip, name, age;
      
      store D into '/user/bob/testjoin2table' using PigStorage();
      

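      When a script loads more than one table through HCatLoader and columns are pruned, HCatLoader writes the required column ids ('hive.io.file.readcolumn.ids') into the job conf. The value it sets is the one for the most recently loaded table, and that same value is then applied to the other table(s) as well. The other tables are read with the wrong column ids, which gives wrong results for joins and other multi-table operations.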
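
      The overwrite described above can be illustrated with a small toy model. This is only a sketch, not HCatLoader's actual code: the job conf is modelled as a plain dict, the helper functions are hypothetical, and the table schemas are invented for illustration because the issue does not list the real columns of db.tableA and db.tableB.

```python
# Toy model of the reported behaviour; NOT HCatLoader's real implementation.
READ_COLUMN_IDS = "hive.io.file.readcolumn.ids"

# Hypothetical schemas: the issue does not give the real table columns.
SCHEMAS = {
    "db.tableA": ["browser", "bckt", "type", "ip", "yuid", "ts"],
    "db.tableB": ["name", "age", "browser", "city"],
}


def column_ids(table, wanted):
    """Map projected column names to their positions in the table schema."""
    schema = SCHEMAS[table]
    return [schema.index(name) for name in wanted]


def set_projection(conf, table, wanted):
    """Model every loader writing its ids under the same job-wide key."""
    conf[READ_COLUMN_IDS] = ",".join(str(i) for i in column_ids(table, wanted))


def columns_read(conf, table):
    """Model a reader that trusts whatever ids are in the shared conf."""
    schema = SCHEMAS[table]
    ids = [int(i) for i in conf[READ_COLUMN_IDS].split(",")]
    return [schema[i] for i in ids if i < len(schema)]


job_conf = {}
# Projection for Aproj, then for Bproj; the second call overwrites the first.
set_projection(job_conf, "db.tableA", ["browser", "bckt", "type", "ip", "yuid"])
set_projection(job_conf, "db.tableB", ["browser", "name", "age"])

a_cols = columns_read(job_conf, "db.tableA")
b_cols = columns_read(job_conf, "db.tableB")

print(job_conf[READ_COLUMN_IDS])  # only tableB's ids survive
print(a_cols)                     # tableA is read with tableB's ids
print(b_cols)
```

      In this model tableB is read correctly, while tableA is read using tableB's column positions, so the join in the example script would see the wrong fields from tableA.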

          People

            Assignee: Unassigned
            Reporter: Mona Chitnis (chitnis)
