Pig / PIG-4691

[Pig on Tez] Support for whitelisting storefuncs for union optimization

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.16.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note:
      Union optimization (pig.tez.opt.union=true) in Tez uses vertex groups to store output from different vertices into one final output location. If a StoreFunc's OutputCommitter does not honor mapreduce.output.basename, or has other issues with multiple vertices writing to the destination location at the same time, you can disable union optimization for just that StoreFunc (see PIG-4649). Alternatively, instead of a blacklist, you can specify a whitelist of StoreFuncs that are known to work with multiple vertices writing to the same location.

      #pig.tez.opt.union.unsupported.storefuncs=org.apache.hcatalog.pig.HCatStorer,org.apache.hive.hcatalog.pig.HCatStorer
      #pig.tez.opt.union.supported.storefuncs=

    Description

      PIG-4649 added support for blacklisting some StoreFuncs when applying the union+store vertex group optimization, because HCatStorer was not honoring mapreduce.output.basename and was hardcoding part file names. We found that some of our users' StoreFuncs do the same, and those jobs ended up with partial results. It would therefore be good to have a whitelist option as well, where you can list the StoreFuncs that do not interfere with mapreduce.output.basename.
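
      The interplay of the two lists can be sketched roughly as follows. This is only an illustration and not the code from the attached patch: the class and method names are hypothetical, the rule that the blacklist is checked first and that an empty whitelist means "no restriction" is an assumption, and com.example.MyStorer is a made-up user StoreFunc. Only the two property names come from this issue.

```java
import java.util.LinkedHashSet;
import java.util.Properties;
import java.util.Set;

// Hypothetical helper, not part of Pig: decides whether union optimization
// should be applied for a given StoreFunc class name.
public class UnionStoreFuncCheck {
    static final String SUPPORTED = "pig.tez.opt.union.supported.storefuncs";
    static final String UNSUPPORTED = "pig.tez.opt.union.unsupported.storefuncs";

    // Split a comma-separated property value into trimmed, non-empty class names.
    static Set<String> parseList(String value) {
        Set<String> names = new LinkedHashSet<>();
        if (value == null) {
            return names;
        }
        for (String name : value.split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                names.add(trimmed);
            }
        }
        return names;
    }

    // Assumed precedence: a blacklisted StoreFunc is never optimized; if a
    // whitelist is configured, only StoreFuncs on it are optimized.
    static boolean isUnionOptimizable(String storeFuncClass, Properties props) {
        Set<String> unsupported = parseList(props.getProperty(UNSUPPORTED));
        Set<String> supported = parseList(props.getProperty(SUPPORTED));
        if (unsupported.contains(storeFuncClass)) {
            return false;
        }
        if (!supported.isEmpty() && !supported.contains(storeFuncClass)) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Assumption for illustration: PigStorage is treated as safe to whitelist.
        props.setProperty(SUPPORTED, "org.apache.pig.builtin.PigStorage");
        System.out.println(isUnionOptimizable("org.apache.pig.builtin.PigStorage", props));
        System.out.println(isUnionOptimizable("com.example.MyStorer", props));
    }
}
```

      With a whitelist in place, a StoreFunc that hardcodes part file names is excluded by default, so it no longer has to be discovered and blacklisted after it has already produced partial results.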

      Attachments

        1. PIG-4691-1.patch
          15 kB
          Rohini Palaniswamy

            People

              Assignee: Rohini Palaniswamy
              Reporter: Rohini Palaniswamy
              Votes: 0
              Watchers: 2