  1. Pig
  2. PIG-1337

Need a way to pass distributed cache configuration information to hadoop backend in Pig's LoadFunc

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.6.0
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      The Zebra storage layer needs to use the distributed cache to reduce name node load during job runs.

      To do this, Zebra needs to set up distributed cache configuration in TableLoader (which extends Pig's LoadFunc).
      It currently does this within getSchema(conf). The problem is that the conf object passed there is not the one that is serialized to the map/reduce backend, so the distributed cache is not set up properly.

      To work around this problem, Pig's LoadFunc needs to provide a way for a loader to set up distributed cache information in a conf object, and that conf object must be the one used by the map/reduce backend.
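
      The failure mode can be illustrated with a minimal, self-contained sketch. It does not use the real Pig or Hadoop classes: Conf, CACHE_KEY, getSchema, planWithCopy and planWithSharedConf are hypothetical stand-ins, and it assumes (for illustration only) that the loader is handed a copy of the job configuration rather than the job configuration itself.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfCopyDemo {
    // Hypothetical stand-in for a Hadoop-style key/value configuration object.
    static class Conf {
        final Map<String, String> props = new HashMap<>();
        Conf() {}
        Conf(Conf other) { props.putAll(other.props); } // copy constructor
    }

    // Hypothetical property name; not a real Hadoop or Pig key.
    static final String CACHE_KEY = "demo.cache.files";

    // Stand-in for a loader's getSchema(conf) that registers cache files as a side effect.
    static void getSchema(Conf conf) {
        conf.props.put(CACHE_KEY, "/zebra/table/index");
    }

    // Assumed problem case: the loader sees a copy, so its changes never reach the job conf.
    static Conf planWithCopy() {
        Conf jobConf = new Conf();
        getSchema(new Conf(jobConf));
        return jobConf; // this is the object that would be serialized to the backend
    }

    // Requested behavior: the loader is given the very conf that gets serialized.
    static Conf planWithSharedConf() {
        Conf jobConf = new Conf();
        getSchema(jobConf);
        return jobConf;
    }

    public static void main(String[] args) {
        System.out.println(planWithCopy().props.containsKey(CACHE_KEY));       // false
        System.out.println(planWithSharedConf().props.containsKey(CACHE_KEY)); // true
    }
}
```

      In the first case the cache setting is lost before job submission; in the second it survives, which is the behavior this issue asks Pig to guarantee for LoadFunc implementations.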

              People

              • Assignee: Unassigned
              • Reporter: Chao Wang (chaow)
              • Votes: 2
              • Watchers: 3
