Pig / PIG-1174

Creation of output path should be done by storage function

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: 0.7.0
    • Component/s: None
    • Labels: None

    Description

      When executing a STORE command, Pig creates the output location before the storage function gets called. This causes problems with storage functions that have logic to determine the output location. See this thread:

      http://www.mail-archive.com/pig-user%40hadoop.apache.org/msg01538.html

      For example, when making a request like this:

      STORE A INTO '/my/home/output' USING MultiStorage('/my/home/output','0', 'none', '\t');

      Pig creates a file at '/my/home/output', and an exception is then thrown when MultiStorage tries to create a directory under that path. The workaround is to specify a dummy location in the INTO clause instead, like so:

      STORE A INTO '/my/home/output/temp' USING MultiStorage('/my/home/output','0', 'none', '\t');
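
      The collision and the workaround can be reproduced in miniature on a local filesystem. The sketch below is only an analogy and does not use Pig or HDFS: the class name, the method name, and the "someKey" subdirectory are hypothetical, with "someKey" standing in for whatever per-key directory MultiStorage would try to create.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only; this is not Pig code.
class OutputPathCollision {

    /**
     * Pre-creates intoPath as a plain file (stand-in for Pig creating the
     * STORE location), then tries to create a directory under storageRoot
     * (stand-in for the storage function building its own layout).
     * Returns true if the directory could be created.
     */
    static boolean storeSucceeds(Path base, String intoPath, String storageRoot)
            throws IOException {
        Path into = base.resolve(intoPath);
        Files.createDirectories(into.getParent());
        Files.createFile(into);
        try {
            // "someKey" is an assumed name for a per-key output directory.
            Files.createDirectories(base.resolve(storageRoot).resolve("someKey"));
            return true;
        } catch (IOException e) {
            // A directory cannot be created underneath a regular file.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Same path in INTO and in the storage function: the pre-created file is in the way.
        Path a = Files.createTempDirectory("pig1174-a");
        System.out.println("same path:      " + storeSucceeds(a, "output", "output"));

        // Dummy INTO location below the real output directory: the workaround.
        Path b = Files.createTempDirectory("pig1174-b");
        System.out.println("dummy location: " + storeSucceeds(b, "output/temp", "output"));
    }
}
```

      On a typical local filesystem the first call is expected to report a failure and the second to succeed, which mirrors the behaviour described in this report.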

      Two changes should be made:
      1. The path specified in the INTO clause should be available to the storage function so it doesn't need to be duplicated.
      2. The creation of the output paths should be delegated to the storage function.
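
      One possible shape for these two changes is sketched below. It is a minimal illustration under stated assumptions, not Pig's actual API: LocationAwareStorage, prepareOutput, SingleFileStorage, DirectoryStorage, and StoreSketch are all hypothetical names, and the local filesystem stands in for HDFS.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch only; none of these types exist in Pig.
class StoreSketch {

    /**
     * Stand-in for executing STORE: the engine hands the INTO path to the
     * storage function and creates nothing itself.
     */
    static Path store(Path intoLocation, LocationAwareStorage storage) throws IOException {
        return storage.prepareOutput(intoLocation);
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("pig1174-sketch");
        Path dir = store(base.resolve("output"), new DirectoryStorage());
        System.out.println(dir + " is a directory: " + Files.isDirectory(dir));
    }
}

/** Change 1: the storage function receives the INTO path. */
interface LocationAwareStorage {
    /** Change 2: the storage function creates the output layout it needs. */
    Path prepareOutput(Path intoLocation) throws IOException;
}

/** A storage function that writes a single file at the INTO location. */
class SingleFileStorage implements LocationAwareStorage {
    @Override
    public Path prepareOutput(Path intoLocation) throws IOException {
        Path parent = intoLocation.getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        return Files.createFile(intoLocation);
    }
}

/** A MultiStorage-like function that treats the INTO location as a directory root. */
class DirectoryStorage implements LocationAwareStorage {
    @Override
    public Path prepareOutput(Path intoLocation) throws IOException {
        return Files.createDirectories(intoLocation);
    }
}
```

      With this split, a directory-oriented storage function never finds a pre-created file in its way, and the output path only has to be written once, in the INTO clause.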

    People

      Assignee: Unassigned
      Reporter: William W. Graham Jr (billgraham)