Pig / PIG-1702

Streaming debug output outputs null input-split information

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.7.0
    • Fix Version/s: 0.10.0
    • Component/s: impl
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Within the Pig streaming command execution, debug information is printed to stderr that specifies the input file as well as split information. The function is org.apache.pig.backend.hadoop.streaming.HadoopExecutableManager.writeDebugHeader(). Pig 0.7 outputs null for the split file, and -1 for the split start-offset and split length. Example output:

      ===== Task Information Header =====
      Command: test.pl (stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)
      Start time: Mon Oct 25 21:24:45 EDT 2010
      Input-split file: null
      Input-split start-offset: -1
      Input-split length: -1

      Within the writeDebugHeader() function, the input file information is obtained by querying the "map.input.file" configuration variable. This configuration variable was set by the old Hadoop M/R API, but not by the 0.20 API, which Pig 0.7 now uses. The new way to get this information is with something like: ((FileSplit) context.getInputSplit()).getPath(). See HADOOP-5973.
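The lookup described above can be sketched as follows. To stay runnable without Hadoop on the classpath, the sketch uses small stand-in types (InputSplit, FileSplit) that mirror only the three accessors of Hadoop's org.apache.hadoop.mapreduce.lib.input.FileSplit used here: getPath(), getStart(), and getLength(). The class name SplitHeaderSketch and the method formatSplitInfo are hypothetical and are not taken from the attached patch.

```java
public class SplitHeaderSketch {

    /** Stand-in for Hadoop's InputSplit (an assumption, not the real class). */
    public interface InputSplit {}

    /**
     * Stand-in exposing the same accessor names as Hadoop's FileSplit.
     * The real getPath() returns a Path; a String is used here for brevity.
     */
    public static final class FileSplit implements InputSplit {
        private final String path;
        private final long start;
        private final long length;

        public FileSplit(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }

        public String getPath() { return path; }
        public long getStart() { return start; }
        public long getLength() { return length; }
    }

    /**
     * Builds the three input-split header lines. Splits that are not
     * file-based keep the null / -1 placeholders seen in the bug report.
     */
    public static String formatSplitInfo(InputSplit split) {
        String file = null;
        long start = -1;
        long length = -1;
        if (split instanceof FileSplit) {
            FileSplit fileSplit = (FileSplit) split;
            file = fileSplit.getPath();
            start = fileSplit.getStart();
            length = fileSplit.getLength();
        }
        return "Input-split file: " + file + "\n"
             + "Input-split start-offset: " + start + "\n"
             + "Input-split length: " + length + "\n";
    }

    public static void main(String[] args) {
        // Hypothetical path and offsets, for illustration only.
        System.out.print(formatSplitInfo(new FileSplit("/data/part-00000", 0L, 1024L)));
    }
}
```

The instanceof check matters because not every input split is file-based; for any other split type the sketch falls back to the placeholder values rather than failing on the cast.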

      1. PIG-1702-0.patch
        2 kB
        Adam Warrington

        Activity

        Adam Warrington created issue -
        Ashutosh Chauhan added a comment -

        @Adam,

        Nice catch. Would you like to contribute a patch for it?

        Adam Warrington added a comment -

        Yea, I'd like to do that. I'll try to get one up very soon.

        Alex Kozlov added a comment -

        Any update on this?

        Adam Warrington added a comment -

        Here is a patch that fixes the Header output by retrieving the information (the path, start offset, and length) from the FileSplit.

        One potential issue with this code is that it has to obtain a reference to the current MapContext, which it does from PigMapReduce.sJobContext, and if Pig is running in local mode, there may be a race condition. PIG-1831 solved a similar issue with the configuration. Would it be wise to use a thread-local variable in PigMapReduce for the context as well?
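The thread-local idea raised in this comment can be sketched as below. This only illustrates the pattern and is not the code in PIG-1702-0.patch or in PigMapReduce: the class ContextHolder, its methods, and the use of a plain String in place of a Hadoop context object are all assumptions made to keep the sketch runnable. It shows two tasks in one JVM each storing a context and reading back their own value, which a single shared static field would not guarantee.

```java
import java.util.concurrent.CountDownLatch;

public class ContextHolder {

    // One slot per thread. A plain static field would instead be shared by
    // every task running in the same JVM, as happens in local mode.
    private static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    public static void set(String context) { CONTEXT.set(context); }
    public static String get() { return CONTEXT.get(); }
    public static void clear() { CONTEXT.remove(); }

    /** Stores a context, waits until the other task has stored its own, then reads back. */
    private static String setThenRead(String context, CountDownLatch bothSet) {
        set(context);
        bothSet.countDown();
        try {
            bothSet.await(); // both tasks have called set() past this point
            return get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        } finally {
            clear(); // avoid leaking the value if the thread is reused
        }
    }

    /** Runs two simulated tasks concurrently and returns what each one read. */
    public static String[] runTwoTasks() {
        String[] seen = new String[2];
        CountDownLatch bothSet = new CountDownLatch(2);
        Thread first = new Thread(() -> seen[0] = setThenRead("split-A", bothSet));
        Thread second = new Thread(() -> seen[1] = setThenRead("split-B", bothSet));
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen;
    }

    public static void main(String[] args) {
        String[] seen = runTwoTasks();
        System.out.println(seen[0] + " " + seen[1]);
    }
}
```

The latch forces the problematic interleaving (both writes happen before either read), so with a shared static field one task would read the other's value, while the ThreadLocal keeps them separate.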

        Adam Warrington made changes -
        Field: Original Value -> New Value
        Attachment: (none) -> PIG-1702-0.patch [ 12475395 ]
        Adam Warrington made changes -
        Status: Open [ 1 ] -> Patch Available [ 10002 ]
        Fix Version/s: (none) -> 0.9.0 [ 12315191 ]
        Adam Warrington added a comment -

        Review request can be found here:

        https://reviews.apache.org/r/547/

        Olga Natkovich made changes -
        Assignee: (none) -> Adam Warrington [ awarring ]
        Olga Natkovich added a comment -

        Delaying till 10 since we are about to spin the release.

        Can one of the committers review post 0.9? thanks

        Olga Natkovich made changes -
        Fix Version/s: (none) -> 0.10 [ 12316246 ]
        Olga Natkovich made changes -
        Fix Version/s: 0.9.0 [ 12315191 ] -> (none)
        Daniel Dai added a comment -

        Patch looks fine. Will commit it shortly.

        Daniel Dai added a comment -

        Patch committed to trunk. Thanks Adam!

        Daniel Dai made changes -
        Status: Patch Available [ 10002 ] -> Resolved [ 5 ]
        Hadoop Flags: (none) -> [Reviewed]
        Resolution: (none) -> Fixed [ 1 ]
        Daniel Dai made changes -
        Status: Resolved [ 5 ] -> Closed [ 6 ]

          People

          • Assignee:
            Adam Warrington
            Reporter:
            Adam Warrington
          • Votes:
            0
            Watchers:
            3