Pig / PIG-2658

Add start time for pig script in generated Map-Reduce job conf

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.11
    • Component/s: None
    • Labels:
      None
    • Release Note:
      Set pig.script.submitted.timestamp and pig.job.submitted.timestamp on job conf when script/jobs are submitted.

      Description

      The overall identifier for a Pig script, pig.script.id (added in PIG-1280), ties together the various map-reduce jobs generated for a script.
      It would be good to have a timestamp added to the MR job config as well, so that the jobs from one run of a Pig script can be grouped together.

      This field in the job conf for each of the generated Hadoop MR jobs would then be the same for one run and represent when the pig script was started.
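
A minimal sketch of how a consumer of the job conf might use such a field, assuming each job's conf is available as a plain dictionary and that the timestamp is stored under pig.script.submitted.timestamp (the name given in the release note). The function name group_jobs_by_run, the job IDs, and all sample values are hypothetical. The sample reuses one script id across two timestamps only to show the timestamp separating runs; whether real ids repeat across runs is not established here.

```python
from collections import defaultdict

SCRIPT_ID_KEY = "pig.script.id"
SCRIPT_TS_KEY = "pig.script.submitted.timestamp"


def group_jobs_by_run(job_confs):
    """Group job IDs by (script id, script submitted timestamp).

    job_confs maps a job ID to that job's conf as a dict of strings.
    Jobs missing either key are skipped, since they cannot be
    attributed to a run.
    """
    runs = defaultdict(list)
    for job_id, conf in job_confs.items():
        script_id = conf.get(SCRIPT_ID_KEY)
        submitted = conf.get(SCRIPT_TS_KEY)
        if script_id is None or submitted is None:
            continue
        runs[(script_id, submitted)].append(job_id)
    return dict(runs)


# Hypothetical confs: two runs with the same script id, plus one non-Pig job.
example_confs = {
    "job_0001": {SCRIPT_ID_KEY: "abc", SCRIPT_TS_KEY: "1336000000000"},
    "job_0002": {SCRIPT_ID_KEY: "abc", SCRIPT_TS_KEY: "1336000000000"},
    "job_0003": {SCRIPT_ID_KEY: "abc", SCRIPT_TS_KEY: "1336090000000"},
    "job_0004": {"mapred.job.name": "not a pig job"},
}
example_runs = group_jobs_by_run(example_confs)
```

Keying on both values keeps the grouping correct whether or not the script id alone is unique per run.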

      1. PIG-2658-1.patch
        1 kB
        Bill Graham
      2. PIG-2658-2.patch
        1 kB
        Bill Graham
      3. PIG-2658-3.patch
        2 kB
        Bill Graham
      4. PIG-2658-4.patch
        2 kB
        Bill Graham

        Activity

        Bill Graham added a comment -

        I can handle this one. The identifier that sticks with a pig script over multiple invocations has been implemented in PIG-2587 FYI.

        Bill Graham added a comment -

        Here's a first pass. I'm setting pig.job.submitted.timestamp to the current timestamp just before jobs get submitted.

        Bill Graham added a comment -

        Patch #2 with a more generic param name:

        job.submitted.timestamp

        Bill Graham added a comment -

        My previous patch includes the time the jobs were submitted, which isn't what was requested. This one includes two params:

        script.submitted.timestamp
        job.submitted.timestamp

        The first will be the same for all jobs.

        Julien Le Dem added a comment -

        Should these start with "pig."?

        Dmitriy V. Ryaboy added a comment -

        Julien – why?

        These seem like generic enough concepts, if we can get others to adopt them, having generic names would make life easier.

        Julien Le Dem added a comment -

        I do agree that those are generic, ideally we define those names in Hadoop to favor reuse.

        Dmitriy V. Ryaboy added a comment -

        That would be better. It's a long way to global adoption of 0.23 though, and we'd like to have folks on 20 use this too. OK, let's prepend with "pig."

        D

        Bill Graham added a comment -

        OK, adding patch 4, which changes param names back to be pig-prefixed:

        pig.script.submitted.timestamp
        pig.job.submitted.timestamp
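
A rough sketch of the behaviour discussed in this thread, not the code from the patch: the script timestamp is captured once and copied into every job's conf, while the job timestamp is taken just before each job is submitted. Plain dictionaries stand in for the job conf. The helper names (now_millis, build_job_confs), the "job.name" key, and the use of epoch milliseconds are all assumptions made for illustration.

```python
import time

SCRIPT_TS_KEY = "pig.script.submitted.timestamp"
JOB_TS_KEY = "pig.job.submitted.timestamp"


def now_millis():
    # Assumption: epoch milliseconds; the thread does not state the unit.
    return int(time.time() * 1000)


def build_job_confs(job_names, clock=now_millis):
    """Return one conf dict per job, stamped as each job is 'submitted'."""
    script_submitted = str(clock())  # captured once per script run
    confs = []
    for name in job_names:
        conf = {"job.name": name}  # hypothetical key, for illustration only
        conf[SCRIPT_TS_KEY] = script_submitted  # identical for every job
        conf[JOB_TS_KEY] = str(clock())  # taken just before this job is submitted
        confs.append(conf)
    return confs


# Deterministic fake clock so the behaviour is easy to see.
ticks = iter([1000, 1005, 1010])
demo_confs = build_job_confs(["load", "join"], clock=lambda: next(ticks))
```

With the fake clock, both jobs share the script timestamp while each gets its own job timestamp.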
        
        Julien Le Dem added a comment -

        +1

        Bill Graham added a comment -

        Thanks, committed.


          People

          • Assignee:
            Bill Graham
            Reporter:
            Joep Rottinghuis
          • Votes:
            0
            Watchers:
            3

            Dates

            • Created:
              Updated:
              Resolved:
