Pig / PIG-2900

Streaming should provide conf settings in the environment

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.11
    • Component/s: None
    • Labels:
      None
    • Release Note:
      The STREAM operator now makes all jobconf properties available to the programs processing streaming input via environment variables, consistent with Hadoop Streaming behavior.
      All "." characters in the jobconf property names are replaced with underscores, "_".
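The name mapping in the release note can be sketched as follows. This is a minimal illustration, not the code from the patch: the helper names `to_env_name` and `conf_to_env` are hypothetical, and only the "." to "_" rule stated above is modeled.

```python
def to_env_name(prop_name):
    """Map a jobconf property name to an environment variable name.

    Only the rule stated in the release note is modeled here:
    every "." becomes "_".
    """
    return prop_name.replace(".", "_")


def conf_to_env(conf):
    """Build an environment mapping from a dict of jobconf properties.

    `conf` is a plain dict standing in for the job configuration
    (an assumption made for illustration).
    """
    return {to_env_name(name): str(value) for name, value in conf.items()}
```

For example, `to_env_name("hadoop.tmp.dir")` yields `hadoop_tmp_dir`, the form a streaming program would look up in its environment.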

      Description

      Hadoop Streaming converts jobconf properties into environment variables; Pig streaming does not. This is a useful feature that Pig streaming should provide.
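From the streaming program's side, using the feature amounts to reading an environment variable. The sketch below assumes a Python script invoked through STREAM; the helper names `jobconf` and `annotate` are hypothetical, and `hadoop.tmp.dir` is used only because it is a property mentioned in this issue.

```python
import os


def jobconf(name, env=None, default=None):
    """Look up a jobconf property by its dotted name in the environment.

    The dotted name is converted with the "." -> "_" rule before lookup.
    `env` defaults to the real process environment.
    """
    env = os.environ if env is None else env
    return env.get(name.replace(".", "_"), default)


def annotate(lines, env=None):
    """Yield each input line with the value of hadoop.tmp.dir appended
    as an extra tab-separated field (empty if the variable is unset)."""
    tmp_dir = jobconf("hadoop.tmp.dir", env=env, default="")
    for line in lines:
        yield line.rstrip("\n") + "\t" + tmp_dir + "\n"


# In an actual streaming script one would wire this to stdin/stdout, e.g.:
#   import sys
#   sys.stdout.writelines(annotate(sys.stdin))
```

Passing `env` explicitly keeps the lookup easy to exercise without a cluster.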

      1. PIG-2900.1.patch
        16 kB
        Dmitriy V. Ryaboy
      2. PIG-2900.patch
        14 kB
        Dmitriy V. Ryaboy

        Issue Links

          Activity

          Dmitriy V. Ryaboy created issue -
          Dmitriy V. Ryaboy added a comment -

          No tests, but all the code is ripped out straight from Hadoop Streaming. Tested on the cluster.

          Will add tests.

          Dmitriy V. Ryaboy made changes -
          Field Original Value New Value
          Attachment PIG-2900.patch [ 12543372 ]
          Dmitriy V. Ryaboy added a comment -

          Now with tests. Ready for review.

          Dmitriy V. Ryaboy made changes -
          Attachment PIG-2900.1.patch [ 12543378 ]
          Dmitriy V. Ryaboy made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Dmitriy V. Ryaboy added a comment -

          Bump for review.

          Alan Gates added a comment -

          In general it looks good. I had a couple of questions/comments:

          We should add a note to the release notes section of the JIRA noting the new
          feature and how the mapping of env var names will be handled (e.g. a.b.c will
          be mapped to a_b_c).

          It would be nice to have an e2e test that checks that the environment variable ends up on the remote side. I'll take a look at adding that.

          The unit test you provided fails on my mac. It seems dfs_data_dir isn't in the created configuration. A lot of other values are, like hadoop_tmp_dir. I didn't run it on Linux to see if it works ok there.
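The kind of check discussed above, that a property actually reaches the child process's environment, can be approximated locally. This is a sketch only, not the e2e test or the unit test from the patch: the function name `child_sees_property` is hypothetical, and a plain Python child process stands in for the streaming executable.

```python
import os
import subprocess
import sys


def child_sees_property(prop_name, value):
    """Export one jobconf-style property to a child process and return
    the value the child reads back from its environment."""
    env_name = prop_name.replace(".", "_")  # e.g. a.b.c -> a_b_c
    env = dict(os.environ)
    env[env_name] = value
    # The child only prints the variable named by its first argument.
    child_code = "import os, sys; sys.stdout.write(os.environ.get(sys.argv[1], ''))"
    result = subprocess.run(
        [sys.executable, "-c", child_code, env_name],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Choosing a property that the test sets itself, rather than one expected in the default configuration, avoids depending on which defaults a given platform provides.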

          Dmitriy V. Ryaboy added a comment -

          Alan,
          I'll add the release notes.
          That's interesting about dfs_data_dir .. are you using hadoop 23? Either way, I guess some other value should be used; I didn't know dfs.data.dir can be absent. Do you think we can rely on hadoop.tmp.dir existing in the default conf?

          Alan Gates added a comment -

          I'm just building Pig with default options on my mac. I didn't know it could be missing either. hadoop.tmp.dir seems to be shared across platforms at the moment.

          I'm +1 for this patch.

          Dmitriy V. Ryaboy added a comment -

          Committed to trunk.
          Thanks for the review, Alan!

          Dmitriy V. Ryaboy made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Fix Version/s 0.11 [ 12318878 ]
          Resolution Fixed [ 1 ]
          Dmitriy V. Ryaboy made changes -
          Release Note The STREAM operator now makes all jobconf properties available to the programs processing streaming input via environment variables, consistent with Hadoop Streaming behavior.
          All "." characters in the jobconf property names are replaced with underscores, "_".
          Cheolsoo Park made changes -
          Link This issue relates to PIG-3001 [ PIG-3001 ]
          Bill Graham made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Transition | Time In Source Status | Execution Times | Last Executer | Last Execution Date
          Open -> Patch Available | 2h 23m | 1 | Dmitriy V. Ryaboy | 01/Sep/12 01:21
          Patch Available -> Resolved | 12d 21h 5m | 1 | Dmitriy V. Ryaboy | 13/Sep/12 22:27
          Resolved -> Closed | 161d 7h 26m | 1 | Bill Graham | 22/Feb/13 04:53

            People

            • Assignee:
              Dmitriy V. Ryaboy
              Reporter:
              Dmitriy V. Ryaboy
            • Votes:
              0
              Watchers:
              2
