Pig / PIG-3446 Umbrella jira for Pig on Tez / PIG-3560

Unique Tez staging dir should be used for different users

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: tez-branch
    • Fix Version/s: tez-branch
    • Component/s: tez
    • Labels:
      None

      Description

      I discovered this bug while setting up a multi-user cluster. Since the staging dir for Pig jobs is always set to TEZ_AM_STAGING_DIR_DEFAULT (/tmp/tez), if multiple users submit Pig jobs, subsequent jobs fail with the following error:

      java.io.IOException: The ownership on the staging directory hdfs://10.170.21.33:9000/tmp/tez/staging is not as expected. It is owned by hadoop. The directory must be owned by the submitter cheolsoop or by cheolsoop
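
      The failure can be illustrated with a minimal sketch. This is a simplified model of an ownership check, not the actual Tez or Hadoop code: the class and method names are hypothetical, and the owner is passed in as a plain string instead of being read from HDFS. It shows why a single shared staging directory works for the first submitter and fails for everyone after.

```java
import java.io.IOException;

class StagingOwnershipSketch {
    // Hypothetical stand-in for the staging-dir ownership check (assumption, not real Tez code).
    static void checkStagingOwner(String stagingDir, String dirOwner, String submitter)
            throws IOException {
        if (!dirOwner.equals(submitter)) {
            throw new IOException("The ownership on the staging directory " + stagingDir
                + " is not as expected. It is owned by " + dirOwner
                + ". The directory must be owned by the submitter " + submitter);
        }
    }

    public static void main(String[] args) {
        String shared = "/tmp/tez/staging";
        String owner = "hadoop"; // the first user to submit creates and owns the directory
        for (String user : new String[] {"hadoop", "cheolsoop"}) {
            try {
                checkStagingOwner(shared, owner, user);
                System.out.println(user + ": ok");
            } catch (IOException e) {
                System.out.println(user + ": " + e.getMessage());
            }
        }
    }
}
```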
      
      1. PIG-3560-1.patch
        2 kB
        Cheolsoo Park
      2. PIG-3560-2.patch
        2 kB
        Cheolsoo Park

        Activity

        Cheolsoo Park added a comment -

        Attaching a fix.

        Cheolsoo Park added a comment -

        Actually, using a tmp path for staging dir breaks e2e tests. I changed it to <user_working_dir>/<application_id> in a new patch.

        Rohini Palaniswamy added a comment -

        Instead of fixing it this way, we should have Tez provide an API similar to MapReduce's (/basepath/<username>/.staging) and use that instead of checking TezConfiguration.TEZ_AM_STAGING_DIR ourselves. This is also required for security: it separates users and lets us set 700 on the <username> directory, making it readable only by that user.

        MRApps.java

            public static Path getStagingAreaDir(Configuration conf, String user) {
              return new Path(conf.get(MRJobConfig.MR_AM_STAGING_DIR,
                  MRJobConfig.DEFAULT_MR_AM_STAGING_DIR)
                  + Path.SEPARATOR + user + Path.SEPARATOR + STAGING_CONSTANT);
            }

        We can keep this jira open and switch to that API once Tez provides it.
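
        As a rough sketch of the layout proposed here, assuming nothing about the API Tez would eventually expose: the class, method, and constant names below are hypothetical, and plain strings stand in for Hadoop's Path so the example runs without Hadoop on the classpath. It builds /basepath/<username>/.staging and expresses the 700 mode as a POSIX permission set.

```java
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

class PerUserStagingSketch {
    // Hypothetical constant mirroring the ".staging" suffix of the MapReduce layout.
    static final String STAGING_SUFFIX = ".staging";

    // Builds <basePath>/<user>/.staging; rejects names that would escape the base path.
    static String stagingDirFor(String basePath, String user) {
        if (user == null || user.isEmpty() || user.contains("/")) {
            throw new IllegalArgumentException("invalid user name: " + user);
        }
        String base = basePath.endsWith("/")
            ? basePath.substring(0, basePath.length() - 1)
            : basePath;
        return base + "/" + user + "/" + STAGING_SUFFIX;
    }

    // Mode 700: read, write, and execute for the owner only.
    static Set<PosixFilePermission> userDirPermissions() {
        return PosixFilePermissions.fromString("rwx------");
    }

    public static void main(String[] args) {
        System.out.println(stagingDirFor("/tmp/tez", "cheolsoop"));
        System.out.println(PosixFilePermissions.toString(userDirPermissions()));
    }
}
```

        With a per-user path like this, each submitter owns their own directory, so an ownership check on the staging dir no longer depends on who submitted first.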

        Cheolsoo Park added a comment -

        Sounds good. I can file a Tez jira for that API if you haven't done so yet.

        Cheolsoo Park added a comment -

        Looks like Daniel Dai is fixing this as part of PIG-3539. Closing this.


          People

          • Assignee:
            Daniel Dai
            Reporter:
            Cheolsoo Park