Hadoop Common
  1. Hadoop Common
  2. HADOOP-3098

dfs -chown does not like "_" underscore in user name

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.16.0, 0.16.1
    • Fix Version/s: 0.16.2
    • Component/s: fs
    • Labels:
      None
    • Release Note:
      chown and chgrp commands allow more flexible names compared to 0.16. See 'fs -help chown'. Inform Users : YES.

      Description

      :~$ hadoop dfs -chown aaa_bbb /user/knoguchi/test.txt
      chown: 'aaa_bbb' does not match expected pattern for [owner][:group].

      in 0.16.1, only alphabets and numbers are allowed. Shouldn't '_' be allowed?

      I couldn't find any standard, but in Solaris10, it's defined as

      http://docs.sun.com/app/docs/doc/816-5174/6mbb98uhg?a=view

      The login (login) and role (role) fields accept a string of no more than eight bytes consisting of characters from the set of alphabetic characters, numeric characters, period (.), underscore (_), and hyphen . The first character should be alphabetic and the field should contain at least one lower case alphabetic character. A warning message is displayed if these restrictions are not met.

      1. HADOOP-3098.patch
        3 kB
        Raghu Angadi
      2. HADOOP-3098.patch
        4 kB
        Raghu Angadi

        Issue Links

          Activity

          Hide
          Raghu Angadi added a comment - - edited

          Sure. There was no particular reason to restrict it to alphanumberic only. Of course we don't need to limit it to 8 characters.

          One problem is period(.) : what does 'chown raghu.user file' mean, is 'user' group name or part of the user name? On linux it will set the file onwer to raghu and group owner to user. Also these os' have advantage of being able to validate if some name a known username or not (from /etc/passwd). All we do now is to match a regex.

          What about spaces?

          I can propose a regex here.

          Show
          Raghu Angadi added a comment - - edited Sure. There was no particular reason to restrict it to alphanumberic only. Of course we don't need to limit it to 8 characters. One problem is period(.) : what does 'chown raghu.user file' mean, is 'user' group name or part of the user name? On linux it will set the file onwer to raghu and group owner to user. Also these os' have advantage of being able to validate if some name a known username or not (from /etc/passwd). All we do now is to match a regex. What about spaces? I can propose a regex here.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          I suggest we don't restrict user and group names at all but provide a way to escape the special characters in the commands, for example "raghu\.user" .

          Show
          Tsz Wo Nicholas Sze added a comment - I suggest we don't restrict user and group names at all but provide a way to escape the special characters in the commands, for example "raghu\.user" .
          Hide
          Raghu Angadi added a comment -

          Yes.

          FWIW : 'fs -help chown' currently says :

                          [...] The owner and group names can only
                          contain digits and alphabet. These names are case sensitive.
          
          Show
          Raghu Angadi added a comment - Yes. FWIW : 'fs -help chown' currently says : [...] The owner and group names can only contain digits and alphabet. These names are case sensitive.
          Hide
          Owen O'Malley added a comment -

          -1 on this for 16.2, since it doesn't seem to be a bug.

          I think for 17 we should decide what the valid characters in group and user names are. Given kerberos, I think it should likely consist of characters from:

          [a-zA-Z0-9_@./]

          Thoughts?

          Show
          Owen O'Malley added a comment - -1 on this for 16.2, since it doesn't seem to be a bug. I think for 17 we should decide what the valid characters in group and user names are. Given kerberos, I think it should likely consist of characters from: [a-zA-Z0-9_@./] Thoughts?
          Hide
          Robert Chansler added a comment -

          Don, Lenny, Nigel, Koji and Rob: Exactly dots, hyphens and underscores for 16.2 and 17.

          And a unit test!

          And tweak the documentation.

          Show
          Robert Chansler added a comment - Don, Lenny, Nigel, Koji and Rob: Exactly dots, hyphens and underscores for 16.2 and 17. And a unit test! And tweak the documentation.
          Hide
          Robert Chansler added a comment -

          Don, Lenny, Nigel, Koji and Rob: Exactly dots, hyphens and underscores for 16.2 and 17.

          And a unit test!

          And tweak the documentation.

          Show
          Robert Chansler added a comment - Don, Lenny, Nigel, Koji and Rob: Exactly dots, hyphens and underscores for 16.2 and 17. And a unit test! And tweak the documentation.
          Hide
          Raghu Angadi added a comment -

          Here is the proposal:

          Period (.) no longer separates user name and group (so raghu.user just specifies a user name). Linux allows it but it Solaris and others does not seem to. We will explicitly mention it in help message. We could escape it as Nicholas suggested but it might be confusing to the user any way...

          Owen, should we include @ and '/' now? We could always add these later.

          With these patch is going to be pretty simple.

          Show
          Raghu Angadi added a comment - Here is the proposal: Period (.) no longer separates user name and group (so raghu.user just specifies a user name). Linux allows it but it Solaris and others does not seem to. We will explicitly mention it in help message. We could escape it as Nicholas suggested but it might be confusing to the user any way... Owen, should we include @ and '/' now? We could always add these later. With these patch is going to be pretty simple.
          Hide
          Raghu Angadi added a comment -

          Patch attached. The name can consists of [-_.a-zA-Z0-9].

          Normal chown does not talk about any escaping. That's what I meant when I said it might confuse the users.

          Show
          Raghu Angadi added a comment - Patch attached. The name can consists of [-_.a-zA-Z0-9] . Normal chown does not talk about any escaping. That's what I meant when I said it might confuse the users.
          Hide
          Raghu Angadi added a comment -

          btw, how does this affect international users?

          Show
          Raghu Angadi added a comment - btw, how does this affect international users?
          Hide
          Allen Wittenauer added a comment -

          If we're going to fix this, why don't we just go ahead and do / and @ for Kerberos now rather than require another patch later?

          As to i18n, AFAIK, POSIX doesn't dictate what characters are allowed in the pw_name field of the passwd struct. Depending upon the age of the source, it may or may not be limited to 8 chars.

          A brief search that no one other than Sun in recent releases of Solaris seem to document just what the allowed characters are on the passwd username field, despite the fact that pwck is in quite a few OSes, mainly SysV variants. The vast majority list just restrictions. (first char needs to be a letter, don't use : are the two most common).

          This likely means that the above range will break for i18n. It may be worthwhile to peruse some login code (login itself, ssh, rsh, telnet, etc,) to see how they handle it.

          That said, the above range should be 'good enough' to cover a very large percentage of the cases.

          Show
          Allen Wittenauer added a comment - If we're going to fix this, why don't we just go ahead and do / and @ for Kerberos now rather than require another patch later? As to i18n, AFAIK, POSIX doesn't dictate what characters are allowed in the pw_name field of the passwd struct. Depending upon the age of the source, it may or may not be limited to 8 chars. A brief search that no one other than Sun in recent releases of Solaris seem to document just what the allowed characters are on the passwd username field, despite the fact that pwck is in quite a few OSes, mainly SysV variants. The vast majority list just restrictions. (first char needs to be a letter, don't use : are the two most common). This likely means that the above range will break for i18n. It may be worthwhile to peruse some login code (login itself, ssh, rsh, telnet, etc,) to see how they handle it. That said, the above range should be 'good enough' to cover a very large percentage of the cases.
          Hide
          Raghu Angadi added a comment -

          > If we're going to fix this, why don't we just go ahead and do / and @ for Kerberos now rather than require another patch later?

          true. I was not sure if these may be required or will be required.

          Do we really want dot(.)? Unlike HDFS, we don't use system call to for LocalFS. So it still goes through shell command.. so '.' will still be treated as separation between username and group.. just adds to user confusion. If we do include dot, I need to update the documentation a little bit more.

          Show
          Raghu Angadi added a comment - > If we're going to fix this, why don't we just go ahead and do / and @ for Kerberos now rather than require another patch later? true. I was not sure if these may be required or will be required. Do we really want dot(.)? Unlike HDFS, we don't use system call to for LocalFS. So it still goes through shell command.. so '.' will still be treated as separation between username and group.. just adds to user confusion. If we do include dot, I need to update the documentation a little bit more.
          Hide
          Allen Wittenauer added a comment -

          I'm not sure why period (.) would be an issue. chown and chgrp use colons( for user and group separation. Periods aren't recommended for usage due to how mail servers may interpret them, but I don't know of any standard UNIX commands that choke on it.

          Show
          Allen Wittenauer added a comment - I'm not sure why period (.) would be an issue. chown and chgrp use colons( for user and group separation. Periods aren't recommended for usage due to how mail servers may interpret them, but I don't know of any standard UNIX commands that choke on it.
          Hide
          Raghu Angadi added a comment -

          > chown and chgrp use colons( for user and group separation.
          Linux (Fedora, and may be others) allows '.' also to separate username and group. I agree it would be bad practice for a user to use '.' for separation.

          Show
          Raghu Angadi added a comment - > chown and chgrp use colons( for user and group separation. Linux (Fedora, and may be others) allows '.' also to separate username and group. I agree it would be bad practice for a user to use '.' for separation.
          Hide
          Raghu Angadi added a comment -

          btw, the 'chown' command does not choke on it. It might just surprise the user that uses '.' for user names.

          Show
          Raghu Angadi added a comment - btw, the 'chown' command does not choke on it. It might just surprise the user that uses '.' for user names.
          Hide
          Allen Wittenauer added a comment -

          That must be a GNU-ism. As they say, GNU is Not UNIX.

          (and looking at the man page, I don't see support for . being a separator documented)

          Show
          Allen Wittenauer added a comment - That must be a GNU-ism. As they say, GNU is Not UNIX. (and looking at the man page, I don't see support for . being a separator documented)
          Hide
          Allen Wittenauer added a comment -

          Ahh...

          http://www.opengroup.org/onlinepubs/009695399/utilities/chown.html

          ... which also happens to have the answer to the question of what are valid user name characters.

          Show
          Allen Wittenauer added a comment - Ahh... http://www.opengroup.org/onlinepubs/009695399/utilities/chown.html ... which also happens to have the answer to the question of what are valid user name characters.
          Hide
          Raghu Angadi added a comment -

          Updated the patch to allow '@' and '/'. Documentation includes an explicit warning for Linux users (may be 90+% of hadoop users?) :

          $ bin/hadoop fs -Dfs.default.name=file:/// -help chown
          -chown [-R] [OWNER][:[GROUP]] PATH...
                          Changes owner and group of a file.
                          This is similar to shell's chown with a few exceptions.
          
                  -R      modifies the files recursively. This is the only option
                          currently supported.
          
                          If only owner or group is specified then only owner or
                          group is modified.
          
                          The owner and group names may only cosists of digits, alphabet,
                          and any of '-_.@/' i.e. [-_.@/a-zA-Z0-9]. The names are case
                          sensitive.
          
                          WARNING: Avoid using '.' to separate user name and group though
                          Linux allows it. If user names have dots in them and you are
                          using local file system, you might see surprising results since
                          shell command 'chown' is used for local files.
          
          Show
          Raghu Angadi added a comment - Updated the patch to allow '@' and '/'. Documentation includes an explicit warning for Linux users (may be 90+% of hadoop users?) : $ bin/hadoop fs -Dfs.default.name=file:/// -help chown -chown [-R] [OWNER][:[GROUP]] PATH... Changes owner and group of a file. This is similar to shell's chown with a few exceptions. -R modifies the files recursively. This is the only option currently supported. If only owner or group is specified then only owner or group is modified. The owner and group names may only cosists of digits, alphabet, and any of '-_.@/' i.e. [-_.@/a-zA-Z0-9]. The names are case sensitive. WARNING: Avoid using '.' to separate user name and group though Linux allows it. If user names have dots in them and you are using local file system, you might see surprising results since shell command 'chown' is used for local files.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          +1 codes look good

          minor comment: it would be better to use allowedChars to construct the help message.

          Show
          Tsz Wo Nicholas Sze added a comment - +1 codes look good minor comment: it would be better to use allowedChars to construct the help message.
          Hide
          Raghu Angadi added a comment -

          Thanks Nicholas. Regd using 'alloedChars', currently long help in a different file and even with that, it needs to be interpreted by the person that changes the documentation (like now). Its ok for now I think.

          Show
          Raghu Angadi added a comment - Thanks Nicholas. Regd using 'alloedChars', currently long help in a different file and even with that, it needs to be interpreted by the person that changes the documentation (like now). Its ok for now I think.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12378739/HADOOP-3098.patch
          against trunk revision 619744.

          @author +1. The patch does not contain any @author tags.

          tests included +1. The patch appears to include 5 new or modified tests.

          javadoc -1. The javadoc tool appears to have generated 1 warning messages.

          javac +1. The applied patch does not generate any new javac compiler warnings.

          release audit +1. The applied patch does not generate any new release audit warnings.

          findbugs +1. The patch does not introduce any new Findbugs warnings.

          core tests +1. The patch passed core unit tests.

          contrib tests +1. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2083/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2083/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2083/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2083/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12378739/HADOOP-3098.patch against trunk revision 619744. @author +1. The patch does not contain any @author tags. tests included +1. The patch appears to include 5 new or modified tests. javadoc -1. The javadoc tool appears to have generated 1 warning messages. javac +1. The applied patch does not generate any new javac compiler warnings. release audit +1. The applied patch does not generate any new release audit warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2083/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2083/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2083/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2083/console This message is automatically generated.
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12378739/HADOOP-3098.patch
          against trunk revision 619744.

          @author +1. The patch does not contain any @author tags.

          tests included +1. The patch appears to include 5 new or modified tests.

          javadoc +1. The javadoc tool did not generate any warning messages.

          javac +1. The applied patch does not generate any new javac compiler warnings.

          release audit +1. The applied patch does not generate any new release audit warnings.

          findbugs +1. The patch does not introduce any new Findbugs warnings.

          core tests +1. The patch passed core unit tests.

          contrib tests +1. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2086/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2086/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2086/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2086/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12378739/HADOOP-3098.patch against trunk revision 619744. @author +1. The patch does not contain any @author tags. tests included +1. The patch appears to include 5 new or modified tests. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new javac compiler warnings. release audit +1. The applied patch does not generate any new release audit warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2086/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2086/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2086/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2086/console This message is automatically generated.
          Hide
          Raghu Angadi added a comment -

          I just committed this.

          Show
          Raghu Angadi added a comment - I just committed this.
          Hide
          Hudson added a comment -
          Show
          Hudson added a comment - Integrated in Hadoop-trunk #445 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/445/ )
          Hide
          Rob Weltman added a comment -

          For user-facing UI like the "help" output from a command, can we spend a few minutes making the text grammatically correct before checking code in?

          "The owner and group names may only cosists of digits, alphabet"
          ->
          "may only consist of digits, alphabetical characters,"

          For future reference, anyway.

          Show
          Rob Weltman added a comment - For user-facing UI like the "help" output from a command, can we spend a few minutes making the text grammatically correct before checking code in? "The owner and group names may only cosists of digits, alphabet" -> "may only consist of digits, alphabetical characters," For future reference, anyway.
          Hide
          Raghu Angadi added a comment - - edited

          > an we spend a few minutes making the text grammatically correct before checking code in?

          I wrote it and spent quite a bit more than a few munites... and I wanted to others to read as well. Thats why I included the whole text in a comment above.

          Show
          Raghu Angadi added a comment - - edited > an we spend a few minutes making the text grammatically correct before checking code in? I wrote it and spent quite a bit more than a few munites... and I wanted to others to read as well. Thats why I included the whole text in a comment above.
          Hide
          Raghu Angadi added a comment -

          Rob, I am assuming the one you corrected is the only correction required. Once you confirm, I will commit the change.

          Show
          Raghu Angadi added a comment - Rob, I am assuming the one you corrected is the only correction required. Once you confirm, I will commit the change.
          Hide
          Marc Villacorta added a comment -

          In my company we use 'COMPANYNAME+name.surname' as usernames.
          The '+' character is the winbind separator.
          I am able to create files in HDFS but I cannot chown as '+' is not in [-_.@/a-zA-Z0-9].
          We are also planning to enable kerberos in our Hadoop cluster.

          Why '+' is not allowed?
          Thank you.

          Show
          Marc Villacorta added a comment - In my company we use 'COMPANYNAME+name.surname' as usernames. The '+' character is the winbind separator. I am able to create files in HDFS but I cannot chown as '+' is not in [-_.@/a-zA-Z0-9] . We are also planning to enable kerberos in our Hadoop cluster. Why '+' is not allowed? Thank you.

            People

            • Assignee:
              Raghu Angadi
              Reporter:
              Koji Noguchi
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development