Bigtop / BIGTOP-456

Consider splitting homedir between mapred and hdfs users?

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.1.0
    • Fix Version/s: backlog
    • Component/s: General
    • Labels:
      None
    • Environment:

      RPMs

      Description

      Both "mapred" and "hdfs" users have the same home dir.

      A user reported that their config management system overwrote the "mapred" user's permissions on the PID directory (which is also its homedir) with those of the "hdfs" user (which has the same homedir as "mapred"). This caused the tasktracker process to fail to start, since it could no longer write to the PID dir.
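      The failure mode described above can be sketched as a simplified POSIX permission check. This is only an illustration: the function name `can_write` and all numeric user and group IDs are assumptions made for the example, and the check ignores ACLs and other mechanisms a real kernel applies.

```python
def can_write(uid, gids, dir_uid, dir_gid, mode):
    """Simplified check: may this user create files in the directory?

    uid/gids describe the process; dir_uid/dir_gid/mode describe the
    directory. ACLs, capabilities and sticky bits are ignored.
    """
    if uid == 0:
        return True  # root bypasses the permission bits
    if uid == dir_uid:
        bits = (mode >> 6) & 0o7   # owner class
    elif dir_gid in gids:
        bits = (mode >> 3) & 0o7   # group class
    else:
        bits = mode & 0o7          # other class
    # Creating a file needs both write and search (execute) on the directory.
    return (bits & 0o3) == 0o3


# Hypothetical IDs, chosen only for illustration.
MAPRED_UID, MAPRED_GID = 497, 494
HDFS_UID, HDFS_GID = 496, 495

# PID dir owned by mapred, mode 0755: the tasktracker can write its pidfile.
print(can_write(MAPRED_UID, {MAPRED_GID}, MAPRED_UID, MAPRED_GID, 0o755))

# Ownership reset to hdfs with the same mode: mapred falls into the
# "other" class and can no longer write.
print(can_write(MAPRED_UID, {MAPRED_GID}, HDFS_UID, HDFS_GID, 0o755))
```

      With separate directories per user, a tool that resets ownership for one user would not affect the other's PID dir.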

      The config system can be fixed to stop doing that, but if both users had separate home dirs this would not have been a problem in the first place, and the separation would be a purely logical one.

      Given the username separation Hadoop has already had in packaging terms, I think the homedir split makes sense.

      It's just the 1.x and 0.22 versions of Hadoop and their packages that could be affected by this.

      Presently, for 0.23+, I think we have /var/run/hadoop/ for all things HDFS (should we rename it?) and /var/run/yarn/ for all things MapReduce2, which makes sense and should be good enough.

        Activity

        Jos Backus added a comment -

        Fwiw, to avoid this issue, at $work I'm going to run Hadoop daemons under a process manager such as daemontools. Pidfiles cause all kinds of issues such as these.

        Marcos Ortiz added a comment (edited)

        Yes, /var/run/hadoop should be the right directory for it. However,
        on newer versions of Fedora (15 and later) the default runtime directory
        was changed to /run, so I don't know which approach is best.

        Fedora 15 Release Notes
        3. Changes for System Administrators
        3.2.2 /run directory [1]

        "Fedora 15 has a /run directory for storing runtime data. /run is
        now a tmpfs, and /var/run is bind mounted to it. /var/lock is [...]
        as /var/run. Several programs including udev, dracut, mdadm, [...]
        runtime data during early bootup before /var is mounted. However [...]
        consensus between major distributions to shift to using /run instead.
        Fedora 15 is leading this change. Details including the benefits are
        explained here [2].

        This change is compliant with the Filesystem Hierarchy Standard [3],
        which allows distributions to create new directories in the root hierarchy as
        long as there is careful consideration of the consequences. Co-author of
        the latest FHS specification has expressed support [4] for this change. Lennart Poettering
        has filed a request [5] to update the FHS standard to include this change as well."

        3.2.3 /var/run/ and /var/lock
        [...] ensure to recreate their own files/dirs on startup, and cannot rely that
        doing this at package installation will suffice. It is possible to use
        systemd's |tmpfiles.d| mechanism to recreate directories and files
        beneath /var/run and /var/lock on boot, if necessary. See
        (http://0pointer.de/public/systemd-man/tmpfiles.d.html) and the conf
        files in /etc/tmpfiles.d for examples of such configuration. Fedora
        packaging guidelines for tmpfiles.d is at
        http://fedoraproject.org/wiki/Packaging:Tmpfiles.d.

        [1] http://docs.fedoraproject.org/en-US/Fedora/15/html/Release_Notes/sect-Release_Notes-Changes_for_SysAdmin.html
        [2] http://lists.fedoraproject.org/pipermail/devel/2011-March/150031.html
        [3] http://www.pathname.com/fhs/pub/fhs-2.3.html#THEROOTFILESYSTEM
        [4] https://lwn.net/Articles/436177/
        [5] http://bugs.freestandards.org/show_bug.cgi?id=718
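        The tmpfiles.d mechanism quoted above takes one line per entry, with the fields type, path, mode, user, group and age. A minimal sketch of rendering such a line for a Hadoop runtime directory follows; the helper name `tmpfiles_dir_entry` is hypothetical, the path, mode and ownership are assumptions rather than anything the packages are known to ship, and the result would live in a file such as /etc/tmpfiles.d/hadoop.conf (also an assumed name).

```python
def tmpfiles_dir_entry(path, mode, user, group):
    """Render a 'd' (create directory if missing) line in tmpfiles.d format.

    Fields: type, path, mode, user, group, age. A '-' age means the
    directory contents are never cleaned up automatically.
    """
    return f"d {path} {mode:04o} {user} {group} -"


# Assumed values for illustration only.
print(tmpfiles_dir_entry("/var/run/hadoop", 0o775, "root", "hadoop"))
```

        On a system where /var/run is a tmpfs, an entry like this would let the directory be recreated at boot instead of relying on package installation.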

        Bruno Mahé added a comment -

        This is why /var/run/hadoop belongs to the group hadoop and is group-writable, so both daemons can write to it.
        By default it is set to:

        drwxrwxr-x   2 root      hadoop    4096 Mar 15 14:31 hadoop
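        To check the same thing programmatically, the mode bits behind a listing like the one above can be inspected with Python's standard `stat` module. This is a minimal sketch; the helper name `group_can_write` is an assumption, and the mode is built by hand to match the listing rather than read from a real system.

```python
import stat


def group_can_write(mode):
    """Return True if the group-write bit is set in a file mode."""
    return bool(mode & stat.S_IWGRP)


# Directory with permissions 0775, matching the listing above.
mode = stat.S_IFDIR | 0o775

print(stat.filemode(mode))    # drwxrwxr-x
print(group_can_write(mode))  # True

# On a live system the mode would come from os.stat(path).st_mode instead.
```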

        Also, I see the home dir of these users set not to /var/run/hadoop but to /usr/lib/hadoop:

        [root@172 ~]# grep hadoop /etc/passwd
        mapred:x:497:494:Hadoop MapReduce:/usr/lib/hadoop:/bin/bash
        hdfs:x:496:495:Hadoop HDFS:/usr/lib/hadoop:/bin/bash
        

        But I agree that separating the pid dir of these 2 users would make sense.
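        The shared home directory can also be detected mechanically from passwd-format data. The sketch below is illustrative only: the function name `shared_home_dirs` is hypothetical, and the sample input is the two lines from the `grep` output above.

```python
from collections import defaultdict


def shared_home_dirs(passwd_text):
    """Map each home directory used by more than one account to its users."""
    homes = defaultdict(list)
    for line in passwd_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 7:
            continue  # not a well-formed passwd entry
        user, home = fields[0], fields[5]
        homes[home].append(user)
    return {home: users for home, users in homes.items() if len(users) > 1}


SAMPLE = """\
mapred:x:497:494:Hadoop MapReduce:/usr/lib/hadoop:/bin/bash
hdfs:x:496:495:Hadoop HDFS:/usr/lib/hadoop:/bin/bash
"""

print(shared_home_dirs(SAMPLE))
# {'/usr/lib/hadoop': ['mapred', 'hdfs']}
```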

        Regarding /run vs /var/run: given that we also strive to support older GNU/Linux distributions such as CentOS/RHEL 5, I would rather stay on /var/run for consistency,
        unless there is a compelling reason to have a specific packaging path for cases where /run is available. That would make maintenance of the packages more complex, so it has to be worth it.


          People

          • Assignee: Unassigned
          • Reporter: Harsh J
          • Votes: 0
          • Watchers: 1
