Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Invalid
    • Affects Version/s: 2.1.0
    • Fix Version/s: None
    • Component/s: Configuration
    • Labels:
      None

      Description

      If Traffic Server is started from a non-root user account, the launch script loops endlessly in start attempts,
      generating a core.PID file on each iteration.
      This creates an 80+ MB core file roughly every second until the disk fills up.

      The following log entry is added on each iteration:
      [E. Mgmt] log ==> [TrafficManager] using root directory '/home/mturk/tmp/trafficserver-trunk/trunk-svn/release1/usr/local'
      [May 13 12:50:18.299] {3086546656} STATUS: opened var/log/trafficserver/manager.log
      [TrafficServer] using root directory '/home/mturk/tmp/trafficserver-trunk/trunk-svn/release1/usr/local'
      [May 13 12:50:20.830] {1074246896} STATUS: opened var/log/trafficserver/diags.log
      FATAL: Can't change group to user: nobody, gid: 99
      /home/mturk/tmp/trafficserver-trunk/trunk-svn/release1/usr/local/bin/traffic_server - STACK TRACE:
      /home/mturk/tmp/trafficserver-trunk/trunk-svn/release1/usr/local/bin/traffic_server(ink_fatal_va+0x8f)[0x83451c7]
      /home/mturk/tmp/trafficserver-trunk/trunk-svn/release1/usr/local/bin/traffic_server(ink_fatal_die+0x1d)[0x83451f7]
      /home/mturk/tmp/trafficserver-trunk/trunk-svn/release1/usr/local/bin/traffic_server(_Z14change_uid_gidPKc+0xd8)[0x8152a52]
      /home/mturk/tmp/trafficserver-trunk/trunk-svn/release1/usr/local/bin/traffic_server(main+0x1296)[0x8153e68]
      /lib/libc.so.6(__libc_start_main+0xdc)[0x7bee9c]
      /home/mturk/tmp/trafficserver-trunk/trunk-svn/release1/usr/local/bin/traffic_server[0x80f3b31]
      [May 13 12:50:21.176] Manager {3086546656} ERROR: [LocalManager::pollMgmtProcessServer] Server Process terminated due to Sig 6: Aborted
      [May 13 12:50:21.176] Manager {3086546656} ERROR: (last system error 2: No such file or directory)
      [May 13 12:50:21.176] Manager {3086546656} ERROR: [Alarms::signalAlarm] Server Process was reset
      [May 13 12:50:21.176] Manager {3086546656} ERROR: (last system error 2: No such file or directory)

          Activity

          James Peach added a comment -

          This patch is really old and doesn't apply any more. Current traffic_cop logs at FATAL level and exits when the getpwXXX() lookup for the admin user fails. I believe that this addresses the original problem.

          Zhao Yongming added a comment -

          We should make sure the admin user exists; the attached patch is a good proposal.

          Dan Mercer added a comment -

          The traffic_cop process already does a number of setup and sanity checks before spawning traffic_manager, including creating the lockfiles for traffic_manager and traffic_server and chowning them to the designated admin user. This patch simply turns a nonexistent admin user into a fatal error on startup: traffic_cop exits and logs to COP_FATAL.
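The check the patch adds lives in traffic_cop's C code (a getpwXXX() lookup, per James Peach's comment). For a deployment that wants the same fail-fast behavior in its own start script, a minimal POSIX shell sketch might look like the following. `check_admin_user` is a hypothetical helper, and using `id -u NAME` as the lookup is an assumption standing in for getpwnam(3).

```shell
#!/bin/sh
# Hypothetical pre-flight check (not traffic_cop's actual code): refuse to
# start when the configured admin user cannot be resolved, instead of letting
# traffic_server abort and dump core later.
check_admin_user() {
    user="$1"
    # "id -u NAME" resolves NAME through the passwd database, much like
    # getpwnam(3), and exits non-zero when the user does not exist.
    if ! id -u "$user" >/dev/null 2>&1; then
        echo "FATAL: admin user '$user' does not exist; not starting" >&2
        return 1
    fi
    return 0
}

# Usage: bail out before spawning anything if the lookup fails.
# check_admin_user nobody || exit 1
```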

          Leif Hedstrom added a comment -

          Moving this out to v2.3.0, unless someone is willing to work on it?

          Mladen Turk added a comment -

          We should provide a way to tell both traffic_manager and traffic_cop that
          the application they manage (traffic_manager->traffic_server and
          traffic_cop->traffic_manager) is in an unrecoverable state.

          Currently the child process is automatically restarted (with or without delay)
          without considering the cause of the child process's death.
          As a result, a misconfigured child application is restarted endlessly, without
          any chance to propagate the cause to the monitoring application, which should
          exit in such cases as well.

          I presume that the restart mechanism is meant to cover transient memory errors
          and reconfiguration restarts.

          The easiest solution would be for the parent to examine the child's exit value
          and stop respawning the child when it sees a predetermined value.
          This would, however, require carefully selecting the fatal errors and process
          exit values, which are currently rather heuristic.
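The exit-value proposal can be sketched as a small supervisor loop. This is illustrative only: the reserved status 65 (`UNRECOVERABLE_EXIT`), the `RESPAWN_DELAY` setting and the `supervise` function are hypothetical, and traffic_manager and traffic_cop implement their respawn logic in C, not shell.

```shell
#!/bin/sh
# Hypothetical supervisor illustrating the proposal: respawn the child on
# transient failures, but stop when it exits with a reserved status that
# means "unrecoverable" (e.g. the admin user does not exist).
UNRECOVERABLE_EXIT=65   # arbitrary value chosen for this sketch
RESPAWN_DELAY=1         # seconds to wait between restart attempts

supervise() {
    while :; do
        # Run the child; written as if/else so it also works under "set -e".
        if "$@"; then
            status=0
        else
            status=$?
        fi

        if [ "$status" -eq 0 ]; then
            # Assumption for this sketch: a clean exit ends supervision.
            return 0
        fi
        if [ "$status" -eq "$UNRECOVERABLE_EXIT" ]; then
            echo "child reported an unrecoverable error; not respawning" >&2
            # Propagate the status so our own monitor can stop as well.
            return "$status"
        fi

        echo "child exited with status $status; respawning" >&2
        sleep "$RESPAWN_DELAY"
    done
}

# Usage:
# supervise /path/to/child --args
```

A child killed by a signal, such as the SIGABRT in the log above, shows up in the shell as status 128+N and would still be respawned here, which is why this approach depends on the child deliberately exiting with the reserved value instead of aborting.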

          Mladen Turk added a comment -

          OK. Lowering to Critical, not because there is a workaround, but because it's a known issue.

          However, the manager blindly restarting the server in an endless loop on unrecoverable errors
          will have to be fixed. I also still see no point in dumping core when the user is unknown or inaccessible.

          George Paul added a comment -

          Note: I also tested the configure options from TS-336 below, and everything worked as expected:

          ./configure --program-prefix= \
          --prefix=/usr \
          --exec-prefix=/usr \
          --bindir=/usr/bin \
          --sbindir=/usr/sbin \
          --sysconfdir=/etc \
          --datadir=/usr/share \
          --includedir=/usr/include \
          --libdir=/usr/lib \
          --libexecdir=/usr/libexec \
          --localstatedir=/var \
          --sharedstatedir=/usr/com \
          --mandir=/usr/share/man \
          --infodir=/usr/share/info \
          --with-sqlite3=no \
          --with-libdb=yes \
          --with-user=$(whoami) \
          --with-group=$(whoami) \
          --htmldir=/usr/share

          make
          make install DESTDIR=/tmp/testbuild

          env TS_ROOT=/tmp/testbuild /tmp/testbuild/usr/bin/trafficserver start

          John Plevyak added a comment -

          Let's add a note to the FAQ to turn off cores to avoid the worst of this problem.
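Such a FAQ note could rely on the fact that the core-file size limit (RLIMIT_CORE) is inherited across fork/exec, so lowering it in the shell that runs the start script covers traffic_cop, traffic_manager and traffic_server alike. A minimal sketch; the `start_without_cores` wrapper is hypothetical, and it assumes nothing in the started processes raises the limit again.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command with core dumps disabled.
# RLIMIT_CORE is inherited across fork/exec, so every process the command
# starts inherits a core limit of 0.
start_without_cores() {
    (
        # Without -H/-S, bash and dash set both the soft and hard limit,
        # so an unprivileged child cannot raise it again.
        ulimit -c 0
        exec "$@"
    )
}

# Usage (adjust the path to your install layout):
# start_without_cores /usr/local/bin/trafficserver start
```

The subshell keeps the lowered limit from leaking into the caller's interactive shell.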

          George Paul added a comment -

          The problem you described is a bug, and it also exists in 2.0.x when the user/group does not exist, which is why we provided a workaround in TS-15. Since this exists in the current 2.0.0 release and there is a workaround, IMHO this is not a blocker for 2.1.0.

          Mladen Turk added a comment -

          And yes, I had to manually edit the config and enter absolute directories,
          because it won't start outside /usr/local, which is a PITA from a developer's point of view.

          Mladen Turk added a comment -

          As I said: 'started with non root user account'.

          So

          $ ./configure --without-sqlite3 --with-libdb
          $ make
          $ make install DESTDIR=`pwd`/release1
          $ cd release1/usr/local
          $ export TS_ROOT=`pwd`
          $ cd bin
          $ ./trafficserver start

          Not the proper way, I know, but it shows that if the user cannot be found or seteuid fails, the server creates a core file of roughly 80 MB,
          and traffic_manager endlessly tries to respawn the server until either the disk or the quota is exhausted.
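The failure in the log (`Can't change group to user: nobody, gid: 99`) is what one would expect when an unprivileged process tries to switch to a different user: the lookup of `nobody` succeeded, but the group change did not. A rough pre-start check for that condition is sketched below. `can_switch_to_user` is hypothetical and ignores finer-grained privileges such as Linux capabilities. It also suggests why building with `--with-user=$(whoami)`, as in the TS-336 options tested earlier in this thread, avoids the loop for a non-root developer.

```shell
#!/bin/sh
# Hypothetical pre-start check: can this process switch to TARGET user?
# Simplification: ignores capabilities (CAP_SETUID/CAP_SETGID) and only
# distinguishes root from ordinary users.
can_switch_to_user() {
    target="$1"
    if [ "$(id -u)" -eq 0 ]; then
        # Root can switch to any user that actually exists.
        id -u "$target" >/dev/null 2>&1
        return $?
    fi
    # An ordinary user can only "switch" to itself.
    [ "$(id -un)" = "$target" ]
}

# Usage: refuse to start instead of looping on change_uid_gid() failures.
# can_switch_to_user nobody || { echo "rebuild with --with-user=$(id -un)" >&2; exit 1; }
```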

          George Paul added a comment -

          How was this ATS built, i.e., a plain './configure', or did you configure it to run as a particular valid user and valid group, i.e., './configure --with-user=<user> --with-group=<group>'?

          How did you install the build, i.e., 'sudo make install', 'make DESTDIR=<destdir> install', etc.?

          Also, how did you start the ATS stack?
          sudo /usr/local/bin/trafficserver start ?
          sudo -u nobody /usr/local/bin/trafficserver start ?
          /usr/local/bin/trafficserver start ?
          env TS_ROOT=<destdir> <destdir>/usr/local/bin/trafficserver start?
          etc.


            People

            • Assignee:
              James Peach
              Reporter:
              Mladen Turk
            • Votes:
              0
              Watchers:
              2
