Log4cxx
  1. Log4cxx
  2. LOGCXX-246

Config refresh hangs a client application that uses TelnetAppender

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.10.0
    • Fix Version/s: 0.10.0
    • Component/s: Appender
    • Labels:
      None
    • Environment:
      Windows Server 2003

      Description

      Some time ago, we spent some time migrating to the new 0.10 pre-release code-base. Upon checkout and build of Revision 582102, we found that there is a problem with the TelnetAppender.

      It appears that if you have a DOMConfigurator::configureAndWatch setup and your config file has a TelnetAppender, the initial configuration runs fine. However, if you change the file, then the platform tries to reload the config file and the process stops responding before it can reinit the TelnetAppender. It just hangs.

      However, we have been able to make it work in, what we think is, a satisfactory manner.

      The relevant patch is attached and has been tested to run fine on Windows 2003 platform. The patch is simple and basically performs what
      has been described below:

      1. It closes all the sockets, first.
      2. The closing of a socket interrupt the blocking accept with an
      exception.
      3. When an exception is caught in "acceptConnections", pThis->closed
      is checked and if set to true, the thread loop is exited.
      4. The main thread, in the meanwhile, is waiting for this thread to join.
      5. Once the join happens, we are done.

      I would again like to stress again that the previous code which calls
      "Thread.stop()" ultimately stops the thread that makes that call, not
      the thread that was spawned (at least on Win32). This has very critical
      results for an operation as simple as a config reload. For this reason,
      I believe this bug should be a blocker for the 0.10 release.

      PS: As far as I can see, there could be a similar problem in
      "FileWatchdog::~FileWatchdog()" which calls "thread.stop" to stop the
      thread that it has spawned. The right thing might be to set the
      "interrupted" flag to 1 and wait for the thread to join again. However,
      since the destruction of FileWatchDog happens at application shutdown
      for us, it has not been a critical issue. For other applications, it
      might be critical.

      1. telnetappender.patch
        0.8 kB
        Ufuk Kayserilioglu
      2. telnet.patch
        0.4 kB
        Ufuk Kayserilioglu

        Activity

        Hide
        Ufuk Kayserilioglu added a comment -

        Patch against revision 582102.

        Show
        Ufuk Kayserilioglu added a comment - Patch against revision 582102.
        Hide
        Ufuk Kayserilioglu added a comment -

        Let me try to answer some of the question that you raised in the User mailing list:

        Q: How similar are the log4j and log4cxx implementations?
        A: They are very similar at the moment. Actually for the bit of code that is causing the problem they are exactly the same.

        Q: If they are similar is it a flaw in both implementations, or is it a flaw in the emulation of java.lang.Thread?
        A: It is a flaw in the emulation of java.lang.Thread.

        Rather, the emulation is OK, but the key point is that there is no way to terminate/stop a Win32 thread apart from calling TerminateThread which is strongly advised against by MSDN documentation. The only safe way to stop a thread is to signal it to stop itself. The Thread class of log4cxx implements Thread::stop by calling "apr_thread_exit((apr_thread_t*) thread, 0)" with the expectation that it will stop the thread that was launched. However that APR function just calls "_endthreadex(0)" which amounts to exiting the thread that the "stop" call is being made on.

        Q: How to write a test case that would identify the issue?
        A: The test case is very simple. Just write a Win32 console program that logs something indefinitely and configure it using "configureandwatch" to append to a TelnetAppender and a ConsoleAppender. While it is doing what it is doing, just change the TelnetAppender port number or something in the config file and let it reload. If you do this on a Win32 platform you will see that you apllication will hang (provided you called "configureandwatch" from the main() function).

        I hope these help.

        Show
        Ufuk Kayserilioglu added a comment - Let me try to answer some of the question that you raised in the User mailing list: Q: How similar are the log4j and log4cxx implementations? A: They are very similar at the moment. Actually for the bit of code that is causing the problem they are exactly the same. Q: If they are similar is it a flaw in both implementations, or is it a flaw in the emulation of java.lang.Thread? A: It is a flaw in the emulation of java.lang.Thread. Rather, the emulation is OK, but the key point is that there is no way to terminate/stop a Win32 thread apart from calling TerminateThread which is strongly advised against by MSDN documentation. The only safe way to stop a thread is to signal it to stop itself. The Thread class of log4cxx implements Thread::stop by calling "apr_thread_exit((apr_thread_t*) thread, 0)" with the expectation that it will stop the thread that was launched. However that APR function just calls "_endthreadex(0)" which amounts to exiting the thread that the "stop" call is being made on. Q: How to write a test case that would identify the issue? A: The test case is very simple. Just write a Win32 console program that logs something indefinitely and configure it using "configureandwatch" to append to a TelnetAppender and a ConsoleAppender. While it is doing what it is doing, just change the TelnetAppender port number or something in the config file and let it reload. If you do this on a Win32 platform you will see that you apllication will hang (provided you called "configureandwatch" from the main() function). I hope these help.
        Hide
        Ufuk Kayserilioglu added a comment -

        Protection around sh.join()

        Show
        Ufuk Kayserilioglu added a comment - Protection around sh.join()
        Hide
        Ufuk Kayserilioglu added a comment -

        I have realized that, "Thread::join()" raises a ThreadException if "apr_thread_join" does not return success. After this fix for TelnetAppender, the "sh.join()" call started raising this exception when called during program shutdown. "apr_thread_join" seems to return 70008 in this situation. This prevents the client program from shutting down cleanly.

        For this reason I have added a try/catch around the "sh.join()" call.

        Show
        Ufuk Kayserilioglu added a comment - I have realized that, "Thread::join()" raises a ThreadException if "apr_thread_join" does not return success. After this fix for TelnetAppender, the "sh.join()" call started raising this exception when called during program shutdown. "apr_thread_join" seems to return 70008 in this situation. This prevents the client program from shutting down cleanly. For this reason I have added a try/catch around the "sh.join()" call.

          People

          • Assignee:
            Curt Arnold
            Reporter:
            Ufuk Kayserilioglu
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated:
              Original Estimate - 10m
              10m
              Remaining:
              Remaining Estimate - 10m
              10m
              Logged:
              Time Spent - Not Specified
              Not Specified

                Development