History | Log In     View a printable version of the current page.  
Issue Details (XML | Word | Printable)

Key: STDCXX-536
Type: Improvement Improvement
Status: Resolved Resolved
Resolution: Fixed
Priority: Major Major
Assignee: Travis Vitek
Reporter: Martin Sebor
Votes: 0
Watchers: 0
Operations

If you were logged in you would be able to see more operations.
C++ Standard Library

allow thread safety tests to time out without failing

Created: 24/Aug/07 06:44 PM   Updated: 29/May/08 04:37 PM
Component/s: Tests
Affects Version/s: 4.2.0
Fix Version/s: 4.2.2

Time Tracking:
Original Estimate: 3h
Original Estimate - 3h
Remaining Estimate: 0h
Time Spent - 5h
Time Spent: 5h
Time Spent - 5h

File Attachments:
  Size
Text File Licensed for inclusion in ASF works stdcxx-536.patch 2008-05-07 01:45 PM Travis Vitek 49 kb

Severity: Usability


 Description  « Hide
The newly added thread safety tests (and possibly some of the existing ones) tend to run for a long time, consuming a lot of CPU cycles, and sometimes even failing due to a timeout (currently 300 seconds in nightly builds). It would be useful to provide a mechanism such as a command line option whereby the tests' runtime could be limited without necessarily causing them to fail when the amount of time is exceeded. One way to do it would be for each test to set an alarm in response to this command line option and in handler for the alarm set a flag that each thread would check at each iteration of its loop to see if it should break.

 All   Comments   Work Log   Change History   Subversion Commits   FishEye      Sort Order: Ascending order - Click to sort in descending order
Martin Sebor - 07/Sep/07 07:52 AM
Travis, can you take a look at this to see if we could implement it for 4.2 and avoid the test timeouts?

Martin Sebor - 07/Sep/07 07:53 AM
Tentatively scheduled for 4.2.

Travis Vitek - 10/Sep/07 05:06 PM - edited
2007-09-10  Travis Vitek  <vitek@roguewave.com>

	STDCXX-536
	rw_timeout.h: New file declares timeout functions used by
	tests.
	opt_timeout.h: New file declares timeout functions used by
	the test suite driver.
	opt_timeout.cpp: New file defines timeout functions used by
	the test suite driver.
	driver.cpp (rw_vtest): setup new command line options,
	enable and disable alarm for timeout.
	22.locale.messages.mt.cpp (thread_func): use timeout to
	let test threads exit early.

Travis Vitek - 10/Sep/07 05:32 PM

I'm not convinced that this is actually a smart thing to do. Tests that would noramlly time out will now exit successfully. This is probably okay if there is only one part to the entire test, but that is not the case with any of the multithreaded tests that I've seen. If a test has three parts, and the first part times out, parts two and three won't execute, but it will appear that the test ran just fine.

That, and the new code doesn't provide anything for a test that doesn't poll the rw_timeout_expired() function. So this code is only really useful for the multithreaded tests. If that is the case then maybe the functionality thould be isolated to just those tests.


Travis Vitek - 10/Sep/07 05:36 PM
fixed patch to not use rw_note when timer expires because that function is not mt-safe.


Travis Vitek - 14/Sep/07 09:56 AM
Rescheduled for 4.2.1

Martin Sebor - 31/Oct/07 05:09 PM
Now that 4.2.0 is released, set Affects Version(s) accordingly.

Martin Sebor - 03/Jan/08 11:35 AM
Set Severity to Usability and checked Patch Available.

Martin Sebor - 03/Jan/08 11:43 AM
I'm not sure where we left this issue. The discussion kind of petered out... Do we want the patch or not?

Travis Vitek - 03/Jan/08 12:04 PM
The patch is not ready for submission. I think we need to pick up discussion on this issue again.

Martin Sebor - 21/Jan/08 11:11 AM
Unchecked Patch Available since it's not ready.

Travis Vitek - 29/Feb/08 02:25 PM
We need to make some decisions here about how we want this to work. The options that I see are
  1. If a test hits the hard timeout it fails. There is no soft timeout. This is the current behavior.
  2. The number of iterations for blocks of each test will be reduced as needed to execute before the hard timeout.
  3. Tests will be split into multiple tests, each with its own hard timeout. May still require tweaking to iteration count.
  4. Tests will be split into multiple tests, each of which can have their own soft timeout that would allow it to exit just before the hard timeout.
  5. The test itself will specify a block of code that is to be executed under the soft timeout by invoking a function at the begin and end of that block. That function will take a period and a callback function [similar to alarm()]

Any other ideas or comments?


Travis Vitek - 26/Mar/08 10:09 AM
Defer to 4.3 or later. I don't recall seeing any discussion on this issue, and that discussion is necessary before we start implementing anything.

Travis Vitek - 07/May/08 01:45 PM - edited
Proposed patch allows each call to rw_thread_pool() to specify a soft timeout in seconds. A non-zero timeout will cause a timer to be started when pool threads are spawned. When the timer expires, a flag will be set. The user thread function will need to poll this flag periodically through the function rw_thread_timeout_expired(), and if it is true the thread should return as soon as is reasonably possible.

Please review and provide feedback.

2008-05-07  Travis Vitek  <vitek@roguewave.com>

	STDCXX-536
	* tests/include/rw_thread.h: Reduce C linkage to areas that
	actually need it, add typedef for thread function type.
	(rw_thread_pool): Add timeout parameter with default value.
	(rw_thread_pool_timeout_expired): New function declaration.
	* tests/src/thread.cpp: (_rw_timeout_handler): New function
	to respond to alarm signals and set timeout flag.
	(rw_thread_pool_timeout_expired): New function to query
	timeout status.
	(rw_thread_pool): Set alarm based on timeout value.
	* tests/localization/22.locale.codecvt.mt.cpp: Add timeout
	option with default value. Pass timeout to rw_thread_pool().
	Poll rw_thread_pool_time_expired() in thread function and
	respond. Removed rw_ prefix used by test driver symbols.
	* tests/localization/22.locale.cons.mt.cpp: Ditto.
	* tests/localization/22.locale.ctype.mt.cpp: Ditto.
	* tests/localization/22.locale.globals.mt.cpp: Ditto.
	* tests/localization/22.locale.messages.mt.cpp: Ditto.
	* tests/localization/22.locale.money.get.mt.cpp: Ditto.
	* tests/localization/22.locale.money.put.mt.cpp: Ditto.
	* tests/localization/22.locale.moneypunct.mt.cpp: Ditto.
	* tests/localization/22.locale.num.get.mt.cpp: Ditto.
	* tests/localization/22.locale.num.put.mt.cpp: Ditto.
	* tests/localization/22.locale.numpunct.mt.cpp: Ditto.
	* tests/localization/22.locale.statics.mt.cpp: Ditto.
	* tests/localization/22.locale.time.get.mt.cpp: Ditto.
	* tests/localization/22.locale.time.put.mt.cpp: Ditto.

Martin Sebor - 07/May/08 09:03 PM
One request: can you separate out changes that are unrelated to the timeout enhancement to make this patch easier to review?

Other than that, a few questions/concerns:

  1. Is alarm() safe to use with multiple threads? Would having the main thread sleep until the timeout expires and then set the flag be a safer and simpler approach?
  2. Should the same timeout mechanism as in rw_thread_pool() be also used in rw_thread_create()?
  3. Is the overhead of calling a function in each iteration of the thread loop to check the timeout flag significant enough to consider exposing the flag itself? (E.g., as a const volatile sig_atomic_t &rw_thread_timeout initialized to refer to the actual, non-const flag, to prevent thread functions from modifying it while allowing the the driver to (indirectly) set it.)

Travis Vitek - 08/May/08 10:02 AM - edited
Okay, I submitted an initial patch to remove the rw_ prefix from test local option variables. See r654561. I'll submit a new patch when we make some decisions about what to do with the issues you just brought up.

As for those issues...

  1. From memory the only issue with using alarm() in a threaded program is that you have no control over which thread will get the signal. This can be a problem is some threads in your program ignore signals, and the signal is delivered to one of those threads.
  2. I have no problem with that provided we can decide how we want to solve #1 above.
  3. I considered doing something like this, but it seemed like unnecessary optimization. As for the type of the timeout flag, it should probably be changed to volatile sig_atomic_t if we are going to use alarm().

Having the main thread sleep for the timeout period is not really a good solution. If the timeout is 60 seconds, but the threads complete in 5, the main thread would waste 55 seconds waiting for the timeout to expire. It also has the problem that it won't work with single threaded builds.

I have considered several possible implementations, and I don't see many options that do everything we would like. The two big issues are single threaded builds and being able to use the timeout with threads that are not explicitly joined by the thread creation function. The second issue is not critical, but it seems a bit kludgey to not support timeouts consistently regardless of what parameters you pass to rw_thead_pool().

One option would be to pass some context data to the thread and let the thread decide if it is time to exit on its own. The context could include the 'drop dead time' for the thread and then the thread could check the current time against that. Unfortunately that moves more logic into thread_func and rw_thread_pool_timeout_expired(). Another option would be to build a timer queue and . This would require that the test library always be built with threads so that we could spawn a thread to service the timers.


Martin Sebor - 08/May/08 10:07 PM
I was hoping there would be a way to wake up the sleeping thread when the threads were done but it doesn't look like Pthreads has a feature like that (pthreads_timed_join()?) I'm just a little nervous about the interactions between signals and threads (POSIX Signal Concepts describes some of the intricacies). And then I guess the purist in me would prefer a pthreads-only solution if there is one. I wonder if we could emulate pthreads_timed_join() by wrapping the thr_proc in our own function and having it signal (via pthread_cond_signal() the sleeping thread (in a call to pthread_cond_timedwait()) just before exiting... Or is it overdesigning it? In any case, if you're happy with the alarm() approach I'm okay with it.

Travis Vitek - 09/May/08 12:41 AM
Anything using pthreads stuff will have portability issues [er, windows] that will have to be addressed. We'll also have to decide what to do for single threaded builds. If we don't link the thread library in single threaded builds then we'll have trouble using pthread_cond_timedwait().

Of the 9 pages of information about signals, what issues are you concerned about. Please be specific.


Martin Sebor - 09/May/08 09:01 AM
I don't have any specific issues in mind that should cause problems in our case. It's possible that the alarm() solution will be perfectly safe in the limited set of situations that we have to worry about, even if it isn't robust enough to be completely general. But experience has taught me that programs that mix threads and signals do tend to run into unanticipated problems, often due to portability issues, so I'm simply trying to see if there's a better, more general alternative. If not I guess we'll just have to trust we'll be fine and not push it too far in the future (e.g., when we implement the C++ Multi-threading library).

Eric Lemings - 09/May/08 09:21 AM - edited
I agree with Martin's concerns on mixing threads and signals. I would suggest using a messaging framework; it would be safer but would require even more infrastructure. As a starter for resolving this issue however, we should try Travis' approach since it most likely better than the current paradigm.

Travis Vitek - 11/May/08 02:07 PM
Then this question remains... How does such messaging framework work in a single threaded build? Does the test function need to call some messaging_framework_update() type function to make sure that any messages get processed or do we use a thread inside the test library that dispatches messages?

The simplest and probably most portable solution is to just add some context data that tells the thread how long it should run [say a start and end time]. Then the thread_func would just check to see if the current time was outside that period and bail.


Travis Vitek - 13/May/08 11:23 AM
Committed to 4.2.x in r655960.

Travis Vitek - 29/May/08 04:37 PM
Resolving. If someone really wants to discuss using another method we should open another issue.