
Key: STDCXX-683
Type: New Feature
Status: Open
Priority: Critical
Assignee: Martin Sebor
Reporter: Martin Sebor
Votes: 0
Watchers: 0
C++ Standard Library

implement notion of expected failures in the test suite

Created: 18/Dec/07 11:58 PM   Updated: 12/Sep/08 12:33 AM
Component/s: Test Driver, Test Harness
Affects Version/s: 4.2.0
Fix Version/s: 4.2.2

Time Tracking:
Original Estimate: 24h
Remaining Estimate: 20h
Time Spent: 50h

File Attachments:
codes.html (HTML, 9 kB), 2007-12-20 01:47 AM, Martin Sebor. Licensed for inclusion in ASF works.
xcodes.html (HTML, 10 kB), 2007-12-20 06:07 PM, Martin Sebor. Licensed for inclusion in ASF works.
Issue Links:
dependent
 

Severity: Usability


Description
Tests (or examples) that fail for known reasons that we haven't been able to deal with should be distinguished from failures that haven't been analyzed yet. For example, an example program that fails to compile on an older target platform because of a compiler bug for which we can't find a simple or elegant workaround should be flagged as such in the test results. Similarly, a test that fails one or more assertions due to compiler or libc bugs on a specific platform (or a set of platforms) that we are unable to work around should be reported as such.

This is important in order to reduce the currently fairly large number of unexpected failures and to be able to make changes without having to worry about regressions as much.



Martin Sebor added a comment - 20/Dec/07 01:13 AM
Attached the current set of status codes used to report test results. Implementing the "expected failures" enhancement will mean extending the set of codes to distinguish ordinary (unexpected) failures from the expected ones.

Martin Sebor added a comment - 20/Dec/07 01:47 AM
Attached a proposed set of extended status codes to distinguish expected failures from ordinary (unexpected) ones.

Martin Sebor added a comment - 20/Dec/07 01:48 AM
This time really attached the proposed set of extended status codes to distinguish expected failures from ordinary (unexpected) ones.

Martin Sebor added a comment - 20/Dec/07 02:15 AM
Travis and I discussed this enhancement a few weeks back and here are the main goals we came up with:

1. It must be possible to mark up any failure in any component of the build and test process (i.e., any example, locale, or test) as an expected failure. This includes failure to compile, failure to link, exiting with a signal, exiting with a non-zero status, failure to produce the expected output (i.e., the DIFF status for examples), and individual failed assertions (in tests).

2. For each such markup/failure, it must be possible to differentiate between the same set of platforms and configurations of the library that are distinguished by the current build and test reporting system. That is, a mechanism must be provided to make it possible to specify that a particular failure (for example, a failure to compile) is expected to occur only on a specific platform (e.g., with IBM XLC++ 7.0.0.9 on AIX 5.3) or a set of platforms (e.g., with all versions of gcc prior to 3.2.4, or with any compiler on Solaris), and only in a particular configuration of the library (such as 12D, or optimized, shared, thread-safe, wide or 64-bit) or a set of such configurations (e.g., all 64-bit ones).

3. For tests, it must be possible to mark up individual failed assertions (and any other types of diagnostics) as expected using their sequential numbers (assigned to them by the test driver) at runtime.
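Goal 3 can be illustrated with a minimal sketch. It assumes the test driver can report the sequential numbers of the assertions that failed and the total number executed. The function name, argument names, and result keys are hypothetical and are not part of the existing driver.

```python
def triage_assertions(failed, expected_failures, total):
    """Split assertion results by whether they agree with the markup.

    failed            -- set of sequential assertion numbers that failed
    expected_failures -- set of numbers marked up as expected to fail
    total             -- number of assertions the test executed
    (All names here are illustrative, not the driver's real interface.)
    """
    executed = set(range(1, total + 1))
    return {
        "expected_failures": sorted(failed & expected_failures),
        "unexpected_failures": sorted(failed - expected_failures),
        # Marked up as expected to fail, executed, yet passed.
        "unexpected_passes": sorted((expected_failures & executed) - failed),
    }


result = triage_assertions(failed={3, 7}, expected_failures={3, 5}, total=10)
```

Here assertion 3 is an expected failure, assertion 7 is an ordinary (unexpected) failure, and assertion 5 passed even though it was marked up as expected to fail.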


Martin Sebor added a comment - 20/Dec/07 06:04 PM - edited
4. In addition to the requirements listed above, the system itself (as opposed to the user) must indicate when a successful outcome is not expected. I.e., when a component such as a test is expected to fail but succeeds it must be highlighted as such so as to distinguish it from an ordinary success and make it easy to remove the "expected failure" markup.

Martin Sebor added a comment - 20/Dec/07 06:07 PM
Attached an updated set of extended status codes to distinguish not only expected failures from ordinary (unexpected) ones, but also unexpected successes from ordinary (expected) ones.
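The four outcomes this implies can be sketched as follows. The descriptive labels are placeholders chosen for illustration only. They are not the status codes from the attached codes.html or xcodes.html files, and the function name is hypothetical.

```python
def classify(expected_to_fail, failed):
    """Combine the markup with the observed result into one of four outcomes."""
    if failed:
        return "expected failure" if expected_to_fail else "unexpected failure"
    # A success for a component marked up as failing should stand out so
    # that the stale markup can be removed.
    return "unexpected success" if expected_to_fail else "expected success"


outcomes = [classify(e, f) for e in (False, True) for f in (False, True)]
```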

Martin Sebor added a comment - 20/Dec/07 10:55 PM
Requirements on Markups

1. Location.

The markups must be in the form of human-readable and editable text stored in a location that can be easily and intuitively associated with each component (currently, example, locale, or test). One obvious location is the source code for each component itself. There, the markups could take the form of comments that could be easily found by the test harness. Another convenient location is a separate file with the same base name as the component but a different suffix. Yet another possibility is storing all markups in a single text file, with the name of the component as the key. The implementation should make it easy to switch from one location to another if that turns out to be convenient.
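As a rough sketch of the single-file option, the snippet below parses markups keyed by component name. The three-column layout (component, constraint, expected status), the `#` comment syntax, and the sample entries are all assumptions made for illustration. None of them is a format defined in this issue.

```python
# Hypothetical markup text: component, platform constraint, expected status.
SAMPLE = """\
# component   constraint        expected-status
0.cmdopts     *-ibm-aix5.*      FMAT
0.cmdopts     *-sun-solaris*    FMAT
0.valcmp      *                 FMAT
"""


def parse_markups(text):
    """Return {component: [(constraint, status), ...]} from markup text."""
    markups = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        component, constraint, status = line.split(None, 2)
        markups.setdefault(component, []).append((constraint, status))
    return markups


markups = parse_markups(SAMPLE)
```

Keeping the parser separate from where the text comes from (a comment in the source, a sibling file, or one shared file) is one way to make the location easy to change later.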

2. Format.

The format of the markups must be easy to read and write and make it possible to easily express precise constraints involving the operating system and its version, the compiler and its version, the library configuration, and the expected status. It must be possible to set more than one constraint for each component, and it must be possible for a single constraint to refer to more than one platform or configuration.

One possible format is to use relational operators and boolean logic. For example, to refer to XLC++ 7.0.0.9 on AIX 5.3 and prior, the expression might look something like this: os==AIX && (os_major<5 || os_major==5 && os_minor<=3) && compiler==XLC && compiler_major==7 && compiler_minor==0 && compiler_micro==0 && compiler_patch==9.
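To make the boolean format concrete, here is a minimal sketch of the same constraint written as a predicate over a platform description. The `Platform` class and its field names are hypothetical. Version numbers are stored as tuples, so `os_version <= (5, 3)` compares lexicographically and is equivalent to `os_major<5 || os_major==5 && os_minor<=3`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Platform:
    """Illustrative platform description; not the harness's real data model."""
    os: str
    os_version: tuple        # (major, minor)
    compiler: str
    compiler_version: tuple  # (major, minor, micro, patch)


def xlc_7_0_0_9_on_aix_5_3_or_earlier(p):
    """XLC++ 7.0.0.9 on AIX 5.3 and prior."""
    return (p.os == "AIX"
            and p.os_version <= (5, 3)
            and p.compiler == "XLC"
            and p.compiler_version == (7, 0, 0, 9))


aix53 = Platform("AIX", (5, 3), "XLC", (7, 0, 0, 9))
matches = xlc_7_0_0_9_on_aix_5_3_or_earlier(aix53)
```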

Another possible format is to adopt a convention similar to the GNU cpu-vendor-os triple produced by config.guess (an example of such a triple is i386-redhat-linux or sparc-sun-solaris2.9). In our case, the GNU convention would need to be modified and extended to include the compiler and the library configuration and might look something like this: cpu-vendor-os-compiler-configuration. We could then use shell globbing to implement matching. For example, the following two patterns would have to be used at the same time in order to refer to a 15D configuration of the library built with XLC++ 7.0.0.9 on AIX 5.3 and prior:

*-ibm-aix5.[0-3]-xlc7.0.0.9-15D
*-ibm-aix[1-4].*-xlc7.0.0.9-15D

The leading asterisk indicates no preference for the CPU component.
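A minimal sketch of the globbing approach follows, using Python's `fnmatch.fnmatchcase` as a stand-in for shell globbing. It assumes the two patterns read `*-ibm-aix5.[0-3]-xlc7.0.0.9-15D` and `*-ibm-aix[1-4].*-xlc7.0.0.9-15D`. The `powerpc` CPU component in the sample strings and the function name are made up for illustration.

```python
from fnmatch import fnmatchcase

# A failure is expected if the platform string matches either pattern.
PATTERNS = [
    "*-ibm-aix5.[0-3]-xlc7.0.0.9-15D",  # AIX 5.0 through 5.3
    "*-ibm-aix[1-4].*-xlc7.0.0.9-15D",  # any AIX 1.x through 4.x
]


def is_expected_on(platform):
    """True if the cpu-vendor-os-compiler-configuration string matches."""
    return any(fnmatchcase(platform, pattern) for pattern in PATTERNS)


hit = is_expected_on("powerpc-ibm-aix5.3-xlc7.0.0.9-15D")
miss = is_expected_on("powerpc-ibm-aix5.4-xlc7.0.0.9-15D")
```

Two patterns are needed because a single glob cannot express "5.3 and prior" as one numeric range. The boolean format handles that case in one expression, at the cost of more verbose markup.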


Andrew Black added a comment - 20/Dec/07 11:14 PM
Looking at the revised xcodes.html file, I think it will be necessary to add an 'XNOUT' (expected NOUT) and 'XFMAT' (expected FMAT) assertion code. This is because we currently have tests which produce NOUT and FMAT status messages, and these messages are expected. Examples include NOUT messages from many regression tests (like 18.limits.stdcxx-436) and FORMAT (FMAT) messages from some driver tests (0.cmdopts, 0.strncmp, and 0.valcmp).

Martin Sebor added a comment - 21/Dec/07 04:24 AM - edited
I left these out on purpose, but on second thought I agree that XFMAT should be added. I don't think XNOUT makes sense because NOUT is an expected state for regression tests and would be unexpected for any other kind.

Andrew Black added a comment - 21/Dec/07 03:38 PM
I think that a distinction between expected and unexpected NOUT states should be highlighted. This would permit us to easily spot non-regression tests which failed to produce any output. Without an XNOUT state, it would be necessary to check if each test is a regression test before looking past an NOUT row. In some cases, the regression notation (.stdcxx-NNN) may be trimmed, due to the length of the test name, making it harder to determine if a test is a regression test. Perhaps it would make more sense to use XNOUT for unexpected NOUT results and give it a yellow or red background.

An unrelated thought I'm having is that it might make sense to check how accessible the color scheme is to someone who is red/green color blind. One tool I tracked down that can be used is http://colorfilter.wickline.org/ . The observation I'm having is that the dark green background used for the OK/XPASS states appears the same as the background used for the [SIG]<name> state.


Martin Sebor added a comment - 21/Jan/08 05:39 PM
Set the number of remaining hours to 24.

Martin Sebor added a comment - 19/Feb/08 09:44 PM
This Improvement affects the Test Driver as well as the Test Harness, but not individual tests. Set Component accordingly.

Martin Sebor added a comment - 06/Apr/08 10:23 PM
Guessing at the remaining work: implement expected failures for test assertions.

Martin Sebor added a comment - 06/Apr/08 10:24 PM
We'll need this completed well before 4.3.

Martin Sebor added a comment - 06/Apr/08 10:31 PM
See also STDCXX-702, where things were refactored between the xbuildgen and xcomp.awk scripts.

Martin Sebor added a comment - 06/Apr/08 11:26 PM
Since this is needed before 4.3, rescheduled it for the tentative 4.2.2 patch release rather than 4.3. If we decide not to do 4.2.2 we'll change it back.

Farid Zaripov added a comment - 17/Apr/08 10:30 AM
r606100 merged into the 4.2.x branch thus: http://svn.apache.org/viewvc?view=rev&revision=648752

The other changes in bin directory and etc/config/xfail.txt are still not merged.