Martin Sebor added a comment - 20/Dec/07 01:13 AM
Attached the current set of status codes used to report test results. Implementing the "expected failures" enhancement will mean extending the set of codes to distinguish ordinary (unexpected) failures from the expected ones.
Attached a proposed set of extended status codes to distinguish expected failures from ordinary (unexpected) ones.
This time really attached the proposed set of extended status codes to distinguish expected failures from ordinary (unexpected) ones.
Travis and I discussed this enhancement a few weeks back and here are the main goals we came up with:
1. It must be possible to mark up any failure in any component of the build and test process (i.e., any example, locale, or test) as an expected failure. This includes failure to compile, failure to link, exiting with a signal, exiting with a non-zero status, failure to produce the expected output (i.e., the DIFF status for examples), and individual failed assertions (in tests).
2. For each such markup/failure, it must be possible to differentiate between the same set of platforms and configurations of the library that are distinguished by the current build and test reporting system. That is, a mechanism must be provided to make it possible to specify that a particular failure (for example, a failure to compile) is expected to occur only on a specific platform (e.g., with IBM XLC++ 7.0.0.9 on AIX 5.3) or a set of platforms (e.g., with all versions of gcc prior to 3.2.4, or with any compiler on Solaris), and only in a particular configuration of the library (such as 12D, or optimized, shared, thread-safe, wide or 64-bit) or a set of such configurations (e.g., all 64-bit ones).
3. For tests, it must be possible to mark up individual failed assertions (and any other types of diagnostics) as expected using the sequential numbers assigned to them by the test driver at runtime.
4. In addition to the requirements listed above, the system itself (as opposed to the user) must indicate when a successful outcome is not expected. I.e., when a component such as a test is expected to fail but succeeds, it must be highlighted as such, both to distinguish it from an ordinary success and to make it easy to remove the "expected failure" markup.
Attached an updated set of extended status codes to distinguish not only expected failures from ordinary (unexpected) ones, but also unexpected successes from ordinary (expected) ones.
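As a rough illustration of goals 3 and 4 above, the four-way distinction between ordinary and expected failures and between ordinary and unexpected successes could be sketched as follows. The names PASS, FAIL, XFAIL, and XPASS, and both function names, are hypothetical placeholders chosen for this sketch; they are not the codes from the attached table.

```python
from enum import Enum
from typing import Dict, Set


class Outcome(Enum):
    """Hypothetical outcome labels, not the codes from the attachment."""
    PASS = "ordinary success"
    FAIL = "unexpected failure"
    XFAIL = "expected failure"
    XPASS = "unexpected success"


def classify(failed: bool, failure_expected: bool) -> Outcome:
    """Combine the observed result with the markup into one outcome."""
    if failed:
        return Outcome.XFAIL if failure_expected else Outcome.FAIL
    return Outcome.XPASS if failure_expected else Outcome.PASS


def classify_assertions(failed: Set[int], expected: Set[int],
                        total: int) -> Dict[int, Outcome]:
    """Classify assertions numbered 1..total by their sequential number.

    `failed` holds the numbers of assertions that failed at runtime and
    `expected` the numbers marked up as expected failures.
    """
    return {n: classify(n in failed, n in expected)
            for n in range(1, total + 1)}
```

With four assertions where 2 and 3 fail and 3 and 4 are marked as expected failures, assertion 2 is an ordinary failure, 3 an expected failure, and 4 an unexpected success whose markup can be removed.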
Requirements on Markups
1. Location. The markups must be in the form of human-readable and editable text stored in a location that can be easily and intuitively associated with each component (currently, example, locale, or test). One obvious location is the source code for each component itself. There, the markups could take the form of comments that could be easily found by the test harness. Another convenient location is a separate file with the same base name as the component but a different suffix. Yet another possibility is storing all markups in a single text file with the name of the component as the key. The implementation should make it easy to switch from one location to another if that turns out to be convenient.
2. Format. The format of the markups must be easy to read and write and make it possible to easily express precise constraints involving the operating system and its version, the compiler and its version, the library configuration, and the expected status. It must be possible to set more than one constraint for each component, and it must be possible for a single constraint to refer to more than one platform or configuration.
One possible format is to use relational operators and boolean logic. For example, to refer to XLC++ 7.0.0.9 on AIX 5.3 and prior, the expression might look something like this:
os==AIX && (os_major<5 || os_major==5 && os_minor<=3) && compiler==XLC && compiler_major==7 && compiler_minor==0 && compiler_micro==0 && compiler_patch==9
Another possible format is to adopt a convention similar to the GNU cpu-vendor-os triple produced by config.guess (an example of such a triple is i386-redhat-linux or sparc-sun-solaris2.9). In our case, the GNU convention would need to be modified and extended to include the compiler and the library configuration and might look something like this: cpu-vendor-os-compiler-configuration. We could then use shell globbing to implement matching.
For example, the following two patterns would have to be used at the same time in order to refer to a 15D configuration of the library built with XLC++ 7.0.0.9 on AIX 5.3 and prior:
*-ibm-aix5.[0-3]-xlc7.0.0.9-15D
*-ibm-aix[1-4].*-xlc7.0.0.9-15D
The leading asterisk indicates no preference for the CPU component.
Looking at the revised xcodes.html file, I think it will be necessary to add 'XNOUT' (expected NOUT) and 'XFMAT' (expected FMAT) assertion codes. This is because we currently have tests that produce NOUT and FMAT status messages, and these messages are expected. Examples include NOUT messages from many regression tests (like 18.limits.stdcxx-436) and FORMAT (FMAT) messages from some driver tests (0.cmdopts, 0.strncmp, and 0.valcmp).
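The shell-globbing approach to platform matching proposed above can be tried out with Python's standard fnmatch module. This is only a sketch under assumptions: the patterns are taken to carry a leading asterisk for the CPU field, as the text describes, and the function name and the platform strings in the usage note are made up for illustration.

```python
from fnmatch import fnmatchcase

# Patterns in the proposed cpu-vendor-os-compiler-configuration form.
# Together they are meant to cover a 15D build with XLC++ 7.0.0.9
# on AIX 5.3 and prior.
PATTERNS = [
    "*-ibm-aix5.[0-3]-xlc7.0.0.9-15D",   # AIX 5.0 through 5.3
    "*-ibm-aix[1-4].*-xlc7.0.0.9-15D",   # any AIX 1.x through 4.x
]


def failure_expected(platform, patterns=PATTERNS):
    """Return True if the platform string matches any markup pattern.

    fnmatchcase performs case-sensitive shell-style matching, so the
    result does not depend on the host operating system.
    """
    return any(fnmatchcase(platform, p) for p in patterns)
```

A hypothetical platform string such as powerpc-ibm-aix5.3-xlc7.0.0.9-15D or powerpc-ibm-aix4.3-xlc7.0.0.9-15D matches, while powerpc-ibm-aix5.4-xlc7.0.0.9-15D and powerpc-ibm-aix5.3-xlc7.0.0.9-12D do not. One caveat of plain globbing is that an asterisk also matches hyphens, so a pattern field is not strictly confined to one component of the string.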
I left these out on purpose, but on second thought I agree that XFMAT should be added. I don't think XNOUT makes sense because NOUT is an expected state for regression tests and would be unexpected for any other kind.
I think that a distinction between expected and unexpected NOUT states should be highlighted. This would permit us to easily spot non-regression tests which failed to produce any output. Without an XNOUT state, it would be necessary to check if each test is a regression test before looking past an NOUT row. In some cases, the regression notation (.stdcxx-NNN) may be trimmed, due to the length of the test name, making it harder to determine if a test is a regression test. Perhaps it would make more sense to use XNOUT for unexpected NOUT results and give it a yellow or red background.
An unrelated thought I'm having is that it might make sense to check how accessible the color scheme is to someone who is red/green color blind. One tool I tracked down that can be used is http://colorfilter.wickline.org/
Set the number of remaining hours to 24.
This Improvement affects the Test Driver as well as the Test Harness, but not individual tests. Set Component accordingly.
Guessing at the remaining work: implement expected failures for test assertions.
We'll need this completed well before 4.3.
See also
Since this is needed before 4.3, rescheduled it for the tentative 4.2.2 patch release rather than 4.3. If we decide not to do 4.2.2 we'll change it back.
r606100 was merged into the 4.2.x branch: http://svn.apache.org/viewvc?view=rev&revision=648752
The other changes in the bin directory and etc/config/xfail.txt have still not been merged.