Details
- Bug
- Status: Closed
- Blocker
- Resolution: Duplicate
- 2.22.0
- None
- None
Maven 3.5.4
Surefire 2.22.0
Pax Exam 4.12.0
Description
In a setup with maven-surefire-plugin 2.22.0 and Pax Exam 4.12.0, I occasionally see the Maven build take about 30 seconds longer than expected. The extra 30 seconds come from a timeout that is used to eventually terminate a fork if it does not terminate by itself.
I have traced this failure to terminate to the thread inside the fork that receives commands from the Maven plugin: it terminates (silently) with an IOException. The exception is triggered by the validation logic in MasterProcessCommand, which checks that commands without a payload (NOOP, BYE_ACK) are not actually followed by any data. If some other code reads data from the same `System.in` InputStream, the MasterProcessCommand.decode method may read data belonging to subsequent commands.
Relying on the assumption that the Surefire logic in the forked process is the only reader of the shared resource `System.in` makes it vulnerable to this corruption. I see that the catch clause for IOException in CommandReader also reports this, though I did not see the '[SUREFIRE] std/in stream corrupted' error in my runs.
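To make the failure mode concrete, here is a minimal sketch of how a foreign read on a shared stream can shift a length-prefixed decoder out of alignment. The framing (4-byte opcode, 4-byte payload length, payload), the opcode values, and all names such as `SharedStdinSketch` and `decodeAfterForeignRead` are assumptions made for illustration; this is not the actual Surefire wire format or its `MasterProcessCommand` code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class SharedStdinSketch {
    // Hypothetical opcodes, chosen for this sketch only.
    static final int NOOP = 0;
    static final int RUN_CLASS = 1;

    /** Writes one frame: opcode, payload length, payload bytes. */
    static void writeFrame(DataOutputStream out, int opcode, String payload) throws IOException {
        byte[] data = payload.getBytes(StandardCharsets.US_ASCII);
        out.writeInt(opcode);
        out.writeInt(data.length);
        out.write(data);
    }

    /** Reads one frame and rejects a NOOP that claims to carry a payload. */
    static String decode(DataInputStream in) throws IOException {
        int opcode = in.readInt();
        int length = in.readInt();
        if (opcode == NOOP && length != 0) {
            throw new IOException("NOOP with unexpected payload length " + length);
        }
        byte[] data = new byte[length];
        in.readFully(data);
        return opcode + ":" + new String(data, StandardCharsets.US_ASCII);
    }

    /** Decodes the first frame after some other code consumed stolenBytes from the same stream. */
    static String decodeAfterForeignRead(int stolenBytes) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            writeFrame(out, NOOP, "");
            writeFrame(out, RUN_CLASS, "abc");
            InputStream shared = new ByteArrayInputStream(buffer.toByteArray());
            // The foreign reader: takes bytes that belonged to the command protocol.
            for (int i = 0; i < stolenBytes; i++) {
                shared.read();
            }
            return decode(new DataInputStream(shared));
        } catch (IOException e) {
            return "IOException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(decodeAfterForeignRead(0));
        System.out.println(decodeAfterForeignRead(4));
    }
}
```

With no foreign read the NOOP frame decodes cleanly. Once four bytes are taken by someone else, the decoder interprets the NOOP's length field as an opcode and the next command's opcode as a length, so the payload check fails even though the sender wrote valid frames.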
Some of the solutions I thought of:
- Replacing the communication protocol (also mentioned here)
- Have the CommandRunnable hold an exclusive lock on `System.in` for the entire run() (wrapping the whole body in synchronized(System.in) {}), so that no other thread can read from the InputStream. This works because the individual read methods of the BufferedInputStream used for System.in are synchronized as well. It is a risky move though, as I guess there is no guarantee that System.in is always implemented by a BufferedInputStream.
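The locking idea from the second bullet can be sketched as follows. Because nothing guarantees how the real `System.in` is implemented, the sketch uses a hypothetical stand-in stream (`SynchronizedStream`) whose `read()` is explicitly synchronized on the stream itself; the class names and the `run` helper are assumptions for illustration, not Surefire code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.concurrent.CountDownLatch;

class ExclusiveStdinSketch {
    /** Hypothetical stand-in for System.in whose read() locks on the stream itself. */
    static final class SynchronizedStream extends InputStream {
        private final InputStream delegate;

        SynchronizedStream(InputStream delegate) {
            this.delegate = delegate;
        }

        @Override
        public synchronized int read() throws IOException {
            return delegate.read();
        }
    }

    /** Returns "<bytes read by the owner>/<first value read by the intruder>". */
    static String run() {
        InputStream in = new SynchronizedStream(new ByteArrayInputStream(new byte[] {1, 2, 3, 4}));
        CountDownLatch ownerHoldsLock = new CountDownLatch(1);
        int[] intruderSaw = new int[1];

        Thread intruder = new Thread(() -> {
            try {
                ownerHoldsLock.await();
                // Blocks until the owner leaves its synchronized block.
                intruderSaw[0] = in.read();
            } catch (IOException | InterruptedException e) {
                intruderSaw[0] = -2; // marker for an unexpected failure
            }
        });
        intruder.start();

        StringBuilder ownerSaw = new StringBuilder();
        // The owner holds the stream's monitor for its whole read loop.
        synchronized (in) {
            ownerHoldsLock.countDown();
            try {
                int b;
                while ((b = in.read()) != -1) {
                    ownerSaw.append(b);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        try {
            intruder.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return ownerSaw + "/" + intruderSaw[0];
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The owner sees every byte and the intruder only gets end-of-stream once the lock is released. The protection depends entirely on the stream's read methods locking on the same monitor, which is exactly the guarantee the bullet above notes is missing for `System.in`.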
I have prepared a project that demonstrates this behaviour: https://github.com/glimmerveen/fork-test. The timeout is visible when the test results have already been reported on Maven's stdout but the build does not continue for another 30 seconds or so. Note that the corruption does not always happen.
Note that I reported this as blocking not necessarily because of the additional time it takes for the build to execute, but due to the fact that when the fork is terminated by the timeout, frameworks like JaCoCo don't get the opportunity to output their results.
Issue Links
- duplicates SUREFIRE-1658: TCP/IP Channel for forked Surefire JVM. Extensions API and SPI. Polymorphism for remote and local process communication. (Closed)