Uploaded image for project: 'Maven Surefire'
  1. Maven Surefire
  2. SUREFIRE-1244

NumberFormatException in parallel test run with runOrder = failedFirst

    Details

      Description

      Occasionally in our Jenkins build, I get an error in test reporting with a backtrace like this:

      13:28:00 [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.18.1:test (default-test) on project service: ExecutionException: java.lang.NumberFormatException: For input string: "pClientWrapperTest)" -> [Help 1]
      13:28:00 org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.18.1:test (default-test) on project service: ExecutionException
      13:28:00 	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:213)
      13:28:00 	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
      13:28:00 	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
      13:28:00 	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:84)
      13:28:00 	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:59)
      13:28:00 	at org.apache.maven.lifecycle.internal.LifecycleStarter.singleThreadedBuild(LifecycleStarter.java:183)
      13:28:00 	at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:161)
      13:28:00 	at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:320)
      13:28:00 	at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:156)
      13:28:00 	at org.jvnet.hudson.maven3.launcher.Maven3Launcher.main(Maven3Launcher.java:117)
      13:28:00 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      13:28:00 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      13:28:00 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      13:28:00 	at java.lang.reflect.Method.invoke(Method.java:606)
      13:28:00 	at org.codehaus.plexus.classworlds.launcher.Launcher.launchStandard(Launcher.java:329)
      13:28:00 	at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:239)
      13:28:00 	at org.jvnet.hudson.maven3.agent.Maven3Main.launch(Maven3Main.java:181)
      13:28:00 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      13:28:00 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      13:28:00 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      13:28:00 	at java.lang.reflect.Method.invoke(Method.java:606)
      13:28:00 	at hudson.maven.Maven3Builder.call(Maven3Builder.java:136)
      13:28:00 	at hudson.maven.Maven3Builder.call(Maven3Builder.java:71)
      13:28:00 	at hudson.remoting.UserRequest.perform(UserRequest.java:121)
      13:28:00 	at hudson.remoting.UserRequest.perform(UserRequest.java:49)
      13:28:00 	at hudson.remoting.Request$2.run(Request.java:326)
      13:28:00 	at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
      13:28:00 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      13:28:00 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      13:28:00 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      13:28:00 	at java.lang.Thread.run(Thread.java:745)
      13:28:00 Caused by: org.apache.maven.plugin.MojoFailureException: ExecutionException
      13:28:00 	at org.apache.maven.plugin.surefire.SurefirePlugin.assertNoException(SurefirePlugin.java:262)
      13:28:00 	at org.apache.maven.plugin.surefire.SurefirePlugin.handleSummary(SurefirePlugin.java:252)
      13:28:00 	at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:854)
      13:28:00 	at org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:722)
      13:28:00 	at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:101)
      13:28:00 	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:209)
      13:28:00 	... 30 more
      13:28:00 Caused by: org.apache.maven.surefire.booter.SurefireBooterForkException: ExecutionException
      13:28:00 	at org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkPerTestSet(ForkStarter.java:343)
      13:28:00 	at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:178)
      13:28:00 	at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:990)
      13:28:00 	at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:824)
      13:28:00 	... 33 more
      13:28:00 Caused by: java.util.concurrent.ExecutionException: java.lang.NumberFormatException: For input string: "pClientWrapperTest)"
      13:28:00 	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
      13:28:00 	at java.util.concurrent.FutureTask.get(FutureTask.java:188)
      13:28:00 	at org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkPerTestSet(ForkStarter.java:327)
      13:28:00 	... 36 more
      13:28:00 Caused by: java.lang.NumberFormatException: For input string: "pClientWrapperTest)"
      13:28:00 	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
      13:28:00 	at java.lang.Integer.parseInt(Integer.java:492)
      13:28:00 	at java.lang.Integer.parseInt(Integer.java:527)
      13:28:00 	at org.apache.maven.plugin.surefire.runorder.RunEntryStatistics.fromString(RunEntryStatistics.java:83)
      13:28:00 	at org.apache.maven.plugin.surefire.runorder.RunEntryStatisticsMap.fromReader(RunEntryStatisticsMap.java:91)
      13:28:00 	at org.apache.maven.plugin.surefire.runorder.RunEntryStatisticsMap.fromFile(RunEntryStatisticsMap.java:66)
      13:28:00 	at org.apache.maven.plugin.surefire.runorder.StatisticsReporter.<init>(StatisticsReporter.java:40)
      13:28:00 	at org.apache.maven.plugin.surefire.StartupReportConfiguration.instantiateStatisticsReporter(StartupReportConfiguration.java:214)
      13:28:00 	at org.apache.maven.plugin.surefire.report.DefaultReporterFactory.<init>(DefaultReporterFactory.java:71)
      13:28:00 	at org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:309)
      13:28:00 	at org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:306)
      13:28:00 	... 4 more
      13:28:00 [ERROR] 
      13:28:00 [ERROR] Re-run Maven using the -X switch to enable full debug logging.
      13:28:00 [ERROR] 
      

      The details of the backtrace aren't always identical, and it's not even always a NumberFormatException, but that's the most common symptom.

      Here's my diagnosis: in the above example, it's trying to parse the string pClientWrapperTest as a number. And, in fact, we have a test called SipClientWrapperTest in the project, so it seems like the .jenkins-HASH file is either getting written to with invalid data or is getting overwritten while we are in the middle of writing.

      Here's our configuration:

            <plugin>
              <groupId>org.apache.maven.plugins</groupId>
              <artifactId>maven-surefire-plugin</artifactId>
              <version>2.18.1</version>
              <configuration>
                <argLine>-Djava.awt.headless=true -Xmx1024m -XX:MaxPermSize=256m -XX:+CMSClassUnloadingEnabled -Dsumo.detect.flakiness=false</argLine>
                <redirectTestOutputToFile>true</redirectTestOutputToFile>
                <runOrder>failedfirst</runOrder>
                <trimStackTrace>false</trimStackTrace>
                <forkCount>1C</forkCount>
                <reuseForks>false</reuseForks>
                <systemPropertyVariables>
                  <property>
                    <name>java.io.tmpdir</name>
                    <value>${project.build.directory}</value>
                  </property>
                </systemPropertyVariables>
                <includes>
                  <include>**/*Spec.class</include>
                  <include>**/*Suite.class</include>
                  <include>**/*Test.class</include>
                </includes>
              </configuration>
            </plugin>
      

      Note the forkCount property: we're running tests in parallel (assuming we're on a multicore machine, which we are), but they all write to the same .jenkins-HASH file because the hash depends on the overall job, not on the individual test classes.

      I haven't looked at the code closely enough to know if multiple tests are writing to the file at the same time, though I haven't been able to catch that in the act, so my guess is that they aren't. But, if you look at the stacktrace, you'll see that it's going through this method:

          public StatisticsReporter instantiateStatisticsReporter()
          {
              if ( requiresRunHistory )
              {
                  final File target = getStatisticsFile();
                  return new StatisticsReporter( target );
              }
              return null;
          }
      

      and specifically the problem seems to be when it instantiates the StatisticsReporter at the start of running tests for one class. Note that if you go up a couple of stack frames, you see this:

                  while ( suites.hasNext() )
                  {
                      final Object testSet = suites.next();
                      Callable<RunResult> pf = new Callable<RunResult>()
                      {
                          public RunResult call()
                              throws Exception
                          {
                              DefaultReporterFactory forkedReporterFactory =
                                  new DefaultReporterFactory( startupReportConfiguration );
                              defaultReporterFactoryList.add( forkedReporterFactory );
                              ForkClient forkClient =
                                              new ForkClient( forkedReporterFactory,
                                                              startupReportConfiguration.getTestVmSystemProperties() );
                              return fork( testSet, new PropertiesWrapper( providerConfiguration.getProviderProperties() ),
                                           forkClient, effectiveSystemProperties, null );
                          }
                      };
                      results.add( executorService.submit( pf ) );
      
                  }
      

      where line 309 is the DefaultReporterFactory instantiation - so we're creating a bunch of Callables that are each instantiating a DefaultReporterFactory, and they're each trying to read the .jenkins-HASH file; given that, I suspect that one of them is trying to read the file while another one is trying to write it, and so it gets a parse error in the middle of reading.

      It seems like the fix in my situation is to remove the runOrder value from my configuration - the code only reads the statistics on startup when runOrder is set to failedfirst or balanced, so if I remove that line (setting it to filesystem) then we won't instantiate a StatisticsReporter at startup. I'm not too familiar with that configuration, but it's not at all clear to me that the failedfirst / balanced configurations work at all in parallel scenarios? (We set the runOrder years ago, while we only turned on the parallel test execution relatively recently.)

      I'm filing this as Minor for now because I suspect I have a workaround, but hopefully this bug will be useful if other people run into the same issue.

        Activity

        Hide
        tibor17 Tibor Digana added a comment -

        David Carlton
        Pls check it out with the latest version 2.19.1.

        Show
        tibor17 Tibor Digana added a comment - David Carlton Pls check it out with the latest version 2.19.1.
        Hide
        davidcarltonsumo David Carlton added a comment -

        I am not 100% sure if I've seen it with 2.19.1 - we switched to 2.19.1 a week or so before I filed this bug report, and I didn't notice any failures with this cause after the 2.19.1 switch. But I don't look at every failure in the build system, so I can't guarantee it didn't occur with 2.19.1. (And it's also something that only occurs once or twice a week across our build system, so even if there weren't any failures in that period, one week isn't enough observation time.)

        I haven't seen any occurrences since I removed runOrder from the configuration file, so I'm optimistic that removing that does fix the problem. But it's certainly possible that switching to 2.19.1 had an effect as well.

        Show
        davidcarltonsumo David Carlton added a comment - I am not 100% sure if I've seen it with 2.19.1 - we switched to 2.19.1 a week or so before I filed this bug report, and I didn't notice any failures with this cause after the 2.19.1 switch. But I don't look at every failure in the build system, so I can't guarantee it didn't occur with 2.19.1. (And it's also something that only occurs once or twice a week across our build system, so even if there weren't any failures in that period, one week isn't enough observation time.) I haven't seen any occurrences since I removed runOrder from the configuration file, so I'm optimistic that removing that does fix the problem. But it's certainly possible that switching to 2.19.1 had an effect as well.
        Hide
        tibor17 Tibor Digana added a comment -

        David Carlton
        I will check the parts of code as you mentioned. Version 2.19.x changed a lot so the behavior may differ from 2.18.1.

        Show
        tibor17 Tibor Digana added a comment - David Carlton I will check the parts of code as you mentioned. Version 2.19.x changed a lot so the behavior may differ from 2.18.1.
        Hide
        tibor17 Tibor Digana added a comment -

        commit 175d1d44b629d362e40b650fe8c0a30cb3623774

        Show
        tibor17 Tibor Digana added a comment - commit 175d1d44b629d362e40b650fe8c0a30cb3623774
        Hide
        davidcarltonsumo David Carlton added a comment -

        Thanks!

        Show
        davidcarltonsumo David Carlton added a comment - Thanks!
        Hide
        hudson Hudson added a comment -

        SUCCESS: Integrated in Jenkins build maven-surefire #1620 (See https://builds.apache.org/job/maven-surefire/1620/)
        SUREFIRE-1244 NumberFormatException in parallel test run with runOrder (tibor17: rev 175d1d44b629d362e40b650fe8c0a30cb3623774)

        • (edit) maven-surefire-common/src/main/java/org/apache/maven/plugin/surefire/report/DefaultReporterFactory.java
        • (edit) maven-surefire-common/src/main/java/org/apache/maven/plugin/surefire/StartupReportConfiguration.java
        • (edit) maven-surefire-common/src/main/java/org/apache/maven/plugin/surefire/runorder/StatisticsReporter.java
        Show
        hudson Hudson added a comment - SUCCESS: Integrated in Jenkins build maven-surefire #1620 (See https://builds.apache.org/job/maven-surefire/1620/ ) SUREFIRE-1244 NumberFormatException in parallel test run with runOrder (tibor17: rev 175d1d44b629d362e40b650fe8c0a30cb3623774) (edit) maven-surefire-common/src/main/java/org/apache/maven/plugin/surefire/report/DefaultReporterFactory.java (edit) maven-surefire-common/src/main/java/org/apache/maven/plugin/surefire/StartupReportConfiguration.java (edit) maven-surefire-common/src/main/java/org/apache/maven/plugin/surefire/runorder/StatisticsReporter.java

          People

          • Assignee:
            tibor17 Tibor Digana
            Reporter:
            davidcarltonsumo David Carlton
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development