Uploaded image for project: 'Maven Surefire'
  1. Maven Surefire
  2. SUREFIRE-1244

NumberFormatException in parallel test run with runOrder = failedFirst

    XMLWordPrintableJSON

Details

    Description

      Occasionally in our Jenkins build, I get an error in test reporting with a backtrace like this:

      13:28:00 [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.18.1:test (default-test) on project service: ExecutionException: java.lang.NumberFormatException: For input string: "pClientWrapperTest)" -> [Help 1]
      13:28:00 org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.18.1:test (default-test) on project service: ExecutionException
      13:28:00 	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:213)
      13:28:00 	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
      13:28:00 	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
      13:28:00 	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:84)
      13:28:00 	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:59)
      13:28:00 	at org.apache.maven.lifecycle.internal.LifecycleStarter.singleThreadedBuild(LifecycleStarter.java:183)
      13:28:00 	at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:161)
      13:28:00 	at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:320)
      13:28:00 	at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:156)
      13:28:00 	at org.jvnet.hudson.maven3.launcher.Maven3Launcher.main(Maven3Launcher.java:117)
      13:28:00 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      13:28:00 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      13:28:00 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      13:28:00 	at java.lang.reflect.Method.invoke(Method.java:606)
      13:28:00 	at org.codehaus.plexus.classworlds.launcher.Launcher.launchStandard(Launcher.java:329)
      13:28:00 	at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:239)
      13:28:00 	at org.jvnet.hudson.maven3.agent.Maven3Main.launch(Maven3Main.java:181)
      13:28:00 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      13:28:00 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      13:28:00 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      13:28:00 	at java.lang.reflect.Method.invoke(Method.java:606)
      13:28:00 	at hudson.maven.Maven3Builder.call(Maven3Builder.java:136)
      13:28:00 	at hudson.maven.Maven3Builder.call(Maven3Builder.java:71)
      13:28:00 	at hudson.remoting.UserRequest.perform(UserRequest.java:121)
      13:28:00 	at hudson.remoting.UserRequest.perform(UserRequest.java:49)
      13:28:00 	at hudson.remoting.Request$2.run(Request.java:326)
      13:28:00 	at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
      13:28:00 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
      13:28:00 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      13:28:00 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      13:28:00 	at java.lang.Thread.run(Thread.java:745)
      13:28:00 Caused by: org.apache.maven.plugin.MojoFailureException: ExecutionException
      13:28:00 	at org.apache.maven.plugin.surefire.SurefirePlugin.assertNoException(SurefirePlugin.java:262)
      13:28:00 	at org.apache.maven.plugin.surefire.SurefirePlugin.handleSummary(SurefirePlugin.java:252)
      13:28:00 	at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:854)
      13:28:00 	at org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:722)
      13:28:00 	at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:101)
      13:28:00 	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:209)
      13:28:00 	... 30 more
      13:28:00 Caused by: org.apache.maven.surefire.booter.SurefireBooterForkException: ExecutionException
      13:28:00 	at org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkPerTestSet(ForkStarter.java:343)
      13:28:00 	at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:178)
      13:28:00 	at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:990)
      13:28:00 	at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:824)
      13:28:00 	... 33 more
      13:28:00 Caused by: java.util.concurrent.ExecutionException: java.lang.NumberFormatException: For input string: "pClientWrapperTest)"
      13:28:00 	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
      13:28:00 	at java.util.concurrent.FutureTask.get(FutureTask.java:188)
      13:28:00 	at org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkPerTestSet(ForkStarter.java:327)
      13:28:00 	... 36 more
      13:28:00 Caused by: java.lang.NumberFormatException: For input string: "pClientWrapperTest)"
      13:28:00 	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
      13:28:00 	at java.lang.Integer.parseInt(Integer.java:492)
      13:28:00 	at java.lang.Integer.parseInt(Integer.java:527)
      13:28:00 	at org.apache.maven.plugin.surefire.runorder.RunEntryStatistics.fromString(RunEntryStatistics.java:83)
      13:28:00 	at org.apache.maven.plugin.surefire.runorder.RunEntryStatisticsMap.fromReader(RunEntryStatisticsMap.java:91)
      13:28:00 	at org.apache.maven.plugin.surefire.runorder.RunEntryStatisticsMap.fromFile(RunEntryStatisticsMap.java:66)
      13:28:00 	at org.apache.maven.plugin.surefire.runorder.StatisticsReporter.<init>(StatisticsReporter.java:40)
      13:28:00 	at org.apache.maven.plugin.surefire.StartupReportConfiguration.instantiateStatisticsReporter(StartupReportConfiguration.java:214)
      13:28:00 	at org.apache.maven.plugin.surefire.report.DefaultReporterFactory.<init>(DefaultReporterFactory.java:71)
      13:28:00 	at org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:309)
      13:28:00 	at org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:306)
      13:28:00 	... 4 more
      13:28:00 [ERROR] 
      13:28:00 [ERROR] Re-run Maven using the -X switch to enable full debug logging.
      13:28:00 [ERROR] 
      

      The details of the backtrace aren't always identical, and it's not even always a NumberFormatException, but that's the most common symptom.

      Here's my diagnosis: in the above example, it's trying to parse the string pClientWrapperTest as a number. And, in fact, we have a test called SipClientWrapperTest in the project, so it seems like the .jenkins-HASH file is either getting written to with invalid data or is getting overwritten while we are in the middle of writing.

      Here's our configuration:

            <plugin>
              <groupId>org.apache.maven.plugins</groupId>
              <artifactId>maven-surefire-plugin</artifactId>
              <version>2.18.1</version>
              <configuration>
                <argLine>-Djava.awt.headless=true -Xmx1024m -XX:MaxPermSize=256m -XX:+CMSClassUnloadingEnabled -Dsumo.detect.flakiness=false</argLine>
                <redirectTestOutputToFile>true</redirectTestOutputToFile>
                <runOrder>failedfirst</runOrder>
                <trimStackTrace>false</trimStackTrace>
                <forkCount>1C</forkCount>
                <reuseForks>false</reuseForks>
                <systemPropertyVariables>
                  <property>
                    <name>java.io.tmpdir</name>
                    <value>${project.build.directory}</value>
                  </property>
                </systemPropertyVariables>
                <includes>
                  <include>**/*Spec.class&lt;/include>
                  <include>**/*Suite.class&lt;/include>
                  <include>**/*Test.class&lt;/include>
                </includes>
              </configuration>
            </plugin>
      

      Note the forkCount property: we're running tests in parallel (assuming we're on a multicore machine, which we are), but they all write to the same .jenkins-HASH file because the hash depends on the overall job, not on the individual test classes.

      I haven't looked at the code closely enough to know if multiple tests are writing to the file at the same time, though I haven't been able to catch that in the act, so my guess is that they aren't. But, if you look at the stacktrace, you'll see that it's going through this method:

          public StatisticsReporter instantiateStatisticsReporter()
          {
              if ( requiresRunHistory )
              {
                  final File target = getStatisticsFile();
                  return new StatisticsReporter( target );
              }
              return null;
          }
      

      and specifically the problem seems to be when it instantiates the StatisticsReporter at the start of running tests for one class. Note that if you go up a couple of stack frames, you see this:

                  while ( suites.hasNext() )
                  {
                      final Object testSet = suites.next();
                      Callable<RunResult> pf = new Callable<RunResult>()
                      {
                          public RunResult call()
                              throws Exception
                          {
                              DefaultReporterFactory forkedReporterFactory =
                                  new DefaultReporterFactory( startupReportConfiguration );
                              defaultReporterFactoryList.add( forkedReporterFactory );
                              ForkClient forkClient =
                                              new ForkClient( forkedReporterFactory,
                                                              startupReportConfiguration.getTestVmSystemProperties() );
                              return fork( testSet, new PropertiesWrapper( providerConfiguration.getProviderProperties() ),
                                           forkClient, effectiveSystemProperties, null );
                          }
                      };
                      results.add( executorService.submit( pf ) );
      
                  }
      

      where line 309 is the DefaultReporterFactory instantiation - so we're creating a bunch of Callables that are each instantiating a DefaultReporterFactory, and they're each trying to read the .jenkins-HASH file; given that, I suspect that one of them is trying to read the file while another one is trying to write it, and so it gets a parse error in the middle of reading.

      It seems like the fix in my situation is to remove the runOrder value from my configuration - the code only reads the statistics on startup when runOrder is set to failedfirst or balanced, so if I remove that line (setting it to filesystem) then we won't instantiate a StatisticsReporter at startup. I'm not too familiar with that configuration, but it's not at all clear to me that the failedfirst / balanced configurations work at all in parallel scenarios? (We set the runOrder years ago, while we only turned on the parallel test execution relatively recently.)

      I'm filing this as Minor for now because I suspect I have a workaround, but hopefully this bug will be useful if other people run into the same issue.

      Attachments

        Activity

          People

            tibordigana Tibor Digana
            davidcarltonsumo David Carlton
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: