Details
Type: Bug
Status: Triage Needed
Priority: P3
Resolution: Unresolved
Version: 2.28.0
None
None
Description
When using the DirectRunner in tests, I encountered flaky tests caused by a NullPointerException.
When a test fails, the relevant stack trace looks like this:
java.lang.NullPointerException
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:877)
    at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.ThreadFactoryBuilder.setThreadFactory(ThreadFactoryBuilder.java:132)
    at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:192)
    at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:67)
    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:322)
    at google.registry.beam.TestPipelineExtension.run(TestPipelineExtension.java:344)
    at google.registry.beam.TestPipelineExtension.run(TestPipelineExtension.java:324)
    at google.registry.beam.spec11.Spec11PipelineTest.testSuccess_saveToGcs(Spec11PipelineTest.java:185)
However, the null pointer seems to come from a call to a (repackaged and very innocent-looking) Guava function that sets the ThreadFactory:
ExecutorService metricsPool =
    Executors.newCachedThreadPool(
        new ThreadFactoryBuilder()
            .setThreadFactory(MoreExecutors.platformThreadFactory())
            .setDaemon(false) // otherwise you say you want to leak, please don't!
            .setNameFormat("direct-metrics-counter-committer")
            .build());
This in turn just calls java.util.concurrent.Executors.defaultThreadFactory() when not running on App Engine:
@Beta
@GwtIncompatible
public static ThreadFactory platformThreadFactory() {
  if (!isAppEngine()) {
    return Executors.defaultThreadFactory();
  } else {
    ...
  }
}
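To make the reasoning explicit: the top frames of the trace are checkNotNull inside setThreadFactory, so the argument handed to setThreadFactory must have been null, which means platformThreadFactory() returned null on the failing runs. The following is only a minimal sketch of that mechanism using the JDK alone. SketchBuilder and throwsNpe are hypothetical stand-ins, not the vendored Guava classes, and only the null check is modeled.

```java
import java.util.Objects;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

public class NullFactorySketch {

  /** Hypothetical stand-in for ThreadFactoryBuilder; models only the null check. */
  static final class SketchBuilder {
    private ThreadFactory backingThreadFactory;

    SketchBuilder setThreadFactory(ThreadFactory factory) {
      // Plays the role of the checkNotNull frame at the top of the stack trace.
      this.backingThreadFactory = Objects.requireNonNull(factory);
      return this;
    }
  }

  /** Returns true if passing the given factory to setThreadFactory throws an NPE. */
  static boolean throwsNpe(ThreadFactory factory) {
    try {
      new SketchBuilder().setThreadFactory(factory);
      return false;
    } catch (NullPointerException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    // The JDK default factory is a real object, so the null check passes.
    System.out.println("default factory throws NPE: "
        + throwsNpe(Executors.defaultThreadFactory()));
    // Only a null argument reproduces the failure seen in the trace.
    System.out.println("null factory throws NPE: " + throwsNpe(null));
  }
}
```

If this reading is right, the non-App-Engine branch cannot be the one that ran when the test failed, since Executors.defaultThreadFactory() hands back a factory object rather than null.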
I don't understand how this could possibly have failed. The test seems to be flaky only on our CI servers, which run Ubuntu, and I haven't been able to reproduce it locally, which makes debugging difficult.
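Since the failure only shows up on CI, one option is to log some state from the same JVM just before the pipeline runs. The sketch below is a hypothetical diagnostic helper, not part of Beam or Guava. It checks that the JDK default factory is non-null and lists any system properties whose name contains "appengine", on the assumption (not verified here) that the isAppEngine() check depends on the JVM's runtime environment.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Executors;

public class CiThreadFactoryDiagnostics {

  /** Collects system properties whose key mentions "appengine" (case-insensitive). */
  static Map<String, String> appEngineProperties() {
    Map<String, String> result = new TreeMap<>();
    for (String key : System.getProperties().stringPropertyNames()) {
      if (key.toLowerCase(Locale.ROOT).contains("appengine")) {
        result.put(key, System.getProperty(key));
      }
    }
    return result;
  }

  /** Reports whether the JDK default thread factory is available in this JVM. */
  static boolean defaultFactoryIsNonNull() {
    return Executors.defaultThreadFactory() != null;
  }

  public static void main(String[] args) {
    System.out.println("defaultThreadFactory non-null: " + defaultFactoryIsNonNull());
    System.out.println("appengine-related system properties: " + appEngineProperties());
  }
}
```

Calling these two methods from the test setup on CI and comparing the output of passing and failing runs might show whether some earlier test in the same JVM changes the environment that platformThreadFactory() inspects.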
I'd appreciate any input you may have. Thanks!