Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
On my local machine, I've lately noticed a lot of sporadic, non-reproducible failures like these...
2> NOTE: reproduce with: ant test -Dtestcase=ScriptEngineTest -Dtests.seed=E254A7E69EC7212A -Dtests.slow=true -Dtests.locale=sv -Dtests.timezone=SystemV/CST6 -Dtests.asserts=true -Dtests.file.encoding=UTF-8
[14:34:23.749] ERROR 0.00s J1 | ScriptEngineTest (suite) <<<
> Throwable #1: java.lang.AssertionError: The test or suite printed 10984 bytes to stdout and stderr, even though the limit was set to 8192 bytes. Increase the limit with @Limit, ignore it completely with @SuppressSysoutChecks or run with -Dtests.verbose=true
> at __randomizedtesting.SeedInfo.seed([E254A7E69EC7212A]:0)
> at org.apache.lucene.util.TestRuleLimitSysouts.afterIfSuccessful(TestRuleLimitSysouts.java:212)
Invariably, looking at the logs of tests that fail for this reason, I see multiple instances of these WARN messages...
2> 601361 T3064 oahh.LeaseRenewer.run WARN Failed to renew lease for [DFSClient_NONMAPREDUCE_-253604438_2947] for 92 seconds. Will retry shortly ... java.net.ConnectException: Call From frisbee/127.0.1.1 to localhost:40618 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
2> at sun.reflect.GeneratedConstructorAccessor268.newInstance(Unknown Source)
2> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
...
...the full stack traces of these exceptions are typically 36 lines long (not counting the suppressed "... 17 more" at the end).
Doing some basic crunching of the "tests-report.txt" file from a recent run of all "solr-core" tests (the run that caused the above failure) leads to some pretty damn disconcerting numbers...
hossman@frisbee:~/tmp$ wc -l tests-report.txt_suite-failure-due-to-sysout.txt
1049177 tests-report.txt_suite-failure-due-to-sysout.txt
hossman@frisbee:~/tmp$ grep "Suite: org.apache.solr" tests-report.txt_suite-failure-due-to-sysout.txt | wc -l
465
hossman@frisbee:~/tmp$ grep "LeaseRenewer.run WARN Failed to renew lease" tests-report.txt_suite-failure-due-to-sysout.txt | grep http://wiki.apache.org/hadoop/ConnectionRefused | wc -l
1988
hossman@frisbee:~/tmp$ calc 1988 * 36
71568
So running 465 Solr test suites, we got ~2 thousand of these "Failed to renew lease" WARNings. Of the ~1 million total lines of log messages from all tests, ~70 thousand (~7%) are coming from these WARNing messages – which can evidently be safely ignored?
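The estimate above can be repeated against any report file with a small helper. This is only a rough sketch: the function name lease_noise_percent is made up for illustration, the default of 36 lines per stack trace is the typical length quoted earlier, and it matches on the WARN text alone rather than also filtering on the wiki URL.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Solr build): estimate what percentage
# of a test report consists of "Failed to renew lease" stack traces.
#   $1 = path to the report file
#   $2 = assumed lines per stack trace (defaults to 36, the figure quoted above)
lease_noise_percent() {
  report="$1"
  lines_per_trace="${2:-36}"

  # Total number of lines in the report.
  total=$(wc -l < "$report")

  # Number of lease-renewal WARN messages (one per stack trace).
  warns=$(grep -c "LeaseRenewer.run WARN Failed to renew lease" "$report")

  # percentage = 100 * warnings * lines-per-trace / total lines
  awk -v t="$total" -v w="$warns" -v n="$lines_per_trace" \
    'BEGIN { if (t == 0) { print "0.0"; exit } printf "%.1f\n", 100 * w * n / t }'
}
```

Called as `lease_noise_percent tests-report.txt_suite-failure-due-to-sysout.txt`, it prints a single percentage; with the counts shown above (1988 matches at 36 lines each, out of 1049177 lines) the same arithmetic comes to roughly 6.8%.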
Something seems broken here.
Someone who understands this area of the code should either:
- investigate & fix the code/test not to have these lease renewal problems
- tweak our test logging configs to suppress these WARN messages