Description
While running a custom Nutch job (code here), many but not all task failed exceeding the the Hadoop task time-out (`mapreduce.task.timeout`) without generating any "heartbeat" (output, counter increments, log messages). Hadoop logs the stacks of all threads of the timed out task. That's the base for the excerpts below.
The job runs a MultithreadedMapper - most of the mapper threads (48 in total) are waiting for the URLStreamHandler in order to construct a java.net.URL object:
"Thread-11" #27 prio=5 os_prio=0 cpu=243.78ms elapsed=647.25s tid=0x00007f3eb5b0f800 nid=0x8e651 waiting for monitor entry [0x00007f3e84ef9000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:162) - waiting to lock <0x00000006a1bc0630> (a java.lang.String) at org.apache.nutch.plugin.PluginRepository.createURLStreamHandler(PluginRepository.java:597) at org.apache.nutch.plugin.URLStreamHandlerFactory.createURLStreamHandler(URLStreamHandlerFactory.java:95) at java.net.URL.getURLStreamHandler(java.base@11.0.15/URL.java:1432) at java.net.URL.<init>(java.base@11.0.15/URL.java:651) at java.net.URL.<init>(java.base@11.0.15/URL.java:541) at java.net.URL.<init>(java.base@11.0.15/URL.java:488) at org.apache.nutch.net.urlnormalizer.basic.BasicURLNormalizer.normalize(BasicURLNormalizer.java:179) at org.apache.nutch.net.URLNormalizers.normalize(URLNormalizers.java:318) at org.apache.nutch.crawl.Injector$InjectMapper.filterNormalize(Injector.java:157) at org.apache.nutch.crawl.SitemapInjector$SitemapInjectMapper$SitemapProcessor.getContent(SitemapInjector.java:670) at org.apache.nutch.crawl.SitemapInjector$SitemapInjectMapper$SitemapProcessor.process(SitemapInjector.java:439) at org.apache.nutch.crawl.SitemapInjector$SitemapInjectMapper.map(SitemapInjector.java:325) at org.apache.nutch.crawl.SitemapInjector$SitemapInjectMapper.map(SitemapInjector.java:145) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146) at org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper$MapRunner.run(MultithreadedMapper.java:274)
Only a single mapper thread is active:
"Thread-23" #39 prio=5 os_prio=0 cpu=5830.17ms elapsed=647.09s tid=0x00007f3eb5b42800 nid=0x8e661 in Object.wait() [0x00007f3e842ec000] java.lang.Thread.State: RUNNABLE at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(java.base@11.0.15/Native Method) at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(java.base@11.0.15/NativeConstructorAccessorImpl.java:62) at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(java.base@11.0.15/DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(java.base@11.0.15/Constructor.java:490) at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:170) - locked <0x00000006a1bc0630> (a java.lang.String) at org.apache.nutch.plugin.PluginRepository.createURLStreamHandler(PluginRepository.java:597) at org.apache.nutch.plugin.URLStreamHandlerFactory.createURLStreamHandler(URLStreamHandlerFactory.java:95) at java.net.URL.getURLStreamHandler(java.base@11.0.15/URL.java:1432) at java.net.URL.<init>(java.base@11.0.15/URL.java:651) at java.net.URL.<init>(java.base@11.0.15/URL.java:541) at java.net.URL.<init>(java.base@11.0.15/URL.java:488) at org.apache.nutch.net.urlnormalizer.basic.BasicURLNormalizer.normalize(BasicURLNormalizer.java:179) at org.apache.nutch.net.URLNormalizers.normalize(URLNormalizers.java:318) at org.apache.nutch.crawl.Injector$InjectMapper.filterNormalize(Injector.java:157) at org.apache.nutch.crawl.SitemapInjector$SitemapInjectMapper$SitemapProcessor.getContent(SitemapInjector.java:670) at org.apache.nutch.crawl.SitemapInjector$SitemapInjectMapper$SitemapProcessor.process(SitemapInjector.java:439) at org.apache.nutch.crawl.SitemapInjector$SitemapInjectMapper.map(SitemapInjector.java:325) at org.apache.nutch.crawl.SitemapInjector$SitemapInjectMapper.map(SitemapInjector.java:145) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146) at org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper$MapRunner.run(MultithreadedMapper.java:274)
The stack of one thread shows a nested lock:
"Thread-26" #42 prio=5 os_prio=0 cpu=24.93ms elapsed=647.07s tid=0x00007f3eb5b49800 nid=0x8e664 waiting for monitor entry [0x00007f3e83fe7000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:162) - waiting to lock <0x00000006a1bc0630> (a java.lang.String) at org.apache.nutch.plugin.PluginRepository.createURLStreamHandler(PluginRepository.java:597) at org.apache.nutch.plugin.URLStreamHandlerFactory.createURLStreamHandler(URLStreamHandlerFactory.java:95) at java.net.URL.getURLStreamHandler(java.base@11.0.15/URL.java:1432) at java.net.URL.<init>(java.base@11.0.15/URL.java:651) at java.net.URL.<init>(java.base@11.0.15/URL.java:541) at java.net.URL.<init>(java.base@11.0.15/URL.java:488) at javax.crypto.JceSecurity.<clinit>(java.base@11.0.15/JceSecurity.java:239) at javax.crypto.Cipher.getInstance(java.base@11.0.15/Cipher.java:540) at sun.security.ssl.JsseJce.getCipher(java.base@11.0.15/JsseJce.java:190) at sun.security.ssl.SSLCipher.isTransformationAvailable(java.base@11.0.15/SSLCipher.java:509) at sun.security.ssl.SSLCipher.<init>(java.base@11.0.15/SSLCipher.java:498) at sun.security.ssl.SSLCipher.<clinit>(java.base@11.0.15/SSLCipher.java:81) at sun.security.ssl.CipherSuite.<clinit>(java.base@11.0.15/CipherSuite.java:65) at sun.security.ssl.SSLContextImpl.getApplicableSupportedCipherSuites(java.base@11.0.15/SSLContextImpl.java:348) at sun.security.ssl.SSLContextImpl$AbstractTLSContext.<clinit>(java.base@11.0.15/SSLContextImpl.java:580) at java.lang.Class.forName0(java.base@11.0.15/Native Method) at java.lang.Class.forName(java.base@11.0.15/Class.java:315) at java.security.Provider$Service.getImplClass(java.base@11.0.15/Provider.java:1918) at java.security.Provider$Service.newInstance(java.base@11.0.15/Provider.java:1894) at sun.security.jca.GetInstance.getInstance(java.base@11.0.15/GetInstance.java:236) at sun.security.jca.GetInstance.getInstance(java.base@11.0.15/GetInstance.java:164) at javax.net.ssl.SSLContext.getInstance(java.base@11.0.15/SSLContext.java:168) at org.apache.nutch.protocol.okhttp.OkHttp.<clinit>(OkHttp.java:97) at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(java.base@11.0.15/Native Method) at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(java.base@11.0.15/NativeConstructorAccessorImpl.java:62) at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(java.base@11.0.15/DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(java.base@11.0.15/Constructor.java:490) at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:170) - locked <0x00000006a1bc0410> (a java.lang.String) at org.apache.nutch.plugin.PluginRepository.createURLStreamHandler(PluginRepository.java:597) at org.apache.nutch.plugin.URLStreamHandlerFactory.createURLStreamHandler(URLStreamHandlerFactory.java:95) at java.net.URL.getURLStreamHandler(java.base@11.0.15/URL.java:1432) at java.net.URL.<init>(java.base@11.0.15/URL.java:651) at java.net.URL.<init>(java.base@11.0.15/URL.java:541) at java.net.URL.<init>(java.base@11.0.15/URL.java:488) at org.apache.nutch.net.urlnormalizer.basic.BasicURLNormalizer.normalize(BasicURLNormalizer.java:179) at org.apache.nutch.net.URLNormalizers.normalize(URLNormalizers.java:318) at org.apache.nutch.crawl.Injector$InjectMapper.filterNormalize(Injector.java:157) at org.apache.nutch.crawl.SitemapInjector$SitemapInjectMapper$SitemapProcessor.getContent(SitemapInjector.java:670) at org.apache.nutch.crawl.SitemapInjector$SitemapInjectMapper$SitemapProcessor.process(SitemapInjector.java:439) at org.apache.nutch.crawl.SitemapInjector$SitemapInjectMapper.map(SitemapInjector.java:325) at org.apache.nutch.crawl.SitemapInjector$SitemapInjectMapper.map(SitemapInjector.java:145) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146) at org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper$MapRunner.run(MultithreadedMapper.java:274)
It isn't a dead-lock because the lock is on two different String objects in different lines of code. However, a recursive barrier may make the performance impact of locks much worse. The stack also shows that Nutch's URLStreamHandlerFactory is also called from Java-internal methods. The cryptographic classes in the stack could also give a hint what's going wrong in NUTCH-2936.
I'm not yet clear what the solution could be:
- Extension.getExtensionInstance() with nested synchronization is for sure not suitable to be called frequently
- maybe caching the URL stream handlers solves the problem
- but we should keep also
NUTCH-2936on the radar when looking for solutions
Attachments
Issue Links
- is caused by
-
NUTCH-2429 Fix Plugin System to allow protocol plugins to bundle their URLStreamHandlers
- Closed
- is fixed by
-
NUTCH-2936 Early registration of URL stream handlers provided by plugins may fail Hadoop jobs running in distributed mode if protocol-okhttp is used
- Closed