Description
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
at org.apache.nutch.parse.ParseSegment.parse(ParseSegment.java:213)
<...>
Caused by: java.lang.IncompatibleClassChangeError: class org.apache.tika.parser.asm.XHTMLClassVisitor has interface org.objectweb.asm.ClassVisitor as super class
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:760)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.tika.parser.asm.ClassParser.parse(ClassParser.java:51)
at org.apache.nutch.parse.tika.TikaParser.getParse(TikaParser.java:98)
at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:103)
Suggested fix in ParseUtil:
Replace
if (maxParseTime != -1)
  parseResult = runParser(parsers[i], content);
else
  parseResult = parsers[i].getParse(content);
with
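try {
  if (maxParseTime != -1)
    parseResult = runParser(parsers[i], content);
  else
    parseResult = parsers[i].getParse(content);
} catch (Throwable e) {
  LOG.warn("Parsing " + content.getUrl() + " with " + parsers[i].getClass().getName() + " failed: " + e.getMessage());
  parseResult = null;
}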
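The idea behind the suggested fix can be tried outside Nutch with a minimal, self-contained sketch. Everything here is an assumption made for illustration: the `Parser` interface is a hypothetical stand-in for the Nutch parser type, `parseWithFallback` is an invented helper name, and `java.util.logging` replaces whatever logger `ParseUtil` uses. The point it shows is that the handler has to catch `Throwable`, because `IncompatibleClassChangeError` is a `LinkageError` and would pass straight through a `catch (Exception e)` block.

```java
import java.util.Arrays;
import java.util.List;
import java.util.logging.Logger;

public class ParseFallbackSketch {
    private static final Logger LOG =
            Logger.getLogger(ParseFallbackSketch.class.getName());

    /** Hypothetical stand-in for a Nutch parser; not the real interface. */
    interface Parser {
        String getParse(String content);
    }

    /**
     * Tries each parser in order and returns the first non-null result,
     * or null if every parser failed or returned null.
     */
    static String parseWithFallback(List<Parser> parsers, String url, String content) {
        String parseResult = null;
        for (Parser parser : parsers) {
            try {
                parseResult = parser.getParse(content);
            } catch (Throwable e) {
                // Throwable rather than Exception: linkage errors such as
                // IncompatibleClassChangeError are Errors, not Exceptions.
                LOG.warning("Parsing " + url + " with "
                        + parser.getClass().getName() + " failed: " + e.getMessage());
                parseResult = null;
            }
            if (parseResult != null) {
                break;
            }
        }
        return parseResult;
    }

    public static void main(String[] args) {
        // Simulates a parser whose classes fail to link at run time.
        Parser broken = c -> {
            throw new IncompatibleClassChangeError("simulated library conflict");
        };
        Parser working = c -> c.toUpperCase();

        String result = parseWithFallback(
                Arrays.asList(broken, working), "http://example.com/", "hello");
        System.out.println(result); // prints "HELLO"
    }
}
```

Catching `Throwable` also swallows errors such as `OutOfMemoryError`, so this trades strictness for job survival; whether that trade is acceptable is a judgment call for the real patch.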
Attachments
Issue Links
- Is contained by: NUTCH-2378 ChildFirst plugin classloader (Closed)
- Is related to: NUTCH-2316 Library conflict with Parser-Tika Plugin and Lib Folder (Closed)