Nutch / NUTCH-1253

Incompatible neko and xerces versions

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4
    • Fix Version/s: 2.3, 1.8
    • Component/s: None
    • Labels:
      None
    • Environment:

      Ubuntu 10.04

    • Patch Info:
      Patch Available

      Description

      The Nutch 1.4 distribution includes

      • nekohtml-0.9.5.jar (under .../runtime/local/plugins/lib-nekohtml)
      • xercesImpl-2.9.1.jar (under .../runtime/local/lib)

      These two JARs appear to be incompatible versions. When the HtmlParser (configured to use neko) is invoked during a local-mode crawl, the parse fails due to an AbstractMethodError. (Note: To see the AbstractMethodError, rebuild the HtmlParser plugin and add a
      catch(Throwable) clause in the getParse method to log the stacktrace.)

      I found that substituting a later, compatible version of nekohtml (1.9.11)
      fixes the problem.

      Curiously, and in support of the above, the nekohtml plugin.xml file in
      Nutch 1.4 contains the following:

      <plugin
      id="lib-nekohtml"
      name="CyberNeko HTML Parser"
      version="1.9.11"
      provider-name="org.cyberneko">

      <runtime>
      <library name="nekohtml-0.9.5.jar">
      <export name="*"/>
      </library>
      </runtime>
      </plugin>

      Note the conflicting version numbers (version tag is "1.9.11" but the
      specified library is "nekohtml-0.9.5.jar").

      Was the 0.9.5 version included by mistake? Was the intention rather to
      include 1.9.11?

      1. NUTCH-1253.patch
        0.9 kB
        Lewis John McGibbney
      2. NUTCH-1253-2.x-eclipse.patch
        0.6 kB
        Talat UYARER
      3. NUTCH-1253-2.x-v2.patch
        3 kB
        Lewis John McGibbney
      4. NUTCH-1253-nutchgora.patch
        0.9 kB
        Lewis John McGibbney
      5. nutch1253parsed.html
        0.5 kB
        Sebastian Nagel
      6. nutch1253test.html
        0.4 kB
        Sebastian Nagel
      7. NUTCH-1253-trunk.patch
        48 kB
        Lewis John McGibbney
      8. NUTCH-1253-trunk.v2.patch
        5 kB
        Sebastian Nagel
      9. TEST-org.apache.nutch.parse.html.TestDOMContentUtils.txt
        1 kB
        Lewis John McGibbney
      10. TEST-org.apache.nutch.parse.html.TestDOMContentUtils.txt
        1 kB
        Lewis John McGibbney

        Activity

        Ferdy Galema added a comment -

        Hi,

        Looking at the revision history, it seems that 3 years ago the library actually WAS updated to 1.9.11, and a few months later it was reverted to 0.9.4 and later on to 0.9.5, but the plugin version remained at 1.9.11. The fact that they bothered to change this version number in the first place is pretty curious in itself, because most plugins simply remain at version 1.0 despite several changes. Not that it matters, but just to indicate that this number has no real purpose. As to the nekohtml jar, I am not sure why it's still at this specific version, or why it is the preferred setting. Digging up the issues or mailing lists might give you some more info about this. It might be worth looking into tagsoup.

        I do find your AbstractMethodError curious though. Are you sure it's because of nekohtml and xerces? Can you provide a stack trace?

        Dennis Spathis added a comment -

        To see the stacktrace, you'll need to do the following:
        1. Add to your log4j.properties the line
        log4j.logger.org.apache.nutch.parse.html=TRACE,cmdstout
        2. Modify the class HtmlParser's getParse method to catch Throwable and to handle it by logging the stacktrace and
        returning an empty parse result. Then rebuild the parse-html plugin and replace the original in your
        Nutch installation.

        Here's what the stacktrace looks like:

        Parsing...
        Caught throwable
        java.lang.AbstractMethodError: org.cyberneko.html.HTMLScanner.getCharacterOffset()I
        at org.apache.xerces.xni.parser.XMLParseException.<init>(Unknown Source)
        at org.cyberneko.html.HTMLConfiguration$ErrorReporter.createException(HTMLConfiguration.java:673)
        at org.cyberneko.html.HTMLConfiguration$ErrorReporter.reportError(HTMLConfiguration.java:662)
        at org.cyberneko.html.HTMLScanner$ContentScanner.scanAttribute(HTMLScanner.java:2468)
        at org.cyberneko.html.HTMLScanner$ContentScanner.scanAttribute(HTMLScanner.java:2424)
        at org.cyberneko.html.HTMLScanner$ContentScanner.scanStartElement(HTMLScanner.java:2328)
        at org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:1881)
        at org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:809)
        at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:478)
        at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:431)
        at org.cyberneko.html.parsers.DOMFragmentParser.parse(DOMFragmentParser.java:164)
        at org.apache.nutch.parse.html.HtmlParser.parseNeko(HtmlParser.java:252)
        at org.apache.nutch.parse.html.HtmlParser.parse(HtmlParser.java:215)
        at org.apache.nutch.parse.html.HtmlParser.getParse(HtmlParser.java:147)
        at org.apache.nutch.parse.ParseCallable.call(ParseCallable.java:35)
        at org.apache.nutch.parse.ParseCallable.call(ParseCallable.java:24)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.lang.Thread.run(Thread.java:662)
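
        A note on why step 2 asks for catch (Throwable): AbstractMethodError is a subclass of java.lang.Error, not of java.lang.Exception, so a catch (Exception) clause never sees it. The sketch below is a stand-alone illustration and not Nutch code. The class and method names are hypothetical, and the error is thrown by hand here, whereas in the real crawl the JVM raises it.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

/** Hypothetical stand-in for the parser plugin; not Nutch code. */
class CatchThrowableSketch {

    /** Simulates the failing library call. In the real case the JVM raises the error itself. */
    static void parseWithIncompatibleJars() {
        throw new AbstractMethodError("org.cyberneko.html.HTMLScanner.getCharacterOffset()I");
    }

    /** Catches Exception only, so the Error propagates to the caller. */
    static String parseCatchingException() {
        try {
            parseWithIncompatibleJars();
            return "parsed";
        } catch (Exception e) {
            return "caught " + e.getClass().getSimpleName();
        }
    }

    /** Catches Throwable: logs the stack trace and returns a placeholder for an empty parse. */
    static String parseCatchingThrowable() {
        try {
            parseWithIncompatibleJars();
            return "parsed";
        } catch (Throwable t) {
            StringWriter trace = new StringWriter();
            t.printStackTrace(new PrintWriter(trace, true));
            System.err.println("Caught throwable\n" + trace);
            return "empty parse (" + t.getClass().getSimpleName() + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println(parseCatchingThrowable());
        try {
            parseCatchingException();
        } catch (Error e) {
            System.out.println("escaped catch (Exception): " + e.getClass().getSimpleName());
        }
    }
}
```

        Catching Throwable is used here only as a diagnostic aid to surface the stack trace; it is not meant as a recommendation for production code.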

        Lewis John McGibbney added a comment -

        Trivial patches for both trunk and the Nutchgora branch. Can you guys please test and get back on this issue? Thanks

        Lewis John McGibbney added a comment -

        Anyone had time to try this one out?

        Ferdy Galema added a comment -

        I'll give this one a go..

        Ferdy Galema added a comment -

        It indeed seems broken for trunk. When running it with default options in local mode, every parse simply fails. This is pretty surprising. With the help of Dennis' instructions it indeed becomes clearer what the error is about. Note that nutchgora is not affected, though at first sight both seem to be using the same library versions.

        I'm amazed that this error has not been noticed earlier. I cannot speak for users/devs that are on 1.x, so I kindly ask if one of them is able to pick this issue up (or at least provide some insight). My guess is that they either use tagsoup (instead of neko) or parse-tika for html parsing. Then again, if that's the case I don't know why the defaults are now the way they are. Because of this I have not yet tested any of your patches, sorry Lewis.

        Lewis John McGibbney added a comment -

        Hi Ferdy, the patches I attached were identical for branch Nutchgora and trunk. I would have assumed if trunk was incorrect then Nutchgora would have shadowed this behaviour.

        Ferdy Galema added a comment -

        Wow this issue keeps getting more and more interesting. I just found out that the exception is CAUSED BY enabling trace logging. That is why it is so confusing. My previous statement about it not affecting nutchgora is not true it seems. It indeed affects both trunk and nutchgora. See the following instructions for reproducing the problem:

        ferdy@ftm:~/workspace/nutchtrunk/runtime/local$ bin/nutch parsechecker "http://www.iana.org/"
        ...
        Version: 5
        Status: success(1,0)
        ...

        Now see what happens when I add the following line to log4j.properties. (Note that the comment by Dennis has a typo in this line.)
        log4j.logger.org.apache.nutch.parse.html=TRACE,cmdstdout

        ferdy@ftm:~/workspace/nutchtrunk/runtime/local$ bin/nutch parsechecker "http://www.iana.org/"
        ...
        Version: 5
        Status: failed(2,200): org.apache.nutch.parse.ParseException: Unable to successfully parse content
        ...

        So this is very obscure. It might be a trace logging statement that triggers the exception. It cannot be something else.

        Tomasz Struczyński added a comment -

        I don't have time for much analysis, but, the cause is probably this feature:

        parser.setFeature("http://cyberneko.org/html/features/report-errors",
                  LOG.isTraceEnabled());
        

        as this is the only place which uses trace setting and is not surrounded by try... catch.

        Anyway, the problem is still unresolved (using gora branch).
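
        The hypothesis above fits the stack trace posted earlier in this thread, where the error surfaces inside the error reporter (ErrorReporter.reportError, then XMLParseException.<init>, then HTMLScanner.getCharacterOffset()). The sketch below is a simplified simulation of that idea and uses no NekoHTML or Xerces classes. All names in it are hypothetical, and the error is thrown by hand to stand in for what the JVM does when a class lacks a method that its interface declares.

```java
/** Simplified simulation; the names here are hypothetical, not NekoHTML or Xerces APIs. */
class ReportErrorsSketch {

    /** Stands in for the locator that an error report queries for a position. */
    interface Locator {
        int getCharacterOffset();
    }

    /** Stands in for a scanner compiled against an older interface that lacks the method. */
    static final class OldScanner implements Locator {
        @Override
        public int getCharacterOffset() {
            // Thrown by hand in this simulation; the JVM raises it itself in the real case.
            throw new AbstractMethodError("HTMLScanner.getCharacterOffset()I");
        }
    }

    /** Parses input containing one recoverable error and reports it only when asked to. */
    static String parse(boolean reportErrors) {
        Locator locator = new OldScanner();
        if (reportErrors) {
            // Building the error report is the only code path that touches the locator.
            int offset = locator.getCharacterOffset();
            return "reported error at " + offset;
        }
        return "success";
    }

    public static void main(String[] args) {
        System.out.println(parse(false));
        try {
            parse(true);
        } catch (AbstractMethodError e) {
            System.out.println("failed: " + e);
        }
    }
}
```

        With the flag off, the incompatible method is never called and the parse succeeds, which would explain why the problem stays hidden until trace logging is enabled.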

        Lewis John McGibbney added a comment -

        It seems that progress (towards a solution) has been made [0] for this issue. I am going to add Dennis' suggestions to the patch and debug this locally. I'll write back here in due course.

        [0] http://www.mail-archive.com/user@nutch.apache.org/msg08702.html

        Lewis John McGibbney added a comment -

        Patch for 2.x (same as for 1.X) hopefully.
        There are failing tests in TestDOMContentUtils, which indicate that something is not working quite right. I've had enough for today and am heading home, head is bursting.

        Lewis John McGibbney added a comment -

        Some user input that the patch for 2.x seems to resolve the issue described above.
        http://www.mail-archive.com/user%40nutch.apache.org/msg11318.html

        Lewis John McGibbney added a comment -

        Any objections to commit?
        Further evidence that this patch is working for users who've reported this issue.
        http://s.apache.org/Sb3

        Sebastian Nagel added a comment -

        +1 tested with a collection of problematic documents: no regressions
        But why not upgrade to 1.9.19 right now?

        Lewis John McGibbney added a comment -

        I'll post the patches today Sebastian Nagel. Thanks

        Lewis John McGibbney added a comment - - edited

        Actually, I can confirm that this upgrade seems to break tests in TestDOMContentUtils (see attached). It seems that the document fragment contains some markup... which is incorrect. Having sat in Eclipse for ages debugging this, I am now over on the nekohtml user list trying to sort this out. If you guys have any ideas then please chip in.
        I am not sure whether it's the way we use Xerces2, Neko or maybe a bug in DOMContentUtils, but there is undesired behavior anyway.
        The failed tests are for both 2.x HEAD and trunk

        Sebastian Nagel added a comment -

        It's likely a regression in NekoHTML: <a name="bottom"/> erroneously encloses the rest of the document, including </body></html>, which is interpreted as textual content. See the attached document (taken from the failed unit test) and the output of parse-html using Neko 1.9.15/19.

        Lewis John McGibbney added a comment -

        Patch for trunk.
        The lapse in our testing code was a combination of the closing </a> and the iframe tag which needed to be closed as well.
        I've made the changes to TestDOMContentUtils in parse-html and parse-tika, updated all plugin and ivy configuration and also formatted TestDOMContentUtils as per the Nutch code formatting.
        This was a pain to track down but eventually we got there.
        Thanks if anyone can review.

        Lewis John McGibbney added a comment -

        I would like to commit by tomorrow night if no-one objects. I've been running with this patch for ages and tests now also pass.

        Sebastian Nagel added a comment - - edited

        Hi Lewis John McGibbney, the HTML which fails to parse does not really look incorrect: both <a name="..."/> and <iframe src="..."/> are empty XML-style tags (bachelor tags).
        According to Neko's Change History a configuration feature "allow-selfclosing-iframe" was introduced in v1.9.15. If the feature is set to true, the problematic document is parsed successfully.
        Attached patch adds "allow-selfclosing-iframe" for both parse-html (parse plugin and test) and parse-tika (test only). Tests now pass.
        Note: changes related to upgrade of Neko are contained in patch, but debug output must be removed.
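
        To illustrate the point that these are legitimate empty elements in XML syntax, the sketch below feeds a small fragment to the JDK's built-in XML parser. It does not use NekoHTML, and the fragment is a hypothetical one modelled on the tags named above, not the attached test document. A strict XML parser gives both elements zero children, so the paragraph that follows them stays a sibling instead of being swallowed.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Illustration with the JDK XML parser only; the fragment below is hypothetical. */
class EmptyTagSketch {

    static final String XHTML =
        "<html><body><a name=\"bottom\"/><iframe src=\"x.html\"/><p>after</p></body></html>";

    /** Parses a well-formed XML string into a DOM document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("fragment is not well-formed XML", e);
        }
    }

    /** Returns the number of child nodes of the first element with the given tag name. */
    static int childCount(Document doc, String tag) {
        Element element = (Element) doc.getElementsByTagName(tag).item(0);
        return element.getChildNodes().getLength();
    }

    public static void main(String[] args) {
        Document doc = parse(XHTML);
        System.out.println("a children: " + childCount(doc, "a"));
        System.out.println("iframe children: " + childCount(doc, "iframe"));
        System.out.println("body children: " + childCount(doc, "body"));
    }
}
```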

        Lewis John McGibbney added a comment -

        Sebastian Nagel's most recent patch minus the debug logging:
        Committed @revision 1562448 in trunk
        Committed @revision 1562447 in 2.x HEAD
        Thanks for investigating new neko parser features Sebastian Nagel

        Hudson added a comment -

        SUCCESS: Integrated in Nutch-nutchgora #903 (See https://builds.apache.org/job/Nutch-nutchgora/903/)
        NUTCH-1253 Incompatable neko and xerces versions (lewismc: http://svn.apache.org/viewvc/nutch/branches/2.x/?view=rev&rev=1562447)

        • /nutch/branches/2.x/CHANGES.txt
        • /nutch/branches/2.x/src/plugin/lib-nekohtml/ivy.xml
        • /nutch/branches/2.x/src/plugin/lib-nekohtml/plugin.xml
        • /nutch/branches/2.x/src/plugin/parse-html/src/java/org/apache/nutch/parse/html/HtmlParser.java
        • /nutch/branches/2.x/src/plugin/parse-html/src/test/org/apache/nutch/parse/html/TestDOMContentUtils.java
        Hudson added a comment -

        SUCCESS: Integrated in Nutch-trunk #2511 (See https://builds.apache.org/job/Nutch-trunk/2511/)
        NUTCH-1253 Incompatable versions of neko and xerces (lewismc: http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1562448)

        • /nutch/trunk/CHANGES.txt
        • /nutch/trunk/src/plugin/lib-nekohtml/ivy.xml
        • /nutch/trunk/src/plugin/lib-nekohtml/plugin.xml
        • /nutch/trunk/src/plugin/parse-html/src/java/org/apache/nutch/parse/html/HtmlParser.java
        • /nutch/trunk/src/plugin/parse-html/src/test/org/apache/nutch/parse/html/TestDOMContentUtils.java
        • /nutch/trunk/src/plugin/parse-tika/src/test/org/apache/nutch/tika/TestDOMContentUtils.java
        Yasin Kılınç added a comment -

        I checked and tested the patch on the 2.x branch. I ran the ant eclipse target, then opened the project in the Eclipse IDE. The project compiles, but Eclipse shows a warning because the version of nekohtml is old. I want to attach a patch file for this problem.

        Lewis John McGibbney added a comment -

        The version of nekohtml we are using is

        <dependency org="net.sourceforge.nekohtml" name="nekohtml" rev="1.9.19" conf="*->master"/>

        AFAIK this is most recent.

        Yasin Kılınç added a comment -

        Ok. But there is a line in target eclipse NUTCH_HOME/build.xml like this

        <library path="${basedir}/build/plugins/lib-nekohtml/nekohtml-0.9.5.jar"  exported="false" />
        
        Talat UYARER added a comment -

        Yasin Kılınç is right. At present the 2.x branch does not work with Eclipse. Eclipse says "Missing required library" about neko 0.9.5. I think Lewis John McGibbney forgot to add the nekohtml dependency for the eclipse target in build.xml. I created a bugfix patch for 2.x.

        Lewis John McGibbney added a comment -

        Talat's patch committed @revision 1576414 in 2.x HEAD.
        Thanks guys for highlighting the work still to be done here.

        Sebastian Nagel added a comment -

        Also committed patch to trunk r1576422. Thanks!

        Lewis John McGibbney added a comment -

        Thanks Sebastian Nagel. I forgot to add to trunk :|

        Hudson added a comment -

        SUCCESS: Integrated in Nutch-nutchgora #948 (See https://builds.apache.org/job/Nutch-nutchgora/948/)
        NUTCH-1253 Incompatible neko and xerces versions (lewismc: http://svn.apache.org/viewvc/nutch/branches/2.x/?view=rev&rev=1576414)

        • /nutch/branches/2.x/CHANGES.txt
        • /nutch/branches/2.x/build.xml
        Hudson added a comment -

        SUCCESS: Integrated in Nutch-trunk #2560 (See https://builds.apache.org/job/Nutch-trunk/2560/)
        NUTCH-1253 Incompatible neko and xerces versions (snagel: http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1576422)

        • /nutch/trunk/build.xml

          People

          • Assignee:
            Lewis John McGibbney
            Reporter:
            Dennis Spathis
          • Votes:
            0
            Watchers:
            7
