Details
Bug
Status: Closed
Major
Resolution: Invalid
1.0.0
None
None
None
JDK 1.6 + Tomcat 6 + Eclipse 3.3 + Nutch 1.0
Description
MyHtmlParser.getParse() returns a non-null ParseResult, so none of the analysis-(zh|fr) analyzers run.
public ParseResult getParse(Content content) {
  // Always reports a failed parse (non-null) instead of returning null.
  return ParseResult.createParseResult(content.getUrl(),
      new ParseStatus(ParseStatus.FAILED, ParseStatus.FAILED_MISSING_CONTENT,
          "No textual content available").getEmptyParse(conf));
  // return null;
}
========nutch-site.xml=======
<property>
<name>plugin.includes</name>
<value>protocol-http|urlfilter-regex|parse-(myHtml|html|text|js)|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)|language-identifier|analysis-(zh)</value>
<description><![CDATA[
]]> </description>
</property>
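The `plugin.includes` value is a regular expression over plugin ids. This sketch assumes ids are tested with full-string matching (`Matcher.matches()`) and checks which of the ids mentioned in this report the value above activates. The class and method names are hypothetical. Under that assumption `analysis-fr` is not activated, because the last alternative is only `analysis-(zh)`.

```java
import java.util.List;
import java.util.regex.Pattern;

// Sketch: which plugin ids does the plugin.includes value activate?
// Assumption: each id is tested with full-string regex matching (Matcher.matches()).
class PluginIncludesCheck {

    // Value copied from the nutch-site.xml snippet above.
    static final String INCLUDES =
        "protocol-http|urlfilter-regex|parse-(myHtml|html|text|js)|index-(basic|anchor)"
        + "|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic"
        + "|urlnormalizer-(pass|regex|basic)|language-identifier|analysis-(zh)";

    static final Pattern PATTERN = Pattern.compile(INCLUDES);

    // True if the whole id matches one of the alternatives.
    static boolean isIncluded(String pluginId) {
        return PATTERN.matcher(pluginId).matches();
    }

    public static void main(String[] args) {
        for (String id : List.of("parse-myHtml", "parse-html", "language-identifier",
                                 "analysis-zh", "analysis-fr")) {
            System.out.println(id + " -> " + isIncluded(id));
        }
    }
}
```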
==========parse-plugins.xml============
<mimeType name="text/html">
<plugin id="parse-myHtml" />
<plugin id="parse-html" />
</mimeType>
<alias name="parse-myHtml"
extension-id="org.apache.nutch.parse.html.MyHtmlParser" />
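The order above puts `parse-myHtml` ahead of `parse-html` for `text/html`. The following is a simplified, self-contained model of a parser fallback chain, not Nutch's `ParseUtil` code. It assumes the dispatcher keeps the first non-null result, and `Parser` and `Result` are hypothetical stand-ins for Nutch's classes. Under that policy, a FAILED result from `parse-myHtml` ends the chain and `parse-html` (with its parse filters) never runs. A policy that skips failed results would fall through instead. The sketch targets a current JDK (records, lambdas, `List.of`), not the JDK 1.6 listed in the environment.

```java
import java.util.List;

// Simplified, hypothetical model of a parser fallback chain (not Nutch's ParseUtil).
// Needs a modern JDK (records, lambdas, List.of).
class ParserChainDemo {

    // Hypothetical stand-in for a ParseResult: who produced it and whether it succeeded.
    record Result(String parser, boolean success) {}

    // Hypothetical stand-in for a Nutch Parser extension.
    interface Parser {
        Result getParse(String content);
    }

    // Assumed policy: the first non-null result wins, even if its status is FAILED.
    static Result firstNonNull(List<Parser> parsers, String content) {
        for (Parser p : parsers) {
            Result r = p.getParse(content);
            if (r != null) {
                return r;
            }
        }
        return null;
    }

    // Alternative policy: skip failed results; return the last failure if nothing succeeds.
    static Result firstSuccessful(List<Parser> parsers, String content) {
        Result lastFailure = null;
        for (Parser p : parsers) {
            Result r = p.getParse(content);
            if (r == null) {
                continue;
            }
            if (r.success()) {
                return r;
            }
            lastFailure = r;
        }
        return lastFailure;
    }

    public static void main(String[] args) {
        // Same order as parse-plugins.xml: parse-myHtml first, then parse-html.
        Parser myHtml = c -> new Result("parse-myHtml", false); // always reports FAILED
        Parser html = c -> new Result("parse-html", true);      // the parser that runs the filters
        List<Parser> chain = List.of(myHtml, html);

        System.out.println(firstNonNull(chain, "<html></html>").parser());    // parse-myHtml
        System.out.println(firstSuccessful(chain, "<html></html>").parser()); // parse-html
    }
}
```

In this model, returning `null` from the first parser (the commented-out line in MyHtmlParser) lets even the first-non-null policy fall through to `parse-html`.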
===src/plugin/parse-html/src/java/org/apache/nutch/parse/html/HtmlParser.java========
public ParseResult getParse(Content content) {
    .....
    // this code is never reached:
    ParseResult filteredParse = this.htmlParseFilters.filter(content, parseResult,
        metaTags, root);
    .......
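This is a simplified, hypothetical model of the step that is skipped. Each HTML parse filter may add metadata to the parse, and a later step chooses an analyzer from that metadata. The `MetaFilter` interface, the `lang` key and the `chooseAnalyzer` mapping are illustrative assumptions, not Nutch APIs. The sketch only shows that, in this model, when the filter chain never runs, nothing sets the language, so the language-specific analyzer is not selected.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model of the filter step that is never reached.
// MetaFilter, the "lang" key and chooseAnalyzer are illustrative, not Nutch APIs.
class FilterChainDemo {

    // Hypothetical filter: may add entries to the parse metadata.
    interface MetaFilter {
        void filter(String html, Map<String, String> meta);
    }

    // Run every filter in registration order over a fresh metadata map.
    static Map<String, String> runFilters(List<MetaFilter> filters, String html) {
        Map<String, String> meta = new HashMap<>();
        for (MetaFilter f : filters) {
            f.filter(html, meta);
        }
        return meta;
    }

    // Hypothetical analyzer choice keyed on the "lang" entry, with a default fallback.
    static String chooseAnalyzer(Map<String, String> meta) {
        String lang = meta.get("lang");
        if ("zh".equals(lang)) {
            return "analysis-zh";
        }
        if ("fr".equals(lang)) {
            return "analysis-fr";
        }
        return "default";
    }

    public static void main(String[] args) {
        // Toy language identifier: tags pages that declare lang="zh".
        MetaFilter langId = (html, meta) -> {
            if (html.contains("lang=\"zh\"")) {
                meta.put("lang", "zh");
            }
        };
        String page = "<html lang=\"zh\"><body></body></html>";

        // Filters ran: the Chinese analyzer is chosen.
        System.out.println(chooseAnalyzer(runFilters(List.of(langId), page))); // analysis-zh
        // Filters skipped: no language tag, so the default is used.
        System.out.println(chooseAnalyzer(new HashMap<>()));                  // default
    }
}
```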