Type: Bug
Status: Closed
Priority: Trivial
Resolution: Fixed
Affects Version/s: 1.4
Fix Version/s: 1.5
Component/s: parser
Labels: None
Environment: While testing this: http://www.mail-archive.com/user@nutch.apache.org/msg04688.html
While working on
http://www.mail-archive.com/user@nutch.apache.org/msg04688.html
I found that the ParserChecker is missing a newline in its report.
For example, running:
./bin/nutch org.apache.nutch.parse.ParserChecker http://vault.fbi.gov/watergate/watergate-summary-part-01-of-02/view
produces:
fetching: http://vault.fbi.gov/watergate/watergate-summary-part-01-of-02/view
parsing: http://vault.fbi.gov/watergate/watergate-summary-part-01-of-02/view
contentType: application/xhtml+xml
--------- Url --------------- http://vault.fbi.gov/watergate/watergate-summary-part-01-of-02/view--------- ParseData --------- Version: 5 ...snip
Note that nothing separates the trailing "view" of the URL from the following "---------" separator: the newline after the URL is missing.
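A minimal sketch of the problem follows. It assumes the report is assembled from separator strings like the ones visible in the output, with the URL written without a line terminator. `ReportSketch`, `buggyReport` and `fixedReport` are hypothetical names, and this is not the actual ParserChecker source. The only point it illustrates is that the URL needs a trailing newline (for example `println` instead of `print`) before the next section starts.

```java
/**
 * Hypothetical sketch of the ParserChecker report layout. The class, the
 * method names and the exact separator strings are illustrative assumptions,
 * not Nutch's real code.
 */
public class ReportSketch {

    // Buggy variant: the URL is appended without a trailing newline, so the
    // "---------" separator of the next section is glued onto the URL.
    static String buggyReport(String url, String parseData) {
        StringBuilder sb = new StringBuilder();
        sb.append("---------\nUrl\n---------------\n");
        sb.append(url);                                // no line terminator here
        sb.append("---------\nParseData\n---------\n");
        sb.append(parseData);
        return sb.toString();
    }

    // Fixed variant: terminate the URL line before starting the next section.
    static String fixedReport(String url, String parseData) {
        StringBuilder sb = new StringBuilder();
        sb.append("---------\nUrl\n---------------\n");
        sb.append(url).append('\n');                   // the missing newline
        sb.append("---------\nParseData\n---------\n");
        sb.append(parseData);
        return sb.toString();
    }

    public static void main(String[] args) {
        String url = "http://vault.fbi.gov/watergate/watergate-summary-part-01-of-02/view";
        System.out.println(buggyReport(url, "Version: 5"));
        System.out.println("=====");
        System.out.println(fixedReport(url, "Version: 5"));
    }
}
```

Running `main` shows the buggy report with "view---------" on one line, while the fixed one puts the separator on its own line.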