Nutch / NUTCH-1209

Output from ParserChecker Url missing a newline

    Details

      Description

      While working on:

      http://www.mail-archive.com/user@nutch.apache.org/msg04688.html

      I found that the ParserChecker report is missing a newline after the URL.

      For example, running:

      ./bin/nutch org.apache.nutch.parse.ParserChecker http://vault.fbi.gov/watergate/watergate-summary-part-01-of-02/view
      

      produces:

      fetching: http://vault.fbi.gov/watergate/watergate-summary-part-01-of-02/view
      parsing: http://vault.fbi.gov/watergate/watergate-summary-part-01-of-02/view
      contentType: application/xhtml+xml
      ---------
      Url
      ---------------
      http://vault.fbi.gov/watergate/watergate-summary-part-01-of-02/view---------
      ParseData
      ---------
      Version: 5
      ...snip
      

      Note that there is no newline between the URL (ending in "view") and the "---------" separator that follows it, so the two run together on one line.
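
      The symptom is consistent with the URL being written without a trailing line terminator before the next separator is emitted. The following is only a minimal sketch of that pattern, not the actual ParserChecker source (which is not shown in this issue): the class name ReportSketch, the method buildReport, and the newlineAfterUrl flag are hypothetical names introduced for illustration.

```java
public class ReportSketch {

    /**
     * Builds the "Url" and "ParseData" header portion of a report.
     * Hypothetical helper: it mimics the layout shown in the issue,
     * not the real ParserChecker implementation.
     *
     * @param url             the URL to print in the "Url" section
     * @param newlineAfterUrl false reproduces the reported output,
     *                        true terminates the URL line
     */
    static String buildReport(String url, boolean newlineAfterUrl) {
        StringBuilder out = new StringBuilder();
        out.append("---------\n");
        out.append("Url\n");
        out.append("---------------\n");
        out.append(url);                  // URL written with no line terminator
        if (newlineAfterUrl) {
            out.append('\n');             // the missing newline
        }
        out.append("---------\n");
        out.append("ParseData\n");
        out.append("---------\n");
        return out.toString();
    }

    public static void main(String[] args) {
        String url = "http://vault.fbi.gov/watergate/watergate-summary-part-01-of-02/view";
        // Without the newline, the URL and the separator share a line.
        System.out.print(buildReport(url, false));
        System.out.println();
        // With the newline, the separator starts on its own line.
        System.out.print(buildReport(url, true));
    }
}
```

      In a print-based implementation, the equivalent change would presumably be to use println (or append an explicit "\n") when writing the URL.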

        People

        • Assignee: Chris A. Mattmann (chrismattmann)
        • Reporter: Chris A. Mattmann (chrismattmann)