Nutch / NUTCH-2397

Parser to add paragraph line breaks

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.3.1, 1.13
    • Fix Version/s: 2.4, 1.14
    • Component/s: parser
    • Labels: None

    Description

      (initially reported with patch/pull-request by Vipul Behl, see #190)

      The parser (parse-tika and parse-html) could be improved to add line breaks between paragraphs, instead of writing the whole document into a single line.
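The requested behavior can be illustrated with a minimal sketch. This is not the patch from the pull request and does not use the Nutch or Tika APIs; it only assumes a well-formed XHTML input and walks the DOM with the JDK's built-in XML parser. The class name `ParagraphBreaks`, the method `extractText`, and the set of block-level element names are all hypothetical choices made for illustration.

```java
import java.io.StringReader;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class ParagraphBreaks {

    // Assumption: elements treated as block-level for this sketch only.
    private static final Set<String> BLOCK_ELEMENTS =
        Set.of("p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6");

    // Append a single line break unless the buffer is empty or already ends with one.
    private static void lineBreak(StringBuilder sb) {
        if (sb.length() > 0 && sb.charAt(sb.length() - 1) != '\n') {
            sb.append('\n');
        }
    }

    // Depth-first walk: collect text nodes, break lines around block elements.
    private static void collect(Node node, StringBuilder sb) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim().replaceAll("\\s+", " ");
            if (!text.isEmpty()) {
                // Separate inline fragments on the same line with one space.
                if (sb.length() > 0 && sb.charAt(sb.length() - 1) != '\n') {
                    sb.append(' ');
                }
                sb.append(text);
            }
            return;
        }
        boolean block = node.getNodeType() == Node.ELEMENT_NODE
            && BLOCK_ELEMENTS.contains(node.getNodeName().toLowerCase(Locale.ROOT));
        if (block) {
            lineBreak(sb);
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, sb);
        }
        if (block) {
            lineBreak(sb);
        }
    }

    // Parse well-formed XHTML and return its text with one line per block element.
    public static String extractText(String xhtml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
        StringBuilder sb = new StringBuilder();
        collect(doc.getDocumentElement(), sb);
        return sb.toString().trim();
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><p>First paragraph.</p><p>Second paragraph.</p></body></html>";
        System.out.println(extractText(page));
    }
}
```

With the sample input in `main`, each paragraph ends up on its own line rather than both being concatenated into a single line. Inline elements such as `<b>` are not in the block set, so their text stays on the line of the enclosing paragraph.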

            People

              Assignee: Unassigned
              Reporter: Sebastian Nagel (snagel)
              Votes: 0
              Watchers: 4