  1. UIMA
  2. UIMA-3512

Add additional engine parameter for Ruta HtmlConverter to configure linebreak replacement.

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.0ruta
    • Fix Version/s: 2.2.0ruta
    • Component/s: Ruta
    • Labels: None

      Description

      When converting an HTML file to plain text with the HtmlConverter engine in Ruta, the boolean engine parameter "replaceLinebreaks" decides whether linebreaks in the text are replaced. If set to true, all linebreaks are kept in the document. If set to false, all linebreaks are deleted, so the last word of a line and the first word of the next line are joined without any whitespace in between. It would often be better to replace each linebreak with a whitespace. To make this configurable, an additional engine parameter that defines the string each linebreak is replaced with would be useful.
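
      The requested behavior can be illustrated with a minimal plain-Java sketch. It is not the HtmlConverter implementation and does not use the UIMA or Ruta APIs. The class name, the method name, and the parameter name "linebreakReplacement" are assumptions made for illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Ruta: shows the effect of a configurable
// replacement string for linebreaks.
final class LinebreakReplacementSketch {

    // Matches one linebreak: Windows (\r\n), old Mac (\r), or Unix (\n).
    // "\r\n" is listed first so that a Windows linebreak is replaced once, not twice.
    private static final Pattern LINEBREAK = Pattern.compile("\r\n|\r|\n");

    // Replaces every linebreak in text with linebreakReplacement (assumed parameter name).
    // quoteReplacement keeps characters such as '$' and '\' in the replacement literal.
    static String replaceLinebreaks(String text, String linebreakReplacement) {
        return LINEBREAK.matcher(text)
                .replaceAll(Matcher.quoteReplacement(linebreakReplacement));
    }
}
```

      With an empty replacement, replaceLinebreaks("first line\nsecond line", "") yields "first linesecond line", which is the joined-words problem described above. With a single space as the replacement, the same input yields "first line second line".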

            People

            • Assignee: Peter Klügl (pkluegl)
            • Reporter: Philip-Daniel Beck (pdb)