UIMA / UIMA-3512

Add additional engine parameter for Ruta HtmlConverter to configure linebreak replacement.

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.0ruta
    • Fix Version/s: 2.2.0ruta
    • Component/s: Ruta
    • Labels: None

    Description

      When converting an HTML file to plain text with the HtmlConverter engine in Ruta, the boolean engine parameter "replaceLinebreaks" decides whether linebreaks in the text are replaced. If it is set to true, all linebreaks are kept in the document. If it is set to false, all linebreaks are deleted, so the last word of one line and the first word of the next line are joined without any whitespace in between. It would often be better to replace each linebreak with a whitespace. An additional engine parameter that defines the string each linebreak is replaced with would make this configurable.
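
      The requested behaviour can be illustrated with a minimal, standalone sketch. It does not use the Ruta API and is not taken from the attached patches; the class name, the method name, and the "replacement" argument are hypothetical and only stand in for the proposed engine parameter.

```java
import java.util.regex.Matcher;

// Hypothetical sketch, not part of Ruta: shows how a configurable
// replacement string changes the result of linebreak handling.
public class LinebreakReplacementSketch {

    // Replaces every linebreak sequence in the text with the given string.
    // "\\R" matches any linebreak (\n, \r\n, \r, ...) in Java 8 and later.
    // quoteReplacement ensures characters such as '$' are taken literally.
    static String replaceLinebreaks(String text, String replacement) {
        return text.replaceAll("\\R", Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String text = "first line\nsecond line";

        // Empty replacement: the words around the linebreak are joined,
        // which is the problem described in this issue.
        System.out.println(replaceLinebreaks(text, ""));

        // Single space as replacement: the words stay separated.
        System.out.println(replaceLinebreaks(text, " "));
    }
}
```

      With an empty replacement string the sketch yields "first linesecond line"; with a single space it yields "first line second line".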

      Attachments

        1. linebreakReplacementEngineParameter.core_patch
          3 kB
          Philip-Daniel Beck
        2. linebreakReplacementEngineParameter.docbook_patch
          0.9 kB
          Philip-Daniel Beck

          People

            Assignee: Peter Klügl (pkluegl)
            Reporter: Philip-Daniel Beck (pdb)