UIMA / UIMA-5147

RUTA leaves the contents of STYLE tags in plaintext

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.3.0ruta
    • Fix Version/s: 2.6.0ruta
    • Component/s: Ruta
    • Labels: None

      Description

      I'm using the RUTA HtmlAnnotator and HtmlConverter to turn an HTML document into plain text, with annotations that represent the markup from the original HTML.

      The contents of <STYLE> tags show up in the plaintext view, which isn't helpful. As STYLE content isn't part of the document's text, I think it would be better not to add it to the plaintext, or at least to have an option to exclude it.

      (Apologies if I've missed a way to do this using the existing options)

      As a simple way to reproduce this, a document like the following can be used:

      <html><head>
          <style>
              /*  */
              .test {
                  text-align: left;
              }
          </style>
      </head><body>Hello world</body></html>
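
To make the expected result concrete, here is a minimal sketch of the requested behaviour: text extraction that drops everything inside STYLE elements. It uses only Python's standard `html.parser` module and is not how RUTA's HtmlConverter is implemented; the names `VisibleTextExtractor`, `extract_plaintext`, and `SAMPLE` are hypothetical and exist only for this illustration.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes, skipping anything inside <style> elements.

    Illustration only: this is an assumption about the desired output,
    not RUTA's actual conversion logic.
    """

    def __init__(self):
        super().__init__()
        self._style_depth = 0  # > 0 while the parser is inside a <style> element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <STYLE> and <style> both match.
        if tag == "style":
            self._style_depth += 1

    def handle_endtag(self, tag):
        if tag == "style" and self._style_depth > 0:
            self._style_depth -= 1

    def handle_data(self, data):
        # Keep text only when it is not stylesheet content.
        if self._style_depth == 0:
            self._chunks.append(data)

    def text(self):
        return "".join(self._chunks).strip()


def extract_plaintext(html):
    """Return the text of an HTML document without STYLE contents."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


# The sample document from this report.
SAMPLE = """<html><head>
    <style>
        /*  */
        .test {
            text-align: left;
        }
    </style>
</head><body>Hello world</body></html>"""

print(extract_plaintext(SAMPLE))  # prints "Hello world"
```

For the sample document, the plaintext I would hope to see is just "Hello world", with none of the CSS rules in it.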
      


            People

            • Assignee: Peter Klügl (pkluegl)
            • Reporter: Dale Lane (dalelane)
