  Tika / TIKA-2550

ToTextHandler includes <style/> element content

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0, 1.20
    • Component/s: None
    • Labels: None

      Description

      When the ToTextHandler is used to process .java files, the content of the <style/> element is included in the extracted text, e.g.:

      testFile
      code {
      color: rgb(0,0,0); font-family: monospace; font-size: 12px; white-space: nowrap;
      }
      .java_plain {
      color: rgb(0,0,0);
      }
      .java_keyword {
      color: rgb(0,0,0); font-weight: bold;
      }
      .java_javadoc_tag {
      color: rgb(147,147,147); background-color: rgb(247,247,247); font-style: italic; font-weight: bold;
      }
      h1 {
      font-family: sans-serif; font-size: 16pt; font-weight: bold; color: rgb(0,0,0); background: rgb(210,210,210); border: solid 1px black; padding: 5px; text-align: center;
      }
      .java_type {
      color: rgb(0,44,221);
      }
      .java_literal {
      color: rgb(188,0,0);
      }
      .java_javadoc_comment {
      color: rgb(147,147,147); background-color: rgb(247,247,247); font-style: italic;
      }
      .java_operator {
      color: rgb(0,124,31);
      }
      .java_separator {
      color: rgb(0,33,255);
      }
      .java_comment {
      color: rgb(147,147,147); background-color: rgb(247,247,247);
      }
      
      testFile/*************************************************************************
       *  Compilation:  javac HelloWorld.java
       *  Execution:    java HelloWorld
       *
       *  Prints "Hello, World". By tradition, this is everyone's first program.
       *
       *************************************************************************/
      
      public class HelloWorld {
          public static void main(String[] args) {
              System.out.println("Hello, World");
          }
      
      }
      
      

      Is this what we want as the default behavior?
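
      For illustration, one way a text-collecting SAX handler could suppress <style/> content is to track element depth and ignore character events while inside a style element. The sketch below uses only the JDK's SAX classes; the class name StyleSkippingTextHandler and the helper extractText are hypothetical and are not Tika APIs, and this is not necessarily how the issue was fixed in Tika.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Hypothetical handler (not part of Tika): collects character data
 * but drops anything that appears inside a <style> element.
 */
class StyleSkippingTextHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    // Greater than zero while the parser is inside a <style> element.
    private int styleDepth = 0;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (isStyle(localName, qName)) {
            styleDepth++;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (isStyle(localName, qName) && styleDepth > 0) {
            styleDepth--;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only keep text that is outside every <style> element.
        if (styleDepth == 0) {
            text.append(ch, start, length);
        }
    }

    private static boolean isStyle(String localName, String qName) {
        // Fall back to the qualified name when the parser supplies no local name.
        String name = (localName == null || localName.isEmpty()) ? qName : localName;
        return "style".equalsIgnoreCase(name);
    }

    @Override
    public String toString() {
        return text.toString();
    }

    /** Assumed helper: parses well-formed XHTML and returns the collected text. */
    static String extractText(String xhtml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        StyleSkippingTextHandler handler = new StyleSkippingTextHandler();
        factory.newSAXParser().parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        // Made-up input that loosely mimics the structure of the output reported above.
        String xhtml = "<html><head><title>testFile</title>"
                + "<style>code { color: rgb(0,0,0); }</style></head>"
                + "<body><p>public class HelloWorld {}</p></body></html>";
        System.out.println(extractText(xhtml));
    }
}
```

      With a handler along these lines, the title and body text are still emitted while the CSS rules are dropped. Whether skipping <style/> should be the default, or something callers opt into, is the question this issue raises.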

            People

            • Assignee: Tim Allison (tallison@apache.org)
            • Reporter: Tim Allison (tallison@apache.org)
            • Votes: 0
            • Watchers: 3