TIKA-748

RTF parser fails to extract the body

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.10
    • Fix Version/s: 1.0
    • Component/s: parser
    • Labels:
      None

      Description

      Using tika-app, I get the following result when parsing the attached document:

      <?xml version="1.0" encoding="UTF-8"?><html xmlns="http://www.w3.org/1999/xhtml">
      <head>
      <meta name="subject" content="tests"/>
      <meta name="Content-Length" content="2235"/>
      <meta name="comment" content="StarWriter"/>
      <meta name="X-Parsed-By" content="org.apache.tika.parser.DefaultParser"/>
      <meta name="X-Parsed-By" content="org.apache.tika.parser.rtf.RTFParser"/>
      <meta name="Content-Type" content="application/rtf"/>
      <meta name="resourceName" content="test.rtf"/>
      <title>test rft document</title>
      </head>
      <body/></html>
      

      The expected result would be a non-empty body containing the text "The quick brown fox jumps over the lazy dog".
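The difference between the reported and the expected output can be checked mechanically. The following is a minimal sketch, not part of the original report or of Tika itself: it assumes the XHTML that tika-app prints has been captured as a string, and the names `body_text`, `has_expected_text`, and `REPORTED` are hypothetical. `REPORTED` is an abridged copy of the output above, with the `<meta>` elements left out.

```python
import xml.etree.ElementTree as ET

# Namespace that the XHTML output declares on its root element.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# The sentence the report expects to find in the body.
EXPECTED = "The quick brown fox jumps over the lazy dog"

# Abridged copy of the reported output (meta elements omitted).
REPORTED = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<head><title>test rft document</title></head>'
    '<body/></html>'
)


def body_text(xhtml: str) -> str:
    """Return the whitespace-normalized text inside <body>, or "" if there is none."""
    # Parse from bytes so the UTF-8 declaration in the prolog matches the input.
    root = ET.fromstring(xhtml.encode("utf-8"))
    body = root.find(XHTML_NS + "body")
    if body is None:
        return ""
    # Join all nested text nodes, then collapse runs of whitespace.
    return " ".join("".join(body.itertext()).split())


def has_expected_text(xhtml: str) -> bool:
    """True if the body contains the sentence the report expects."""
    return EXPECTED in body_text(xhtml)


if __name__ == "__main__":
    print("body text:", repr(body_text(REPORTED)))
    print("contains expected sentence:", has_expected_text(REPORTED))
```

A check of this kind could serve as a regression test once the attached patch is applied, provided the captured output is well-formed XML.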

      Attachments

      1. test.rtf (2 kB, Andrzej Bialecki)
      2. TIKA-748.patch (5 kB, Michael McCandless)

          People

          • Assignee:
            Michael McCandless
            Reporter:
            Andrzej Bialecki
