Tika / TIKA-748

RTF parser fails to extract the body

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.10
    • Fix Version/s: 1.0
    • Component/s: parser
    • Labels:
      None

      Description

      When parsing the attached document with tika-app, I get the following result:

      <?xml version="1.0" encoding="UTF-8"?><html xmlns="http://www.w3.org/1999/xhtml">
      <head>
      <meta name="subject" content="tests"/>
      <meta name="Content-Length" content="2235"/>
      <meta name="comment" content="StarWriter"/>
      <meta name="X-Parsed-By" content="org.apache.tika.parser.DefaultParser"/>
      <meta name="X-Parsed-By" content="org.apache.tika.parser.rtf.RTFParser"/>
      <meta name="Content-Type" content="application/rtf"/>
      <meta name="resourceName" content="test.rtf"/>
      <title>test rft document</title>
      </head>
      <body/></html>
      

      The expected result would be a non-empty body containing the text "The quick brown fox jumps over the lazy dog".
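      The difference between the reported and the expected output could be checked mechanically. The following is only a minimal sketch, not part of Tika or of the attached patch: it assumes the XHTML produced by tika-app has been captured as a string, and the names body_text, has_expected_body, and HYPOTHETICAL_FIXED_OUTPUT are made up for illustration. The "fixed" output in particular is an assumption about what a corrected parser might emit.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"
EXPECTED_TEXT = "The quick brown fox jumps over the lazy dog"

# Abbreviated copy of the output quoted in this report (note the empty <body/>).
REPORTED_OUTPUT = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head>"
    '<meta name="Content-Type" content="application/rtf"/>'
    "<title>test rft document</title>"
    "</head>"
    "<body/></html>"
)

# Hypothetical output after a fix; the exact markup is an assumption.
HYPOTHETICAL_FIXED_OUTPUT = REPORTED_OUTPUT.replace(
    "<body/>", "<body><p>" + EXPECTED_TEXT + "</p></body>"
)


def body_text(xhtml: str) -> str:
    """Return the whitespace-normalized text inside <body>, or "" if there is none."""
    root = ET.fromstring(xhtml.encode("utf-8"))
    body = root.find(XHTML_NS + "body")
    if body is None:
        return ""
    # itertext() walks all nested elements; split/join collapses whitespace runs.
    return " ".join("".join(body.itertext()).split())


def has_expected_body(xhtml: str) -> bool:
    """True if the body contains the sentence the reporter expected."""
    return EXPECTED_TEXT in body_text(xhtml)


if __name__ == "__main__":
    print("reported output has expected body:", has_expected_body(REPORTED_OUTPUT))
    print("hypothetical fixed output has expected body:",
          has_expected_body(HYPOTHETICAL_FIXED_OUTPUT))
```

      A check like this only looks at the extracted text, so it would not notice differences in the metadata or in how the body is structured.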

        Attachments

        1. TIKA-748.patch
          5 kB
          Michael McCandless
        2. test.rtf
          2 kB
          Andrzej Bialecki

            People

            • Assignee: Michael McCandless (mikemccand)
            • Reporter: Andrzej Bialecki (ab)