Uploaded image for project: 'Tika'
  1. Tika
  2. TIKA-748

RTF parser fails to extract the body

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 0.10
    • 1.0
    • parser
    • None

    Description

      Using tika-app I'm getting the following result of parsing the attached document:

      <?xml version="1.0" encoding="UTF-8"?><html xmlns="http://www.w3.org/1999/xhtml">
      <head>
      <meta name="subject" content="tests"/>
      <meta name="Content-Length" content="2235"/>
      <meta name="comment" content="StarWriter"/>
      <meta name="X-Parsed-By" content="org.apache.tika.parser.DefaultParser"/>
      <meta name="X-Parsed-By" content="org.apache.tika.parser.rtf.RTFParser"/>
      <meta name="Content-Type" content="application/rtf"/>
      <meta name="resourceName" content="test.rtf"/>
      <title>test rft document</title>
      </head>
      <body/></html>
      

      The expected result would be a non-empty body containing the text "The quick brown fox jumps over the lazy dog
      ".

      Attachments

        1. test.rtf
          2 kB
          Andrzej Bialecki
        2. TIKA-748.patch
          5 kB
          Michael McCandless

        Activity

          People

            mikemccand Michael McCandless
            ab Andrzej Bialecki
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: