Project: Tika · Issue: TIKA-3827

Word document: embedded files extracted with .mpga extension instead of as bitmaps


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: parser
    • Labels: None

    Description

      When parsing the .doc document, Tika extracts two .mpga files that cannot be opened or played. We suspect they should be bitmap image files. We are using Tika 2.4.1.

      example.DOC
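      One way to test the suspicion that the extracted .mpga files are really bitmaps is to look at their leading bytes. The Python sketch below is not part of Tika; the function name `sniff_header` and its labels are illustrative assumptions. It checks for the BMP file signature `BM`, an ID3v2 tag (`ID3`), or an MPEG audio frame sync (0xFF followed by a byte whose top three bits are set). It is a rough heuristic, not a full type detector.

```python
import sys
from pathlib import Path

# Leading "magic" bytes used by this sketch (illustrative, not exhaustive).
BMP_SIGNATURE = b"BM"   # BMP files begin with a BITMAPFILEHEADER whose first two bytes are "BM"
ID3_SIGNATURE = b"ID3"  # an ID3v2 tag often precedes MPEG audio (MP3) data


def sniff_header(data: bytes) -> str:
    """Classify a byte string by its leading bytes (rough heuristic only)."""
    if data.startswith(BMP_SIGNATURE):
        return "bmp"
    if data.startswith(ID3_SIGNATURE):
        return "mpeg-audio (ID3 tag)"
    # MPEG audio frame sync: 11 consecutive set bits, i.e. 0xFF then a byte with its top 3 bits set.
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        return "mpeg-audio (frame sync)"
    return "unknown"


def main(paths):
    for p in paths:
        header = Path(p).read_bytes()[:16]  # only the first few bytes matter here
        print(f"{p}: {sniff_header(header)} (first bytes: {header[:8].hex(' ')})")


if __name__ == "__main__":
    main(sys.argv[1:])
```

      Usage, with hypothetical filenames for the two extracted files: `python sniff.py embedded-1.mpga embedded-2.mpga`. Note that a picture stored as a raw DIB without the 14-byte BITMAPFILEHEADER would not start with `BM` and would be reported as "unknown" by this sketch, so an "unknown" result does not rule out bitmap data.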

      Attachments

        1. example.DOC (19 kB, Tika User)
        2. file_1.bmp (0.2 kB, Tim Allison)
        3. file_2.bmp (0.1 kB, Tim Allison)
        4. image-2022-08-04-10-52-44-800.png (9 kB, Tika User)
        5. image-2022-08-04-10-53-48-894.png (11 kB, Tika User)
        6. Screenshot from 2022-08-04 06-05-09.png (33 kB, Tim Allison)
        7. image-2022-08-04-15-44-48-396.png (0.7 kB, Tika User)
        8. image-2022-08-04-15-45-10-892.png (6 kB, Tika User)
        9. example.zip (3 kB, Tika User)


          People

            Assignee: Unassigned
            Reporter: Tika User (Vamsi452)

            Dates

              Created:
              Updated:
