Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
When running 1.9-rc1 against govdocs1, I found a few files whose MIME types have changed. I'm posting this now so that others can take a look. Some of these changes are for the better, and some are not.
For further investigation:
- embedded pict and wmf are now sometimes identified as pdf (TIKA-1085)
- several .doc files are now identified as application/x-msmetafile and no text is being extracted
- several .doc files are now identified as jpeg or png and no text is being extracted
- several .ppt files that were previously identified as various types (jpeg, ppt, png, msoffice, word) are now being detected as excel
Probably for the good:
- a handful of files that were identified as text are now identified as pdf (TIKA-1085)
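For anyone who wants to run a similar comparison, here is a minimal sketch of the diff step. It is not the tooling used for the results above. It assumes the detected type for each file has already been collected into a map per run; the class name `MimeDiff`, the method `changed`, and the sample file names are all hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

public class MimeDiff {
    // Returns file -> "oldType -> newType" for every file present in both
    // runs whose detected type differs. Files missing from either run are skipped.
    static Map<String, String> changed(Map<String, String> before, Map<String, String> after) {
        Map<String, String> diff = new TreeMap<>();
        for (Map.Entry<String, String> e : before.entrySet()) {
            String newType = after.get(e.getKey());
            if (newType != null && !newType.equals(e.getValue())) {
                diff.put(e.getKey(), e.getValue() + " -> " + newType);
            }
        }
        return diff;
    }

    public static void main(String[] args) {
        // Hypothetical file names and detection results, for illustration only.
        Map<String, String> previous = Map.of(
                "a.doc", "application/msword",
                "b.txt", "text/plain");
        Map<String, String> candidate = Map.of(
                "a.doc", "image/jpeg",
                "b.txt", "text/plain");
        changed(previous, candidate)
                .forEach((file, change) -> System.out.println(file + "\t" + change));
    }
}
```

Sorting the result by file name (the `TreeMap`) keeps the report stable between runs, which makes it easier to group the changes into categories like the ones listed above.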