Description
Tika could support the Microsoft Project file format (MPP) better with fairly little effort. The gaps to fill are:
- Correct mimetype definition (it's OLE2 based)
- OLE2 detection for MPP
- Common OLE2 metadata extraction
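
As a rough illustration of the detection step, here is a minimal sketch that only checks for the 8-byte OLE2 container signature at the start of a stream. The class name `Ole2Sniffer` and its methods are hypothetical and are not Tika or POI APIs. A match only says the file is some OLE2 document (it could equally be .doc or .xls), so real MPP detection would still need to look at the entries inside the container, which this sketch does not attempt.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Tika or POI.
class Ole2Sniffer {
    // Signature found at offset 0 of every OLE2 compound document.
    static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, (byte) 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, (byte) 0x1A, (byte) 0xE1
    };

    // Returns true if the stream starts with the OLE2 signature.
    // Consumes up to 8 bytes from the stream.
    static boolean isOle2(InputStream in) throws IOException {
        byte[] header = new byte[OLE2_MAGIC.length];
        int off = 0;
        while (off < header.length) {
            int n = in.read(header, off, header.length - off);
            if (n < 0) {
                break; // stream ended before 8 bytes were read
            }
            off += n;
        }
        return off == header.length && Arrays.equals(header, OLE2_MAGIC);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java Ole2Sniffer <file>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(isOle2(in) ? "OLE2 container" : "not OLE2");
        }
    }
}
```

Running it against an .mpp file should report an OLE2 container, which is the reason the mimetype definition needs to be declared as OLE2 based before any finer-grained detection can work.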
For fuller support (such as text content), we'd probably want a parser that uses MPXJ. However, since MPXJ is LGPL-licensed, it would need to be an external third-party parser. (MPXJ is built on top of POI but uses a more copyleft license; POI itself has no MPP support.)