Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
0.6
Expert (Hard) - Guru knowledge of this project could be required
Description
Currently, OODT uses Tika v0.8 (tika-core) for MIME-type detection. This version is well out of date and is incompatible with a v1.3 tika-core or tika-app JAR.
Tika v1.3 includes numerous improvements over 0.8 (see [1]). Among them is better metadata extraction for common file types, which is especially useful for metadata gathering.
If a project built on OODT needs features of the v1.3 tika-core or tika-app JAR (e.g., in a custom metadata extractor), it currently cannot use that version alongside OODT server-side components such as the File Manager (filemgr) or the crawler, because v1.3 is incompatible with OODT's use of v0.8.
One incompatibility is that the deprecated method org.apache.tika.mime.MimeTypes.getMimeType(URL) is no longer present in v1.3 (hence the NoSuchMethodError below). Its role is filled by Tika.detect(URL.getPath()) together with MimeTypes.getRegisteredMimeType(String).
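A minimal sketch of that replacement, assuming tika-core 1.3 is on the classpath. MimeLookup, mimeTypeFor and mimeTypeForPath are hypothetical names used for illustration; this is not the change committed to OODT's Reference class. It detects a media type name from the URL's path and then looks up the corresponding MimeType object, which is what the 0.8 getMimeType(URL) call returned.

```java
import java.net.URL;

import org.apache.tika.Tika;
import org.apache.tika.mime.MimeType;
import org.apache.tika.mime.MimeTypeException;
import org.apache.tika.mime.MimeTypes;

/** Hypothetical helper, not part of OODT or Tika; assumes tika-core 1.3 on the classpath. */
public final class MimeLookup {
    // Tika facade, used here only for name-based detection (no file content is read).
    private static final Tika TIKA = new Tika();
    // Tika's built-in MIME type repository, which holds the MimeType objects.
    private static final MimeTypes REPOSITORY = MimeTypes.getDefaultMimeTypes();

    private MimeLookup() {}

    /** Stand-in for the removed MimeTypes.getMimeType(URL). */
    public static MimeType mimeTypeFor(URL url) {
        return mimeTypeForPath(url.getPath());
    }

    /** Returns the registered MimeType for a path, or null if the detected name is not registered. */
    public static MimeType mimeTypeForPath(String path) {
        // Step 1: guess a media type name such as "image/jpeg" from the file name.
        String name = TIKA.detect(path);
        try {
            // Step 2: map the name to the MimeType object the 0.8 API used to return.
            return REPOSITORY.getRegisteredMimeType(name);
        } catch (MimeTypeException e) {
            // Thrown for a malformed type name; treat it as "unknown".
            return null;
        }
    }
}
```

Because this detects by file name only, the result can differ from content-based detection; Tika.detect(URL) or Tika.detect(File) read the resource instead, at the cost of I/O and a checked IOException.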
The example below shows the exception thrown when crawler 0.6-SNAPSHOT was invoked with 'tika-app-1.3.jar' placed in the crawler's lib directory:
—
Jun 18, 2013 3:40:07 PM org.apache.oodt.cas.crawl.ProductCrawler ingest
INFO: ProductCrawler: Ready to ingest product: [/data/staging/IMG_2590.jpg]: ProductType: [GenericFile]
Jun 18, 2013 3:40:07 PM org.apache.oodt.cas.filemgr.ingest.StdIngester setFileManager
INFO: StdIngester: connected to file manager: http://localhost:9000
Jun 18, 2013 3:40:07 PM org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferer setFileManagerUrl
INFO: In Place Data Transfer to: http://localhost:9000 enabled
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.tika.mime.MimeTypes.getMimeType(Ljava/net/URL;)Lorg/apache/tika/mime/MimeType;
at org.apache.oodt.cas.filemgr.structs.Reference.<init>(Reference.java:115)
at org.apache.oodt.cas.filemgr.versioning.VersioningUtils.addRefsFromUris(VersioningUtils.java:251)
at org.apache.oodt.cas.filemgr.ingest.StdIngester.ingest(StdIngester.java:189)
at org.apache.oodt.cas.crawl.ProductCrawler.ingest(ProductCrawler.java:304)
at org.apache.oodt.cas.crawl.ProductCrawler.handleFile(ProductCrawler.java:188)
at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:108)
at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:75)
at org.apache.oodt.cas.crawl.daemon.CrawlDaemon.startCrawling(CrawlDaemon.java:82)
at org.apache.oodt.cas.crawl.cli.action.CrawlerLauncherCliAction.execute(CrawlerLauncherCliAction.java:55)
at org.apache.oodt.cas.cli.CmdLineUtility.execute(CmdLineUtility.java:331)
at org.apache.oodt.cas.cli.CmdLineUtility.run(CmdLineUtility.java:187)
at org.apache.oodt.cas.crawl.CrawlerLauncher.main(CrawlerLauncher.java:36)
—
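The NoSuchMethodError means that the MimeTypes class the JVM actually loaded (presumably the copy bundled in tika-app-1.3.jar) lacks the getMimeType(URL) signature that OODT's Reference constructor was compiled against. A minimal sketch of a classpath check using plain Java reflection follows; ApiProbe, hasMethod and sourceOf are hypothetical names, not part of OODT or Tika.

```java
import java.net.URL;
import java.security.CodeSource;

/** Hypothetical classpath probe, not part of OODT or Tika. */
public final class ApiProbe {
    private static final ClassLoader LOADER = ApiProbe.class.getClassLoader();

    private ApiProbe() {}

    /** True if className is loadable and exposes a public method methodName(paramTypes). */
    public static boolean hasMethod(String className, String methodName, Class<?>... paramTypes) {
        try {
            // initialize=false: load the class without running its static initializers.
            Class<?> cls = Class.forName(className, false, LOADER);
            cls.getMethod(methodName, paramTypes);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            return false;
        }
    }

    /** Location (JAR or directory) that supplied className, or null if absent or unknown. */
    public static String sourceOf(String className) {
        try {
            Class<?> cls = Class.forName(className, false, LOADER);
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // JDK bootstrap classes usually report no code source.
            return (src == null || src.getLocation() == null) ? null : src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String cls = "org.apache.tika.mime.MimeTypes";
        System.out.println("Loaded from: " + sourceOf(cls));
        System.out.println("getMimeType(URL) present: " + hasMethod(cls, "getMimeType", URL.class));
        System.out.println("getRegisteredMimeType(String) present: "
                + hasMethod(cls, "getRegisteredMimeType", String.class));
    }
}
```

Running main with the crawler's lib directory on the classpath shows which JAR supplied MimeTypes and whether the 0.8-era signature is still available.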
This JIRA issue seeks to document efforts to upgrade OODT's use of Tika from 0.8 to 1.3.
Attachments
Issue Links
- contains: OODT-385 Upgrade Tika to version 1.0 (Resolved)