  1. OODT (Retired)
  2. OODT-630

Upgrade OODT components from using Tika 0.8 to Tika 1.6

Details

    • Expert (Hard) - Guru knowledge of this project could be required

    Description

      Currently, OODT makes use of Tika v0.8 (tika-core) for mime-detection purposes. This version is quite out-of-date, and is incompatible with the use of a tika-core or tika-app v1.3 JAR.

      Tika v1.3 contains numerous upgrades since 0.8 (see [1]), some of which include improved metadata generation for common files. These improved features are extremely useful for metadata gathering.

      If a project using OODT needs features provided by the v1.3 tika-core or tika-app JAR (e.g. for a custom met extractor), it currently cannot use that version alongside OODT server-side components such as filemgr or crawler, because v1.3 is incompatible with OODT's use of v0.8.

      One of the incompatibilities is the deprecation and subsequent removal of the method org.apache.tika.mime.MimeTypes.getMimeType(URL), which OODT still calls. It has been superseded by Tika.detect(URL.getPath()) and MimeTypes.getRegisteredMimeType(String).

      See the example exception below, thrown when crawler 0.6-SNAPSHOT was invoked while a 'tika-app-1.3.jar' was placed in the crawler's lib directory:

      Jun 18, 2013 3:40:07 PM org.apache.oodt.cas.crawl.ProductCrawler ingest
      INFO: ProductCrawler: Ready to ingest product: [/data/staging/IMG_2590.jpg]: ProductType: [GenericFile]
      Jun 18, 2013 3:40:07 PM org.apache.oodt.cas.filemgr.ingest.StdIngester setFileManager
      INFO: StdIngester: connected to file manager: http://localhost:9000
      Jun 18, 2013 3:40:07 PM org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferer setFileManagerUrl
      INFO: In Place Data Transfer to: http://localhost:9000 enabled
      Exception in thread "main" java.lang.NoSuchMethodError: org.apache.tika.mime.MimeTypes.getMimeType(Ljava/net/URL;)Lorg/apache/tika/mime/MimeType;
      at org.apache.oodt.cas.filemgr.structs.Reference.<init>(Reference.java:115)
      at org.apache.oodt.cas.filemgr.versioning.VersioningUtils.addRefsFromUris(VersioningUtils.java:251)
      at org.apache.oodt.cas.filemgr.ingest.StdIngester.ingest(StdIngester.java:189)
      at org.apache.oodt.cas.crawl.ProductCrawler.ingest(ProductCrawler.java:304)
      at org.apache.oodt.cas.crawl.ProductCrawler.handleFile(ProductCrawler.java:188)
      at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:108)
      at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:75)
      at org.apache.oodt.cas.crawl.daemon.CrawlDaemon.startCrawling(CrawlDaemon.java:82)
      at org.apache.oodt.cas.crawl.cli.action.CrawlerLauncherCliAction.execute(CrawlerLauncherCliAction.java:55)
      at org.apache.oodt.cas.cli.CmdLineUtility.execute(CmdLineUtility.java:331)
      at org.apache.oodt.cas.cli.CmdLineUtility.run(CmdLineUtility.java:187)
      at org.apache.oodt.cas.crawl.CrawlerLauncher.main(CrawlerLauncher.java:36)
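
      A NoSuchMethodError like the one above means the code was compiled against one Tika version but runs against another. The following is a minimal diagnostic sketch, not part of OODT or Tika: the class TikaApiCheck and its hasMethod helper are hypothetical names introduced here for illustration. It uses only standard Java reflection to report which of the two MimeTypes methods mentioned in this issue are visible on the current classpath, and it reports false for both if no Tika JAR is present.

```java
import java.net.URL;

/** Hypothetical helper (not part of OODT or Tika) that probes the classpath for a method. */
class TikaApiCheck {

    /**
     * Returns true if the named class can be loaded and exposes a public
     * method with the given name and parameter types; false otherwise.
     */
    static boolean hasMethod(String className, String methodName, Class<?>... paramTypes) {
        try {
            Class.forName(className).getMethod(methodName, paramTypes);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            // Class missing, method missing, or class failed to link/initialize.
            return false;
        }
    }

    public static void main(String[] args) {
        String mimeTypes = "org.apache.tika.mime.MimeTypes";
        // Old API that OODT calls in Reference.java (per the stack trace).
        System.out.println("getMimeType(URL) present: "
                + hasMethod(mimeTypes, "getMimeType", URL.class));
        // Newer API named in this issue as part of the replacement.
        System.out.println("getRegisteredMimeType(String) present: "
                + hasMethod(mimeTypes, "getRegisteredMimeType", String.class));
    }
}
```

      Running this with the same classpath as the crawler (for example, its lib directory) would show which Tika API the crawler actually sees at runtime. The output depends entirely on the JARs present, so none is shown here.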

      This JIRA issue seeks to document efforts to upgrade OODT's use of Tika from 0.8 to 1.3.


      [1] http://www.apache.org/dist/tika/CHANGES-1.3.txt

      Attachments

        1. OODT-630.Palsulich.101014.v4.patch
          532 kB
          Tyler Bui-Palsulich
        2. OODT-630.Palsulich.101014.v3.patch
          532 kB
          Tyler Bui-Palsulich
        3. OODT-630.Palsulich.101014.patch
          527 kB
          Tyler Bui-Palsulich

            People

              tpalsulich Tyler Bui-Palsulich
              riverma Rishi Verma