  1. OODT
  2. OODT-630

Upgrade OODT components from using Tika 0.8 to Tika 1.6

    Details

    • Skill Level:
      Expert (Hard) - Guru knowledge of this project could be required

      Description

      Currently, OODT makes use of Tika v0.8 (tika-core) for mime-detection purposes. This version is quite out-of-date, and is incompatible with the use of a tika-core or tika-app v1.3 JAR.

      Tika v1.3 contains numerous upgrades since 0.8 (see [1]), some of which include improved metadata generation for common files. These improved features are extremely useful for metadata gathering.

      If a project using OODT needs features provided by the v1.3 tika-core or tika-app JAR (e.g. for a custom met extractor), it currently cannot use that version alongside OODT server-side components such as filemgr and crawler, since v1.3 is incompatible with OODT's use of v0.8.

      One of the incompatibilities is that the deprecated method org.apache.tika.mime.MimeTypes.getMimeType(URL) is no longer available in v1.3, so OODT code compiled against v0.8 fails at runtime. The method has been superseded by Tika.detect(URL.getPath()) and MimeTypes.getRegisteredMimeType(String).

      See the example exception below, thrown when crawler 0.6-SNAPSHOT was invoked with a 'tika-app-1.3.jar' placed in the crawler's lib directory:

      Jun 18, 2013 3:40:07 PM org.apache.oodt.cas.crawl.ProductCrawler ingest
      INFO: ProductCrawler: Ready to ingest product: [/data/staging/IMG_2590.jpg]: ProductType: [GenericFile]
      Jun 18, 2013 3:40:07 PM org.apache.oodt.cas.filemgr.ingest.StdIngester setFileManager
      INFO: StdIngester: connected to file manager: http://localhost:9000
      Jun 18, 2013 3:40:07 PM org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferer setFileManagerUrl
      INFO: In Place Data Transfer to: http://localhost:9000 enabled
      Exception in thread "main" java.lang.NoSuchMethodError: org.apache.tika.mime.MimeTypes.getMimeType(Ljava/net/URL;)Lorg/apache/tika/mime/MimeType;
      at org.apache.oodt.cas.filemgr.structs.Reference.<init>(Reference.java:115)
      at org.apache.oodt.cas.filemgr.versioning.VersioningUtils.addRefsFromUris(VersioningUtils.java:251)
      at org.apache.oodt.cas.filemgr.ingest.StdIngester.ingest(StdIngester.java:189)
      at org.apache.oodt.cas.crawl.ProductCrawler.ingest(ProductCrawler.java:304)
      at org.apache.oodt.cas.crawl.ProductCrawler.handleFile(ProductCrawler.java:188)
      at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:108)
      at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:75)
      at org.apache.oodt.cas.crawl.daemon.CrawlDaemon.startCrawling(CrawlDaemon.java:82)
      at org.apache.oodt.cas.crawl.cli.action.CrawlerLauncherCliAction.execute(CrawlerLauncherCliAction.java:55)
      at org.apache.oodt.cas.cli.CmdLineUtility.execute(CmdLineUtility.java:331)
      at org.apache.oodt.cas.cli.CmdLineUtility.run(CmdLineUtility.java:187)
      at org.apache.oodt.cas.crawl.CrawlerLauncher.main(CrawlerLauncher.java:36)
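
      A NoSuchMethodError like the one above only surfaces when the missing method is first called. As a minimal sketch, a deployment could probe the classpath up front to see which of the two MimeTypes lookups named in this issue is actually present. The class TikaApiProbe and its method names are hypothetical helpers invented for illustration; they are not part of OODT or Tika, and the sketch uses only standard Java reflection.

```java
import java.net.URL;

/** Hypothetical helper (not part of OODT or Tika): probes the classpath via reflection. */
final class TikaApiProbe {

    /** Returns true if the named class is loadable and has a public method with this signature. */
    static boolean hasMethod(String className, String methodName, Class<?>... paramTypes) {
        try {
            // initialize=false: only inspect the class, do not run its static initializers
            Class<?> clazz = Class.forName(className, false, TikaApiProbe.class.getClassLoader());
            clazz.getMethod(methodName, paramTypes);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            // Class missing, method missing, or class could not be linked
            return false;
        }
    }

    /** The lookup OODT calls in Reference.<init>, per the stack trace. */
    static boolean hasLegacyUrlLookup() {
        return hasMethod("org.apache.tika.mime.MimeTypes", "getMimeType", URL.class);
    }

    /** The lookup named in this issue as the replacement. */
    static boolean hasRegisteredTypeLookup() {
        return hasMethod("org.apache.tika.mime.MimeTypes", "getRegisteredMimeType", String.class);
    }

    public static void main(String[] args) {
        System.out.println("MimeTypes.getMimeType(URL) present: " + hasLegacyUrlLookup());
        System.out.println("MimeTypes.getRegisteredMimeType(String) present: "
                + hasRegisteredTypeLookup());
    }
}
```

      Running the main method with the crawler's lib directory on the classpath would show whether the Tika JAR found there still offers the legacy lookup. If no Tika JAR is present at all, both checks simply report false.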

      This JIRA issue seeks to document efforts to upgrade OODT's use of Tika from 0.8 to 1.3.


      [1] http://www.apache.org/dist/tika/CHANGES-1.3.txt

        Attachments

        1. OODT-630.Palsulich.101014.patch
          527 kB
          Tyler Bui-Palsulich
        2. OODT-630.Palsulich.101014.v3.patch
          532 kB
          Tyler Bui-Palsulich
        3. OODT-630.Palsulich.101014.v4.patch
          532 kB
          Tyler Bui-Palsulich

              People

              • Assignee:
                tpalsulich Tyler Bui-Palsulich
                Reporter:
                riverma Rishi Verma
              • Votes:
                0
                Watchers:
                3