Sling / SLING-2924

Full text extraction issue with Tika v1.0 under OSGi environment

    Details

      Description

      The latest stable build of Sling (I checked out revision 1487628) uses Jackrabbit 2.4.2, which uses Tika 1.0 to extract metadata and text for indexing. Jackrabbit 2.4.2 deployed as a separate web application extracts metadata and text from uploaded documents without problems, but when it is deployed in Sling (an OSGi environment), full-text extraction doesn't work.

      Updating the Tika dependency in Sling to version 1.2 resolved the above issue.

      Secondly, if the indexes are deleted from the repository and the server is restarted, the indexes are not rebuilt for the existing documents. The Tika bundles are not yet ready when Jackrabbit starts rebuilding the indexes during Sling server startup. Lowering the start level of the Tika bundles from 15 to 10 resolves the issue.

      Both changes are made in the <sling>/launchpad/builder/src/main/bundles/list.xml file.

      Currently the Tika bundles are at start level 15, as shown below:

      <startLevel level="15">
        ..........
        <bundle>
          <groupId>org.apache.tika</groupId>
          <artifactId>tika-core</artifactId>
          <version>1.0</version>
        </bundle>
        <bundle>
          <groupId>org.apache.tika</groupId>
          <artifactId>tika-bundle</artifactId>
          <version>1.0</version>
        </bundle>
        ..........
      </startLevel>

      I moved the above bundles to start level 10 and changed their version to 1.2:

      <startLevel level="10">
        ..........
        <bundle>
          <groupId>org.apache.tika</groupId>
          <artifactId>tika-core</artifactId>
          <version>1.2</version>
        </bundle>
        <bundle>
          <groupId>org.apache.tika</groupId>
          <artifactId>tika-bundle</artifactId>
          <version>1.2</version>
        </bundle>
        ..........
      </startLevel>
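
      To double-check which start level the Tika bundles end up in after editing list.xml, something like the following could be used. It is only a minimal sketch using the JDK's DOM parser: the class name TikaStartLevelCheck and the method startLevels are made up for illustration, and the <bundles> root element in the sample input is an assumption, since the fragments above do not show the enclosing element.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Sling: reports the start level of each
// bundle with a given groupId in a list.xml-style document.
public class TikaStartLevelCheck {

    // Returns artifactId -> start level for every <bundle> whose <groupId> matches.
    public static Map<String, Integer> startLevels(String xml, String groupId) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse bundle list", e);
        }
        Map<String, Integer> result = new LinkedHashMap<>();
        NodeList levels = doc.getElementsByTagName("startLevel");
        for (int i = 0; i < levels.getLength(); i++) {
            Element level = (Element) levels.item(i);
            int value = Integer.parseInt(level.getAttribute("level"));
            NodeList bundles = level.getElementsByTagName("bundle");
            for (int j = 0; j < bundles.getLength(); j++) {
                Element bundle = (Element) bundles.item(j);
                if (groupId.equals(childText(bundle, "groupId"))) {
                    result.put(childText(bundle, "artifactId"), value);
                }
            }
        }
        return result;
    }

    // Text of the first descendant element with the given tag, or "" if absent.
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        // Sample input mirroring the changed fragment; the <bundles> root is assumed
        // and the elided entries are left out.
        String xml =
              "<bundles>"
            + "<startLevel level=\"10\">"
            + "<bundle><groupId>org.apache.tika</groupId>"
            + "<artifactId>tika-core</artifactId><version>1.2</version></bundle>"
            + "<bundle><groupId>org.apache.tika</groupId>"
            + "<artifactId>tika-bundle</artifactId><version>1.2</version></bundle>"
            + "</startLevel>"
            + "</bundles>";
        System.out.println(startLevels(xml, "org.apache.tika"));
    }
}
```

      Run against the real list.xml contents, both tika-core and tika-bundle should be reported at level 10 once the change is applied. This only checks the configuration file; it does not confirm that the bundles are actually started before Jackrabbit begins reindexing.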

            People

            • Assignee: Robert Munteanu (rombert)
            • Reporter: Anjan (anjan)