Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
None
Description
The latest stable build of Sling (I checked out revision 1487628) uses Jackrabbit version 2.4.2, with Tika version 1.0 for extracting metadata and text for indexing. Jackrabbit v2.4.2 deployed as a separate web application extracts metadata and text from uploaded documents without problems, but when it is deployed in Sling (an OSGi environment), full-text extraction does not work.
Updating the Tika dependency in Sling to version 1.2 resolved this issue.
Secondly, if the indexes are deleted from the repository and the server is restarted, the indexes are not rebuilt for the existing documents. The Tika bundles are not yet ready when Jackrabbit starts rebuilding the indexes during Sling server startup. Changing the start level of the Tika bundles from 15 to 10 resolves the issue.
Both fixes are made in the <sling>/launchpad/builder/src/main/bundles/list.xml file.
Currently, the Tika bundles are at start level 15, as shown below:
<startLevel level="15">
    ..........
    <bundle>
        <groupId>org.apache.tika</groupId>
        <artifactId>tika-core</artifactId>
        <version>1.0</version>
    </bundle>
    <bundle>
        <groupId>org.apache.tika</groupId>
        <artifactId>tika-bundle</artifactId>
        <version>1.0</version>
    </bundle>
    ..........
</startLevel>
The fix moves these bundles to start level 10 and changes their version to 1.2:
<startLevel level="10">
    ..........
    <bundle>
        <groupId>org.apache.tika</groupId>
        <artifactId>tika-core</artifactId>
        <version>1.2</version>
    </bundle>
    <bundle>
        <groupId>org.apache.tika</groupId>
        <artifactId>tika-bundle</artifactId>
        <version>1.2</version>
    </bundle>
    ..........
</startLevel>
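To double-check a bundle list after editing it, a small script can report the version and start level of each Tika bundle. The following is only a rough sketch: the `tika_bundles` helper, the `SAMPLE` fragment, the `<bundles>` root element, and the `com.example` placeholder bundle are assumptions made for illustration, not taken from the actual list.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment mirroring the layout described in this issue.
# The real list.xml contains many more bundles; the <bundles> root and the
# com.example entry are placeholders, not actual Sling content.
SAMPLE = """<bundles>
    <startLevel level="10">
        <bundle>
            <groupId>org.apache.tika</groupId>
            <artifactId>tika-core</artifactId>
            <version>1.2</version>
        </bundle>
        <bundle>
            <groupId>org.apache.tika</groupId>
            <artifactId>tika-bundle</artifactId>
            <version>1.2</version>
        </bundle>
    </startLevel>
    <startLevel level="15">
        <bundle>
            <groupId>com.example</groupId>
            <artifactId>example-bundle</artifactId>
            <version>0.0.1</version>
        </bundle>
    </startLevel>
</bundles>"""


def tika_bundles(xml_text):
    """Return (artifactId, version, start level) for each org.apache.tika bundle."""
    root = ET.fromstring(xml_text)
    found = []
    # iter() also matches the root itself, so a bare <startLevel> fragment works too.
    for start_level in root.iter("startLevel"):
        level = int(start_level.get("level"))
        for bundle in start_level.findall("bundle"):
            if bundle.findtext("groupId") == "org.apache.tika":
                found.append(
                    (bundle.findtext("artifactId"), bundle.findtext("version"), level)
                )
    return found


if __name__ == "__main__":
    for artifact, version, level in tika_bundles(SAMPLE):
        print(f"{artifact} {version} at start level {level}")
```

To run it against a real checkout, read the contents of list.xml into a string and pass that to `tika_bundles` instead of `SAMPLE`. Any Tika bundle still reported at start level 15 or version 1.0 has not picked up the change.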