[OODT-652] New TikaCmdLineMetExtractor - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 0.6
Fix Version/s: 0.7
Component/s: metadata container
Labels: None
Skill Level: Don't Know (Unsure) - The default level

Description

Often, we want to ingest a product and have basic metadata extracted from it automatically, with little effort. The Apache Tika project can detect a product's type and extract the metadata associated with it. The purpose of this issue is to integrate Tika's metadata extraction capabilities so that OODT can use them easily.

At a minimum, this issue seeks to:

- Use Tika's 'parse' method to extract metadata automatically.
- Include the text content (if any) of a document in a new metadata element named 'content'. This will be useful for Lucene- and Solr-based free-text searches.
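
The two goals above can be sketched as follows. This is a minimal illustration, not the code in the attached patch: the class `TikaMetSketch` and its `extract` method are hypothetical names, the result is a plain `Map` rather than an OODT metadata object, and it assumes the Tika parser modules are on the classpath. It uses Tika's `AutoDetectParser.parse` call and stores any extracted text under the 'content' key.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration; not part of OODT or Tika.
public class TikaMetSketch {

    public static Map<String, String> extract(Path product)
            throws IOException, SAXException, TikaException {
        AutoDetectParser parser = new AutoDetectParser();
        // A write limit of -1 disables the default cap on extracted text.
        BodyContentHandler handler = new BodyContentHandler(-1);
        Metadata tikaMet = new Metadata();

        // Tika detects the document type and fills both the handler
        // (text content) and the Metadata object (key/value pairs).
        try (InputStream in = Files.newInputStream(product)) {
            parser.parse(in, handler, tikaMet, new ParseContext());
        }

        // Copy Tika's metadata into a simple map; multi-valued keys are
        // joined with commas (an assumption made for this sketch).
        Map<String, String> met = new LinkedHashMap<>();
        for (String name : tikaMet.names()) {
            met.put(name, String.join(",", tikaMet.getValues(name)));
        }

        // Add the document text, if any, under the 'content' element.
        String text = handler.toString().trim();
        if (!text.isEmpty()) {
            met.put("content", text);
        }
        return met;
    }
}
```

Storing the text under a single 'content' key keeps it alongside the other extracted fields, so an indexer could treat it as one more field for free-text search.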

Attachments

- OODT-652.rverma.08-27-2013.patch.txt (4 kB), 28/Aug/13 00:00, Rishi Verma
- extractor-config.properties (0.0 kB), 28/Aug/13 19:00, Rishi Verma

People

Assignee: Rishi Verma
Reporter: Rishi Verma
Votes: 0
Watchers: 3

Dates

Created: 27/Aug/13 23:57
Updated: 30/Aug/13 17:21
Resolved: 29/Aug/13 17:38