[TIKA-433] Tika + Hadoop - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Minor
Resolution: Won't Fix
Affects Version/s: None
Fix Version/s: None
Component/s: general
Labels: None

Description

It would be great to have a Tika contrib module that takes an HDFS location containing "rich" documents, plus an output format (or output processor), and converts the documents to XHTML, Solr input, or whatever else is needed. This seems like it should be pretty straightforward on the Hadoop side. The only tricky part, I suppose, is the output format and how to make it pluggable.

People

Assignee: Unassigned

Reporter: Grant Ingersoll

Votes: 0

Watchers: 0

Dates

Created: 25/May/10 21:12

Updated: 07/Oct/11 08:59

Resolved: 07/Oct/11 08:59