Tika / TIKA-433

Tika + Hadoop


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: general
    • Labels: None

    Description

      It would be great to have a Tika contrib module that takes an HDFS location containing "rich" documents, plus an output format (or output processor), and converts the documents to XHTML, Solr input, or some other target. This seems fairly straightforward on the Hadoop side. The only tricky part, I suppose, is the output format and how to make it pluggable.
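
To make the "pluggable output" idea concrete, here is a minimal sketch of one possible shape for it. Everything in it is an assumption for illustration: `PluggableOutputSketch`, `OutputProcessor`, `processorFor`, and the format names `xhtml` and `solr` are hypothetical and not part of Tika or Hadoop, and the Solr field names `id` and `text` depend on the target schema. The Hadoop and Tika pieces are left out on purpose so the sketch has no dependencies; in a real job the extracted text would come from a Tika parser running inside a map task.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical names throughout: none of these types exist in Tika or Hadoop.
final class PluggableOutputSketch {

    /** Turns one document's extracted text into a target representation. */
    interface OutputProcessor {
        String render(String sourcePath, String extractedText);
    }

    /** Escapes the characters that are unsafe in XML text and attribute values. */
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Registry mapping a format name (e.g. read from job configuration) to a processor. */
    static final Map<String, OutputProcessor> PROCESSORS = new LinkedHashMap<>();

    static {
        // Minimal XHTML wrapper around the extracted text.
        PROCESSORS.put("xhtml", (path, text) ->
            "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head><title>"
                + escapeXml(path) + "</title></head><body><p>"
                + escapeXml(text) + "</p></body></html>");
        // Solr XML update message; the field names are assumed, not prescribed.
        PROCESSORS.put("solr", (path, text) ->
            "<add><doc><field name=\"id\">" + escapeXml(path)
                + "</field><field name=\"text\">" + escapeXml(text)
                + "</field></doc></add>");
    }

    /** Looks up the processor for a format name, failing loudly on unknown names. */
    static OutputProcessor processorFor(String format) {
        OutputProcessor p = PROCESSORS.get(format);
        if (p == null) {
            throw new IllegalArgumentException("Unknown output format: " + format);
        }
        return p;
    }

    public static void main(String[] args) {
        // Stand-in for text that a Tika parser would extract from a file on HDFS.
        String text = "Quarterly report: revenue < forecast";
        for (String format : PROCESSORS.keySet()) {
            System.out.println(processorFor(format).render("/docs/report.doc", text));
        }
    }
}
```

With this shape, supporting another target means registering one more `OutputProcessor`, and the code that walks the HDFS location would not need to change.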

          People

            Assignee: Unassigned
            Reporter: Grant Ingersoll (gsingers)
            Votes: 0
            Watchers: 0
