  Droids / DROIDS-157

Use Tika to a fuller extent


Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • 0.2.0
    • None
    • tika
    • None

    Description

      We should be using Tika to a greater extent. Newer versions of Tika can do some of the things we have written our own code for.
      In addition, newer content handlers can provide useful data. For example, the BoilerpipeContentHandler tries to keep only the content that really matters, discarding boilerplate such as navigation menus and footers.
      The Metadata class can return many useful values, such as the page title or the robots meta tag, without us having to parse them out of the document.
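A minimal sketch of how a crawler could use the BoilerpipeContentHandler, assuming a Tika 1.x `tika-parsers` dependency on the classpath, where the class lives in `org.apache.tika.parser.html` (later Tika releases may package it differently). The class `MainContentExtractor` and its method are hypothetical, not existing Droids code. Boilerpipe's default extractor is heuristic, so which blocks count as "the content that really matters" depends on the page.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.html.BoilerpipeContentHandler;
import org.apache.tika.parser.html.HtmlParser;
import org.apache.tika.sax.BodyContentHandler;

// Hypothetical helper (not part of Droids): returns only the text blocks that
// Boilerpipe classifies as main content, dropping navigation, footers, etc.
public class MainContentExtractor {

    public static String extractMainText(String html) throws Exception {
        // BoilerpipeContentHandler decorates an ordinary SAX handler; after
        // parsing, it forwards only the blocks it kept to the wrapped handler.
        BoilerpipeContentHandler handler =
                new BoilerpipeContentHandler(new BodyContentHandler());
        try (InputStream in =
                new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8))) {
            new HtmlParser().parse(in, handler, new Metadata(), new ParseContext());
        }
        // The TextDocument holds Boilerpipe's block classification;
        // getContent() joins the blocks marked as content.
        return handler.getTextDocument().getContent();
    }
}
```

Wrapping an existing handler keeps this a drop-in change: the rest of the pipeline still receives SAX events, just fewer of them.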
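A sketch of reading the title and robots values from Tika's Metadata instead of parsing the HTML ourselves, assuming Tika 1.2 or later (for `TikaCoreProperties.TITLE`) and assuming the HTML parser copies `<meta name="..." content="...">` pairs into the Metadata under the tag's name. `PageMetadata` and its methods are hypothetical names for illustration. The HTML parser is used directly to keep the example deterministic; `AutoDetectParser` could be swapped in for arbitrary content types.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.metadata.TikaCoreProperties;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.html.HtmlParser;
import org.apache.tika.sax.BodyContentHandler;

// Hypothetical helper (not part of Droids): lets Tika populate a Metadata
// object while parsing, so values come from it rather than hand-written code.
public class PageMetadata {

    public static Metadata parse(String html) throws Exception {
        Metadata metadata = new Metadata();
        try (InputStream in =
                new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8))) {
            // The body text goes to the handler; title and meta tags go to metadata.
            new HtmlParser().parse(in, new BodyContentHandler(), metadata, new ParseContext());
        }
        return metadata;
    }

    // Text of the <title> element, as recorded by the HTML parser.
    public static String title(Metadata metadata) {
        return metadata.get(TikaCoreProperties.TITLE);
    }

    // Assumption: <meta name="robots" content="..."> is stored under the key "robots".
    public static String robots(Metadata metadata) {
        return metadata.get("robots");
    }
}
```

Calling `metadata.names()` after parsing lists every key Tika recorded, which is a quick way to see what else is available for a given page.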


          People

            Assignee: Richard Frovarp (rfrovarp)
            Reporter: Richard Frovarp (rfrovarp)
            Votes: 0
            Watchers: 0
