TIKA-3216: Add FileProfiler to tika-eval

Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.25
    • Component/s: None
    • Labels: None

    Description

      So far, tika-eval has focused on processing "extracts", that is, the output of Tika or another text extractor. I think it would be useful to add a basic FileProfiler that works on the raw input files only and does not parse them. This is useful as a first step when profiling a directory of files, before going through the costly process of parsing.

      Without parsing, we can still collect the file length, a digest, and the detected file type.
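
To make the idea concrete, here is a minimal sketch of gathering those three values for a single file without parsing it. It is not the FileProfiler code from tika-eval: the class name `RawFileProfile` and its `Profile` holder are hypothetical, SHA-256 is an assumed choice of digest, and the JDK's `Files.probeContentType` is used only as a stand-in for Tika's own detection, which would likely give better results.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper for illustration; not the tika-eval FileProfiler. */
final class RawFileProfile {

    /** Holds the values that can be gathered without parsing the file. */
    static final class Profile {
        public final long length;
        public final String sha256;
        public final String contentType; // may be null if the platform cannot tell

        Profile(long length, String sha256, String contentType) {
            this.length = length;
            this.sha256 = sha256;
            this.contentType = contentType;
        }
    }

    static Profile profile(Path file) throws IOException, NoSuchAlgorithmException {
        // File length in bytes, read from file system metadata.
        long length = Files.size(file);

        // Stream the raw bytes through the digest; the content is never parsed.
        // SHA-256 is an assumption here; the issue does not name an algorithm.
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                md.update(buffer, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }

        // Stand-in for file type detection; a Tika detector would go here instead.
        String contentType = Files.probeContentType(file);

        return new Profile(length, hex.toString(), contentType);
    }
}
```

Calling `RawFileProfile.profile(path)` on each file in a directory walk would give a cheap first-pass inventory, since every step reads raw bytes or file system metadata only.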

          People

            Assignee: Tim Allison (tallison)
            Reporter: Tim Allison (tallison)
            Votes: 0
            Watchers: 2
