- Type: Task
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Affects Version/s: None
- Fix Version/s: 1.25
- Component/s: None
- Labels: None

So far, tika-eval has focused on processing "extracts", that is, the output of Tika or another text extractor. I think it would be useful to add a basic FileProfiler that works on the raw input files only and does not parse them. This is useful as a first step when profiling a directory of files, before going through the costly process of parsing.
Without parsing, we can still get the file length, a digest, and the detected file type.
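
To make the idea concrete, here is a rough sketch of what such a parse-free profile could collect. It is not the tika-eval FileProfiler: it is plain Python using only the standard library, the names `profile_file` and `profile_directory` are hypothetical, SHA-256 is an assumed digest choice, and `mimetypes` guesses the type from the file extension only, standing in for Tika's content-based detection.

```python
import hashlib
import mimetypes
import os
from pathlib import Path


def profile_file(path, algorithm="sha256", chunk_size=1 << 20):
    """Profile one raw file without parsing it (hypothetical helper)."""
    path = Path(path)
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        # Read in chunks so large files are never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    # Extension-based guess only; a stand-in for content-based detection.
    mime, _encoding = mimetypes.guess_type(path.name)
    return {
        "path": str(path),
        "length": path.stat().st_size,
        "digest": digest.hexdigest(),
        "mime": mime or "application/octet-stream",
    }


def profile_directory(root):
    """Yield one profile per regular file found under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            yield profile_file(Path(dirpath) / name)
```

Calling `list(profile_directory("some/input/dir"))` (the path is a placeholder) would give one record per file. Because the bytes are only streamed through the digest, a pass like this should be far cheaper than parsing, and the resulting lengths, digests, and types can be used to spot duplicates or unexpected formats before the expensive extraction step.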