Details
Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Description
It would be useful to be able to configure the FileSystemFetcher in tika-pipes to process only certain files, for example by extension or by matching a pattern against the file name or path.
That way one could point the fetcher at a root folder and process only the matching files, such as particular extensions or particular names (e.g. GIS shapefiles, where one base name is shared by several files with different extensions).
Something like:
<properties>
  <fetchers>
    <fetcher class="org.apache.tika.pipes.fetcher.fs.FileSystemFetcher">
      <params>
        <name>fsf</name>
        <basePath>/my/base/path1</basePath>
        <pattern>myshapefilename.*</pattern>
      </params>
    </fetcher>
  </fetchers>
</properties>
Or, with several comma-separated patterns:
<pattern>*.doc*,*.pdf</pattern>
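To make the intended matching concrete, here is a rough sketch of how such a pattern value could be evaluated. It is only an illustration under assumptions: the class name `FetchPatternFilter` is hypothetical and is not part of Tika, the patterns are treated as globs matched against the file name only (not the full path), and the value is split on commas, which means brace groups such as `*.{doc,pdf}` would not work with this approach. It relies on the JDK's `FileSystem.getPathMatcher` with the `glob:` syntax.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not an existing Tika class: decides whether a file
// should be processed, given a comma-separated list of glob patterns.
public class FetchPatternFilter {

    private final List<PathMatcher> matchers = new ArrayList<>();

    public FetchPatternFilter(String commaSeparatedGlobs) {
        // Assumption: commas separate patterns, so globs containing
        // brace groups like *.{doc,pdf} are not supported here.
        for (String glob : commaSeparatedGlobs.split(",")) {
            String trimmed = glob.trim();
            if (!trimmed.isEmpty()) {
                matchers.add(FileSystems.getDefault().getPathMatcher("glob:" + trimmed));
            }
        }
    }

    // Returns true if the file name (last path element) matches any pattern.
    public boolean accepts(Path path) {
        Path fileName = path.getFileName();
        if (fileName == null) {
            return false;
        }
        for (PathMatcher matcher : matchers) {
            if (matcher.matches(fileName)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        FetchPatternFilter docs = new FetchPatternFilter("*.doc*,*.pdf");
        System.out.println(docs.accepts(Paths.get("/my/base/path1/report.docx")));
        System.out.println(docs.accepts(Paths.get("/my/base/path1/notes.txt")));

        FetchPatternFilter shape = new FetchPatternFilter("myshapefilename.*");
        System.out.println(shape.accepts(Paths.get("/my/base/path1/myshapefilename.shp")));
    }
}
```

Matching on the file name alone keeps the patterns short; matching against the path relative to basePath would be an alternative if directory-level filtering is also wanted. Note that case sensitivity of glob matching follows the platform's default file system.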