  1. Tika
  2. TIKA-4246

tika-pipes FileSystemFetcher configuration option for file name/path pattern selection


Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: tika-pipes
    • Labels: None

    Description

      It would be useful to be able to configure the tika-pipes FileSystemFetcher to process only certain files, e.g. selected by extension or by a pattern matched against the file name or path.
       
      This would make it possible to point the fetcher at a root folder and process only the matching files, such as files with certain extensions or names (e.g. a GIS shapefile consists of several files that share one base name but have different extensions).
       
      Something like:
       

      <properties>
        <fetchers>
          <fetcher class="org.apache.tika.pipes.fetcher.fs.FileSystemFetcher">
            <params>
              <name>fsf</name>
              <basePath>/my/base/path1</basePath>
              <pattern>myshapefilename.*</pattern>
            </params>
          </fetcher>
        </fetchers>
      </properties>

       
      Or, with several comma-separated patterns:
       

              <pattern>*.doc*,*.pdf</pattern>
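
      To illustrate the matching semantics I have in mind, here is a minimal sketch using the JDK's glob support. It is not existing Tika code: the class name PatternFilter and the comma-separated format of the pattern value are assumptions taken from the examples above, and it matches on the file name only (not the full path).

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Tika: accepts a file when its name
// matches at least one glob from a comma-separated list like "*.doc*,*.pdf".
final class PatternFilter {
    private final List<PathMatcher> matchers = new ArrayList<>();

    PatternFilter(String commaSeparatedGlobs) {
        for (String glob : commaSeparatedGlobs.split(",")) {
            String trimmed = glob.trim();
            if (!trimmed.isEmpty()) {
                // "glob:" selects the JDK's built-in glob syntax.
                matchers.add(FileSystems.getDefault().getPathMatcher("glob:" + trimmed));
            }
        }
    }

    // Compares only the last path element (the file name) with each glob.
    boolean accepts(Path path) {
        Path fileName = path.getFileName();
        if (fileName == null) {
            return false; // e.g. a root path has no file name
        }
        for (PathMatcher matcher : matchers) {
            if (matcher.matches(fileName)) {
                return true;
            }
        }
        return false;
    }
}
```

      With this sketch, new PatternFilter("myshapefilename.*") would accept myshapefilename.shp and myshapefilename.dbf but not another file in the same folder. Note that the JDK's glob matching follows the platform's case-sensitivity rules, so whether the option should be case-insensitive would need to be decided separately.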

          People

            Assignee: Unassigned
            Reporter: Emil Zegers (taatuut)
            Votes: 0
            Watchers: 1
