Tika / TIKA-3527

Add simple URLFetcher to tika-core

Details

    • Type: Task
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.1.0
    • Component/s: None
    • Labels: None

    Description

      In 1.x, users could send a URL, including a file: URL, to tika-server and have tika-server fetch the bytes. In 2.x, we created the tika-pipes modules. A file fetcher is included in tika-core, while the http-fetcher lives in its own module because of its dependency on httpclient.

      To smooth the transition to 2.x, it might be useful to add a URLFetcher that uses Java's built-in URL.openConnection() functionality. I'd want to prohibit the file protocol because of its history as a vulnerability. Users who want to fetch files would have to explicitly choose a different fetcher and specify a base path.
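
      A minimal sketch of the idea, not the fetcher that was actually committed: it opens the stream with URL.openConnection() and refuses any scheme outside a small allow-list, which excludes file: along with anything else not listed. The class name UrlFetcherSketch, the method names, the allow-list contents, and the timeout values are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; not the class that ships in tika-core.
class UrlFetcherSketch {

    // Assumed allow-list. Anything not listed here (file, jar, ftp, ...) is rejected.
    private static final Set<String> ALLOWED_PROTOCOLS = Set.of("http", "https");

    // Parses the fetch key and rejects disallowed schemes before any I/O happens.
    static URL checkedUrl(String fetchKey) throws MalformedURLException {
        URL url = URI.create(fetchKey).toURL();
        String protocol = url.getProtocol().toLowerCase(Locale.ROOT);
        if (!ALLOWED_PROTOCOLS.contains(protocol)) {
            throw new IllegalArgumentException("Protocol not allowed: " + protocol);
        }
        return url;
    }

    // Opens the connection with the JDK's built-in URL handling; the caller closes the stream.
    static InputStream fetch(String fetchKey) throws IOException {
        URLConnection connection = checkedUrl(fetchKey).openConnection();
        connection.setConnectTimeout(10_000); // illustrative timeout, in milliseconds
        connection.setReadTimeout(30_000);    // illustrative timeout, in milliseconds
        return connection.getInputStream();
    }
}
```

      With this shape, UrlFetcherSketch.checkedUrl("file:///etc/passwd") throws IllegalArgumentException before a connection is opened. An allow-list is used here rather than a check for "file" alone so that wrapper schemes such as jar: are rejected as well.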

            People

              Assignee: Tim Allison (tallison)
              Reporter: Tim Allison (tallison)
              Votes: 0
              Watchers: 2
