TIKA-3527: Add simple URLFetcher to tika-core

Details

    • Type: Task
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.1.0
    • Component/s: None
    • Labels: None

    Description

      In 1.x, users could send tika-server a URL, including a file URL, and have tika-server fetch the bytes. In 2.x, we created the tika-pipes modules, included a file fetcher in tika-core, and put the http-fetcher in its own module because of its dependency on httpclient.

      To smooth the transition to 2.x, it might be useful to add a URLFetcher that uses Java's built-in URL.openConnection() functionality. I'd want to prohibit the file protocol because of its history as a vulnerability. Users who want to fetch files would have to explicitly choose a different fetcher and specify a base path.
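
      A minimal sketch of the idea above, not the implementation that was committed for this issue: it opens a connection with java.net.URL.openConnection() and rejects the file protocol by allowing only http and https. The class name SimpleUrlFetcherSketch, the methods validate and fetch, and the timeout values are hypothetical and chosen for illustration only; Tika's actual Fetcher interface is not used here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; not Tika's URLFetcher.
final class SimpleUrlFetcherSketch {

    // Allow-list rather than deny-list: "file", "jar", "ftp", etc. are all rejected.
    private static final Set<String> ALLOWED_PROTOCOLS = Set.of("http", "https");

    // Parses the string and throws IOException unless the protocol is http or https.
    static URL validate(String spec) throws IOException {
        URL url;
        try {
            url = URI.create(spec).toURL();
        } catch (IllegalArgumentException e) {
            // Thrown for syntactically invalid or non-absolute URIs.
            throw new IOException("Not a valid absolute URL: " + spec, e);
        }
        String protocol = url.getProtocol().toLowerCase(Locale.ROOT);
        if (!ALLOWED_PROTOCOLS.contains(protocol)) {
            throw new IOException("Protocol not allowed: " + protocol);
        }
        return url;
    }

    // Opens the connection with the JDK's built-in handler; the caller closes the stream.
    static InputStream fetch(String spec) throws IOException {
        URLConnection connection = validate(spec).openConnection();
        connection.setConnectTimeout(10_000); // assumed value, in milliseconds
        connection.setReadTimeout(30_000);    // assumed value, in milliseconds
        return connection.getInputStream();
    }
}
```

      With this sketch, SimpleUrlFetcherSketch.fetch("file:///etc/passwd") fails with an IOException before any connection is opened, while an https URL is passed through to the JDK's URL handler.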

          People

            Assignee: Tim Allison (tallison)
            Reporter: Tim Allison (tallison)
            Votes: 0
            Watchers: 2