TIKA-416: Out-of-process text extraction


Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.9
    • Component/s: parser
    • Labels: None

    Description

      There is currently no easy way to guard against JVM crashes or excessive memory or CPU use caused by parsing very large, broken, or intentionally malicious input documents. To better protect against such cases, and to make Tika's resource consumption easier to manage, it would be great to have a way to run Tika parsers in separate JVM processes. This could be implemented either as a separate "Tika parser daemon" or as an explicitly managed pool of forked JVMs.
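To make the forked-JVM idea concrete, here is a minimal sketch that uses only the JDK's `ProcessBuilder`. It is not Tika's implementation. The class name `ForkedJvmSketch`, its methods, the 256 MB heap cap, and the 60-second timeout are all assumptions chosen for illustration. The child JVM gets a capped heap via `-Xmx`, and the parent kills it if it does not finish in time, so a crash, memory blow-up, or runaway loop in the child cannot take down the parent.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of Tika: runs a main class in a child JVM.
final class ForkedJvmSketch {

    /** Builds the command line for a child JVM with a capped heap. */
    static List<String> buildCommand(int maxHeapMb, String mainClass, String... args) {
        List<String> command = new ArrayList<>();
        // Reuse the java binary and classpath of the current JVM.
        command.add(System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java");
        command.add("-Xmx" + maxHeapMb + "m");
        command.add("-cp");
        command.add(System.getProperty("java.class.path"));
        command.add(mainClass);
        command.addAll(Arrays.asList(args));
        return command;
    }

    /** Runs the command and returns its exit code; kills it on timeout. */
    static int runWithTimeout(List<String> command, long timeoutMillis)
            throws IOException, InterruptedException, TimeoutException {
        Process process = new ProcessBuilder(command)
                .inheritIO() // child output goes to the parent's stdout/stderr
                .start();
        if (!process.waitFor(timeoutMillis, TimeUnit.MILLISECONDS)) {
            process.destroyForcibly();
            process.waitFor(); // reap the killed child
            throw new TimeoutException("child JVM exceeded " + timeoutMillis + " ms");
        }
        return process.exitValue();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("usage: ForkedJvmSketch <child-main-class> [args...]");
            return;
        }
        String[] childArgs = Arrays.copyOfRange(args, 1, args.length);
        // Assumed limits: 256 MB heap, 60 s wall-clock time.
        List<String> command = buildCommand(256, args[0], childArgs);
        try {
            int exitCode = runWithTimeout(command, 60_000);
            System.out.println("child exited with code " + exitCode);
        } catch (TimeoutException e) {
            System.err.println("child killed: " + e.getMessage());
        }
    }
}
```

Starting a fresh JVM per document is the simplest form of isolation but pays the JVM startup cost every time. The daemon or pooled variants mentioned above would amortize that cost by keeping child processes alive between documents.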

          People

            Assignee: Jukka Zitting (jukkaz)
            Reporter: Jukka Zitting (jukkaz)