Tika / TIKA-2374

Tika App -z should extract PDF inline images by default


Details

    • Type: Improvement
    • Status: Reopened
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.14
    • Fix Version/s: 1.16
    • Component/s: cli
    • Labels: None

    Description

      As discussed on dev@: if you use the Tika App with the default config and the -z extract option, it will extract embedded resources, except for PDF inline images. This is unexpected for new users, who won't know that they need to pass in a custom config with the extractInlineImages PDF parser option set.

      If the user passes an explicit config to the app, we should respect that. However, if they don't pass one in and take the default, the -z option (and only that option) should enable whatever options are needed to make extraction work properly and fully (currently just extractInlineImages).

      If possible/easy, the -z option should print out some info to let affected users know that the default config was tweaked to extract extra embedded resources.
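      The proposed behaviour could be sketched roughly as below. This is only an illustration of the decision rule, not code from the Tika App: the class, the method, and the message text are all hypothetical, and a real change would presumably set the extractInlineImages option on the PDF parser configuration rather than use a stand-in type like this one.

```java
import java.util.Optional;

// Hypothetical sketch of the proposal; none of these names come from the Tika code base.
class ExtractOptionSketch {

    /** Minimal stand-in for the PDF parser options that matter for this issue. */
    static final class PdfOptions {
        final boolean extractInlineImages;

        PdfOptions(boolean extractInlineImages) {
            this.extractInlineImages = extractInlineImages;
        }
    }

    /** The options to use, plus an optional note to show the user. */
    static final class Resolved {
        final PdfOptions options;
        final Optional<String> notice;

        Resolved(PdfOptions options, Optional<String> notice) {
            this.options = options;
            this.notice = notice;
        }
    }

    /**
     * @param extractRequested true when the user asked for extraction (the -z option)
     * @param userConfig       options from an explicit user config, if one was passed in
     */
    static Resolved resolve(boolean extractRequested, Optional<PdfOptions> userConfig) {
        if (userConfig.isPresent()) {
            // An explicit config is always respected as-is, even with -z.
            return new Resolved(userConfig.get(), Optional.empty());
        }
        if (extractRequested) {
            // Default config plus -z: turn on inline image extraction and say so.
            return new Resolved(
                    new PdfOptions(true),
                    Optional.of("Note: default config adjusted for -z: "
                            + "extractInlineImages enabled for PDFs"));
        }
        // Default config without -z: leave inline image extraction off.
        return new Resolved(new PdfOptions(false), Optional.empty());
    }

    public static void main(String[] args) {
        Resolved r = resolve(true, Optional.empty());
        r.notice.ifPresent(System.err::println);
        System.out.println("extractInlineImages=" + r.options.extractInlineImages);
    }
}
```

      Returning the notice alongside the options keeps the "tell the user" part separate from the decision itself, so the caller can choose where to print it (stderr in this sketch).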

      People

        Assignee: Unassigned
        Reporter: Nick Burch
