Tika / TIKA-2369

Define clean Recogniser interfaces: one for objects from binary data, and one for text classification


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 1.17, 2.0.0-BETA, 2.1.0
    • Component/s: None
    • Labels: None

    Description

      As described in TIKA-2360, we should refactor the ObjectRecogniser interface. I propose creating:

      1. TextRecogniser: per Thamme Gowda, it takes text as INPUT and produces a set of metadata key-value pairs as OUTPUT.
      2. ObjectRecogniser: also per Thamme, this covers ObjectRecogniser, VideoLabeller, OCR, and Caption. It takes raw bytes as INPUT and produces a set of metadata key-value pairs as OUTPUT.

      We should, of course, reconcile this with Tika-DL and work out how that folds in.
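
To make the proposed split concrete, here is a minimal, self-contained Java sketch of the two contracts. It is only an illustration under stated assumptions: the method name `recognise`, the use of `Map<String, String>` as a stand-in for Tika's metadata, and the two toy implementations (`TokenCountRecogniser`, `PngSignatureRecogniser`) with their metadata keys are all hypothetical and are not taken from the existing Tika code base or from TIKA-2360.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical contract: text in, metadata key-value pairs out.
interface TextRecogniser {
    Map<String, String> recognise(String text);
}

// Hypothetical contract: raw bytes in, metadata key-value pairs out.
interface ObjectRecogniser {
    Map<String, String> recognise(byte[] data);
}

// Toy text recogniser: reports the number of whitespace-separated tokens.
class TokenCountRecogniser implements TextRecogniser {
    @Override
    public Map<String, String> recognise(String text) {
        Map<String, String> metadata = new LinkedHashMap<>();
        String trimmed = text.trim();
        int tokens = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        metadata.put("token.count", Integer.toString(tokens)); // illustrative key
        return metadata;
    }
}

// Toy object recogniser: reports the byte length and whether the data
// starts with the first four bytes of the PNG signature.
class PngSignatureRecogniser implements ObjectRecogniser {
    private static final byte[] PNG_MAGIC = {(byte) 0x89, 'P', 'N', 'G'};

    @Override
    public Map<String, String> recognise(byte[] data) {
        Map<String, String> metadata = new LinkedHashMap<>();
        boolean png = data.length >= PNG_MAGIC.length;
        for (int i = 0; png && i < PNG_MAGIC.length; i++) {
            png = data[i] == PNG_MAGIC[i];
        }
        metadata.put("byte.length", Integer.toString(data.length)); // illustrative key
        metadata.put("looks.like.png", Boolean.toString(png));      // illustrative key
        return metadata;
    }
}

class RecogniserSketch {
    public static void main(String[] args) {
        TextRecogniser text = new TokenCountRecogniser();
        ObjectRecogniser object = new PngSignatureRecogniser();
        System.out.println(text.recognise("hello tika world"));
        System.out.println(object.recognise(new byte[] {(byte) 0x89, 'P', 'N', 'G', 0, 0}));
    }
}
```

Keeping the two interfaces separate lets callers choose by input type (text versus bytes) while both return the same kind of output, which is the property the proposal relies on. A real version would presumably return or populate Tika's own metadata type rather than a plain map.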


          People

            Assignee: chrismattmann Chris A. Mattmann
            Reporter: chrismattmann Chris A. Mattmann
