Tika / TIKA-2369

Define a clean Recogniser interface: for objects from binary data; and for text classification

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 2.0, 1.17
    • Component/s: None
    • Labels: None

      Description

      As described in TIKA-2360 we should refactor the ObjectRecogniser interface. I propose creating:

      1. TextRecogniser: per Thamme Gowda, takes text as INPUT and produces a set of metadata key-value pairs as OUTPUT.
      2. ObjectRecogniser: also per Thamme, covers ObjectRecogniser, VideoLabeller, OCR, and Caption. It takes raw bytes as INPUT and produces a set of metadata key-value pairs as OUTPUT.

      We should of course reconcile this with Tika-DL and work out how that folds in.
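
      To make the proposed split concrete, here is a minimal sketch of what the two interfaces might look like. Only the names TextRecogniser and ObjectRecogniser come from this issue; the method name recognise, the use of a plain Map in place of Tika's Metadata class, and the two toy implementations (WordCountRecogniser, PngSignatureRecogniser) are assumptions made purely for illustration, not existing Tika API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of the interface split proposed in this issue. */
public class RecogniserSketch {

    /** INPUT: text. OUTPUT: a set of metadata key-value pairs. */
    interface TextRecogniser {
        Map<String, String> recognise(String text);
    }

    /** INPUT: raw bytes. OUTPUT: a set of metadata key-value pairs. */
    interface ObjectRecogniser {
        Map<String, String> recognise(byte[] data);
    }

    /** Toy text recogniser (illustrative only): reports a whitespace-delimited word count. */
    static class WordCountRecogniser implements TextRecogniser {
        @Override
        public Map<String, String> recognise(String text) {
            Map<String, String> metadata = new LinkedHashMap<>();
            String trimmed = text.trim();
            int words = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
            metadata.put("word_count", Integer.toString(words));
            return metadata;
        }
    }

    /** Toy object recogniser (illustrative only): checks for the leading PNG signature bytes. */
    static class PngSignatureRecogniser implements ObjectRecogniser {
        private static final byte[] PNG_PREFIX = {(byte) 0x89, 'P', 'N', 'G'};

        @Override
        public Map<String, String> recognise(byte[] data) {
            boolean isPng = data.length >= PNG_PREFIX.length;
            for (int i = 0; isPng && i < PNG_PREFIX.length; i++) {
                isPng = data[i] == PNG_PREFIX[i];
            }
            Map<String, String> metadata = new LinkedHashMap<>();
            metadata.put("looks_like_png", Boolean.toString(isPng));
            metadata.put("byte_count", Integer.toString(data.length));
            return metadata;
        }
    }

    public static void main(String[] args) {
        TextRecogniser text = new WordCountRecogniser();
        ObjectRecogniser object = new PngSignatureRecogniser();
        System.out.println(text.recognise("Define a clean Recogniser interface"));
        System.out.println(object.recognise(new byte[] {(byte) 0x89, 'P', 'N', 'G'}));
    }
}
```

      Keeping the two inputs on separate interfaces means a text classifier never has to handle bytes it cannot interpret, while both still emit the same kind of key-value output. How this maps onto Tika-DL is left open here, as in the description above.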


            People

            • Assignee: Chris A. Mattmann (chrismattmann)
            • Reporter: Chris A. Mattmann (chrismattmann)
            • Votes: 0
            • Watchers: 2
