Tika / TIKA-2262

Supporting Image-to-Text (Image Captioning) in Tika for Image MIME Types


Details

    Description

      Background:

      An image caption is a short piece of text, usually one line, added to an image's metadata to summarize the scene it depicts.
      Generating captions automatically is a challenging and interesting problem in computer vision. Tika already supports image recognition via the Object Recognition Parser (TIKA-1993), which uses an InceptionV3 model pre-trained on the ImageNet dataset using TensorFlow.
      Captioning images is a very useful feature because it helps text-based Information Retrieval (IR) systems "understand" the scenery in images.

      Technical details and references:

      • Google open-sourced its 'Show and Tell' neural network and its model for automatically generating captions some time ago. Source Code, Research blog
      • Integrate it the same way as the ObjectRecognitionParser
        • Create a RESTful API service similar to this
        • Extend or enhance ObjectRecognitionParser or one of its implementations
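
      The REST-based integration above could look roughly like the following Java sketch. It is not Tika's actual parser code: the class name CaptionClient, the endpoint URL passed to the constructor, and the JSON response shape (a list of objects with a "sentence" field) are all assumptions made for illustration. The regex-based extraction only keeps the example dependency-free; a real parser would use a proper JSON library.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical client for an image-captioning REST service (not part of Tika). */
class CaptionClient {

    // Matches "sentence": "..." pairs in the ASSUMED response format, e.g.
    // {"captions":[{"sentence":"a dog on a beach","confidence":0.9}]}
    private static final Pattern SENTENCE =
            Pattern.compile("\"sentence\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");

    private final URI endpoint;
    private final HttpClient http = HttpClient.newHttpClient();

    /** @param endpointUrl full URL of the captioning endpoint (assumed, service-specific) */
    CaptionClient(String endpointUrl) {
        this.endpoint = URI.create(endpointUrl);
    }

    /** Posts the raw image bytes to the service and returns the captions it reports. */
    public List<String> caption(byte[] imageBytes) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(endpoint)
                .header("Content-Type", "application/octet-stream")
                .POST(HttpRequest.BodyPublishers.ofByteArray(imageBytes))
                .build();
        HttpResponse<String> response =
                http.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Caption service returned HTTP " + response.statusCode());
        }
        return parseCaptions(response.body());
    }

    /** Extracts every "sentence" value, in order; JSON escapes are left as-is. */
    public static List<String> parseCaptions(String json) {
        List<String> captions = new ArrayList<>();
        Matcher m = SENTENCE.matcher(json);
        while (m.find()) {
            captions.add(m.group(1));
        }
        return captions;
    }
}
```

      A parser built along these lines would call caption() with the image bytes it receives and write the returned sentences into the document metadata, much as the ObjectRecognitionParser does with object labels.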

      {skills, learning, homework} for GSoC students

      • Knowledge of Java and Python, and of the Maven build system
      • RESTful APIs
      • TensorFlow/Keras
      • Deep learning

      Alternatively, a somewhat harder path for experienced students:
      Import the Keras/TensorFlow model into deeplearning4j and run it natively inside the JVM.

      Benefits:

      • No RESTful integration is required, and thus no external dependencies
      • Easy to distribute on Hadoop/Spark clusters

      Hurdles:

      • Keras/TensorFlow model import is a work-in-progress feature in deeplearning4j, so expect plenty of trouble along the way!

          People

            chrismattmann Chris A. Mattmann
            thammegowda Thamme Gowda