Type: New Feature
Resolution: Won't Fix
Affects Version/s: None
Fix Version/s: None
Per the discussion on the Nutch-User mailing list, there is a wish for an "Image Search" add-on component that will index images.
- retrieve outlinks to image files from fetched pages
- generate thumbnails from images
- thumbnails are stored in the segments as an ImageWritable that contains the compressed binary data and some metadata
- implemented as a Hadoop MapReduce job
- should be separate from the main Nutch codeline, as it breaks the general Nutch logic of one URL == one index document.
- store the original image in the segments
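As a rough illustration of the ImageWritable item above, the sketch below shows one way such a record could serialize compressed thumbnail bytes together with a little metadata. The class name ImageWritableSketch and the field set (source URL, width, height) are assumptions made for illustration, not taken from any patch. The write/readFields methods follow the shape of Hadoop's Writable interface, but the class deliberately does not depend on Hadoop so that it runs standalone.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical thumbnail record; the field layout is an assumption. */
class ImageWritableSketch {
    private String sourceUrl = "";      // URL of the original image (assumed field)
    private int width;                  // thumbnail width in pixels (assumed field)
    private int height;                 // thumbnail height in pixels (assumed field)
    private byte[] data = new byte[0];  // compressed thumbnail bytes

    ImageWritableSketch() {}

    ImageWritableSketch(String sourceUrl, int width, int height, byte[] data) {
        this.sourceUrl = sourceUrl;
        this.width = width;
        this.height = height;
        this.data = data.clone();
    }

    // Same method shape as Hadoop's Writable.write(DataOutput).
    public void write(DataOutput out) throws IOException {
        out.writeUTF(sourceUrl);     // writeUTF is limited to 65535 encoded bytes
        out.writeInt(width);
        out.writeInt(height);
        out.writeInt(data.length);   // length prefix so the reader knows how much to read
        out.write(data);
    }

    // Same method shape as Hadoop's Writable.readFields(DataInput).
    public void readFields(DataInput in) throws IOException {
        sourceUrl = in.readUTF();
        width = in.readInt();
        height = in.readInt();
        data = new byte[in.readInt()];
        in.readFully(data);
    }

    String getSourceUrl() { return sourceUrl; }
    int getWidth() { return width; }
    int getHeight() { return height; }
    byte[] getData() { return data.clone(); }

    /** Serializes a record to bytes and reads it back into a fresh instance. */
    static ImageWritableSketch roundTrip(ImageWritableSketch original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            original.write(new DataOutputStream(buffer));
            ImageWritableSketch copy = new ImageWritableSketch();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a real job the class would presumably declare `implements org.apache.hadoop.io.Writable`; the roundTrip helper is only there to show that the two methods are symmetric.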
Would like to have:
- search interface for image index
- parameterizable thumbnail generation (width, height, quality)
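The parameterizable thumbnail generation wished for above could look roughly like the following sketch, which uses only the JDK's javax.imageio and java.awt classes. The class name ThumbnailSketch, the method names, the choice of JPEG as the output format, and the decision to preserve aspect ratio and never upscale are all assumptions for illustration, not part of the proposal.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;
import javax.imageio.stream.MemoryCacheImageOutputStream;

/** Hypothetical helper; names and policy choices are assumptions. */
class ThumbnailSketch {

    /** Computes {width, height} that fit inside the box, keeping aspect ratio. */
    static int[] fitWithin(int srcW, int srcH, int maxW, int maxH) {
        double scale = Math.min((double) maxW / srcW, (double) maxH / srcH);
        scale = Math.min(scale, 1.0); // assumption: never upscale small images
        int w = Math.max(1, (int) Math.round(srcW * scale));
        int h = Math.max(1, (int) Math.round(srcH * scale));
        return new int[] {w, h};
    }

    /** Scales the image and encodes it as JPEG; quality is in the range 0.0 to 1.0. */
    static byte[] toJpegThumbnail(BufferedImage source, int maxW, int maxH, float quality) {
        int[] size = fitWithin(source.getWidth(), source.getHeight(), maxW, maxH);
        BufferedImage thumb = new BufferedImage(size[0], size[1], BufferedImage.TYPE_INT_RGB);
        Graphics2D g = thumb.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(source, 0, 0, size[0], size[1], null);
        } finally {
            g.dispose();
        }

        ImageWriter writer = ImageIO.getImageWritersByFormatName("jpg").next();
        ImageWriteParam param = writer.getDefaultWriteParam();
        param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
        param.setCompressionQuality(quality);

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // In-memory stream avoids the temp-file cache ImageIO may otherwise use.
        try (ImageOutputStream out = new MemoryCacheImageOutputStream(bytes)) {
            writer.setOutput(out);
            writer.write(null, new IIOImage(thumb, null, null), param);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            writer.dispose();
        }
        return bytes.toByteArray();
    }
}
```

The resulting byte array is the kind of compressed payload that could be stored alongside the metadata in the segments; width, height, and quality would presumably come from the Nutch configuration rather than being hard-coded.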