Details
- Type: New Feature
- Status: Closed
- Priority: Minor
- Resolution: Won't Fix
Description
Per the discussion in the Nutch-User mailing list, there is a wish for an "Image Search" add-on component that will index images.
Must have:
- retrieve outlinks to image files from fetched pages
- generate thumbnails from images
- store thumbnails in the segments as ImageWritable objects that contain the compressed binary data and some metadata
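To make the first and third "must have" items concrete, here is a minimal sketch in plain Java. It is not taken from any Nutch patch: `ImageOutlinks`, `ThumbnailRecord`, the extension list, and the field layout are all hypothetical, and the record only mimics the write/read pattern of a Hadoop Writable using `java.io` streams so that it runs without Hadoop on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: keeps only outlinks whose path looks like an image file. */
final class ImageOutlinks {
    // Assumed extension list; a real component would likely make this configurable.
    private static final List<String> EXTENSIONS =
            Arrays.asList(".jpg", ".jpeg", ".png", ".gif");

    static List<String> filter(List<String> outlinks) {
        List<String> images = new ArrayList<>();
        for (String url : outlinks) {
            String path = url;
            // Ignore query string and fragment when checking the extension.
            int cut = path.indexOf('?');
            if (cut >= 0) path = path.substring(0, cut);
            cut = path.indexOf('#');
            if (cut >= 0) path = path.substring(0, cut);
            String lower = path.toLowerCase(Locale.ROOT);
            for (String ext : EXTENSIONS) {
                if (lower.endsWith(ext)) {
                    images.add(url);
                    break;
                }
            }
        }
        return images;
    }
}

/** Hypothetical stand-in for ImageWritable: compressed bytes plus some metadata. */
final class ThumbnailRecord {
    final String sourceUrl; // metadata: URL of the original image
    final int width;        // metadata: thumbnail width in pixels
    final int height;       // metadata: thumbnail height in pixels
    final byte[] data;      // compressed thumbnail bytes, e.g. JPEG

    ThumbnailRecord(String sourceUrl, int width, int height, byte[] data) {
        this.sourceUrl = sourceUrl;
        this.width = width;
        this.height = height;
        this.data = data;
    }

    /** Serializes the record in the same field-by-field style a Writable would use. */
    byte[] toBytes() {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeUTF(sourceUrl);
            out.writeInt(width);
            out.writeInt(height);
            out.writeInt(data.length); // length prefix so the reader knows how much to read
            out.write(data);
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the fields back in the order toBytes() wrote them. */
    static ThumbnailRecord fromBytes(byte[] bytes) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
            String url = in.readUTF();
            int w = in.readInt();
            int h = in.readInt();
            byte[] data = new byte[in.readInt()];
            in.readFully(data);
            return new ThumbnailRecord(url, w, h, data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Matching on the file extension is only a heuristic; checking the Content-Type of the fetched resource would probably be more reliable, at the cost of an extra fetch.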
Should have:
- implemented as a Hadoop MapReduce job
- should be separate from the main Nutch codeline, as it breaks the general Nutch logic of one URL == one index document.
Could have:
- store the original image in the segments
Would like to have:
- search interface for image index
- parameterizable thumbnail generation (width, height, quality)
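As an illustration of what parameterizable thumbnail generation could look like, the following sketch uses only the JDK's `java.awt` and `javax.imageio` classes. The class name `Thumbnailer`, the method signature, and the choice of JPEG output are assumptions made for this example, not part of the request.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;
import javax.imageio.stream.MemoryCacheImageOutputStream;

/** Hypothetical thumbnail generator with width, height, and quality parameters. */
final class Thumbnailer {

    /**
     * Scales the image to fit inside maxWidth x maxHeight (keeping the aspect
     * ratio, never upscaling) and encodes it as JPEG with the given quality
     * in the range 0.0 to 1.0.
     */
    static byte[] thumbnail(BufferedImage source, int maxWidth, int maxHeight, float quality) {
        if (maxWidth <= 0 || maxHeight <= 0) {
            throw new IllegalArgumentException("maxWidth and maxHeight must be positive");
        }
        if (quality < 0f || quality > 1f) {
            throw new IllegalArgumentException("quality must be between 0 and 1");
        }

        // Pick the smaller scale factor so both dimensions fit within the box.
        double scale = Math.min(1.0, Math.min(
                (double) maxWidth / source.getWidth(),
                (double) maxHeight / source.getHeight()));
        int w = Math.max(1, (int) Math.round(source.getWidth() * scale));
        int h = Math.max(1, (int) Math.round(source.getHeight() * scale));

        // Draw into an RGB image; JPEG has no alpha channel.
        BufferedImage scaled = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = scaled.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(source, 0, 0, w, h, null);
        } finally {
            g.dispose();
        }

        // Encode with an explicit compression quality.
        ImageWriter writer = ImageIO.getImageWritersByFormatName("jpeg").next();
        ImageWriteParam param = writer.getDefaultWriteParam();
        param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
        param.setCompressionQuality(quality);

        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ImageOutputStream out = new MemoryCacheImageOutputStream(buf)) {
            writer.setOutput(out);
            writer.write(null, new IIOImage(scaled, null, null), param);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            writer.dispose();
        }
        return buf.toByteArray();
    }
}
```

A call such as `Thumbnailer.thumbnail(image, 100, 100, 0.8f)` would return the JPEG bytes that a record like the ImageWritable described above could carry. In a real job the three parameters would presumably be read from the job configuration rather than passed as literals.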
Attachments
1. sandbox svn folder | Closed | Unassigned