Details
- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Invalid
- Affects Version/s: 0.8
- Fix Version/s: None
- Component/s: None
- Labels: None
Description
http://www.mail-archive.com/nutch-user@lucene.apache.org/msg06698.html
I am looking into fixing some very weird behavior of the file protocol.
I am using 0.8.
Researching this topic I found
http://www.mail-archive.com/nutch-user@lucene.apache.org/msg06536.html
and
http://www.folge2.de/tp/search/1/crawling-the-local-filesystem-with-nutch
I am on Ubuntu but I have the same problem: Nutch crawls toward the
parent directories of the root URL instead of descending into its
children.
Furthermore, I would vote to make fetching parents optional and
controlled by a property, so that I can decide whether I want this not
very intuitive "feature".
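To make the request concrete, here is a minimal sketch of the behavior I have in mind. It is not Nutch code: the function names and the `fetch_parents` flag are hypothetical and only illustrate a filter that keeps the crawl at or below the root URL unless parent fetching is switched on.

```python
from posixpath import normpath
from urllib.parse import urlparse


def is_under_root(url: str, root_url: str) -> bool:
    """Return True if a file: URL is the root directory or lies beneath it."""
    path = normpath(urlparse(url).path)
    root = normpath(urlparse(root_url).path)
    # Append a separator so that /docs2 is not mistaken for a child of /docs.
    return path == root or path.startswith(root.rstrip("/") + "/")


def should_fetch(url: str, root_url: str, fetch_parents: bool = False) -> bool:
    """Hypothetical filter: skip URLs outside the root unless fetch_parents is set."""
    if fetch_parents:
        return True
    return is_under_root(url, root_url)
```

With the flag left at its default, `file:///home/user/docs/a/b.txt` would be fetched for the root `file:///home/user/docs/`, while the parent `file:///home/user/` would be skipped.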
Attachments
Issue Links
- is part of NUTCH-905 Configurable file protocol parent directory crawling (Closed)