Nutch

Delegate parsing of robots.txt to crawler-commons

    Details

    • Type: Task
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.7, 2.2
    • Component/s: None
    • Labels:

      Description

      We're about to release the first version of Crawler-Commons (http://code.google.com/p/crawler-commons/), which contains a parser for robots.txt files. This parser should also be better than the one we currently have in Nutch. I will delegate this functionality to CC as soon as it is available publicly.

      1. CC.robots.multiple.agents.patch
        3 kB
        Tejas Patil
      2. CC.robots.multiple.agents.v2.patch
        5 kB
        Tejas Patil
      3. NUTCH-1031.v1.patch
        30 kB
        Tejas Patil
      4. NUTCH-1031-2.x.v1.patch
        62 kB
        Tejas Patil
      5. NUTCH-1031-trunk.v2.patch
        47 kB
        Tejas Patil
      6. NUTCH-1031-trunk.v3.patch
        55 kB
        Tejas Patil
      7. NUTCH-1031-trunk.v4.patch
        55 kB
        Tejas Patil
      8. NUTCH-1031-trunk.v5.patch
        55 kB
        Tejas Patil

        Issue Links

          Activity

          Lewis John McGibbney added a comment -

          Hi Tejas,
          A quick note on keeping pom.xml up-to-date:
          whenever we do a release, pom.xml is brought fully up-to-date based upon the contents and configuration of ivy.xml.
          This means that every tagged branch of Nutch has a completely accurate pom.xml and that the current development branches do not.
          I will make sure to update the pom.xml in the forthcoming releases.
          Regardless, thank you for the attention to detail here.

          Tejas Patil added a comment - edited

          I had forgotten to add the crawler-commons dependency to pom.xml.
          Just committed that to trunk (rev 1480551) and 2.x (rev 1480550).

          Hudson added a comment -

          Integrated in Nutch-nutchgora #587 (See https://builds.apache.org/job/Nutch-nutchgora/587/)
          NUTCH-1031 Delegate parsing of robots.txt to crawler-commons (Revision 1477319)

          Result = FAILURE
          tejasp : http://svn.apache.org/viewvc/nutch/branches/2.x/?view=rev&rev=1477319
          Files :

          • /nutch/branches/2.x/CHANGES.txt
          • /nutch/branches/2.x/ivy/ivy.xml
          • /nutch/branches/2.x/src/java/org/apache/nutch/fetcher/FetcherReducer.java
          • /nutch/branches/2.x/src/java/org/apache/nutch/protocol/EmptyRobotRules.java
          • /nutch/branches/2.x/src/java/org/apache/nutch/protocol/Protocol.java
          • /nutch/branches/2.x/src/java/org/apache/nutch/protocol/RobotRulesParser.java
          • /nutch/branches/2.x/src/plugin/lib-http/src/java/org/apache/nutch/protocol/http/api/HttpBase.java
          • /nutch/branches/2.x/src/plugin/lib-http/src/java/org/apache/nutch/protocol/http/api/HttpRobotRulesParser.java
          • /nutch/branches/2.x/src/plugin/lib-http/src/java/org/apache/nutch/protocol/http/api/RobotRulesParser.java
          • /nutch/branches/2.x/src/plugin/lib-http/src/test/org/apache/nutch/protocol/http/api/TestRobotRulesParser.java
          • /nutch/branches/2.x/src/plugin/protocol-file/src/java/org/apache/nutch/protocol/file/File.java
          • /nutch/branches/2.x/src/plugin/protocol-ftp/src/java/org/apache/nutch/protocol/ftp/Ftp.java
          • /nutch/branches/2.x/src/plugin/protocol-sftp/src/java/org/apache/nutch/protocol/sftp/Sftp.java
          Tejas Patil added a comment -

          Thanks Lewis
          Changes committed to 2.x (revision 1477319)

          Lewis John McGibbney added a comment -

          +1 from me Tejas. Unit tests all pass fine, and some tests I ran locally were good as well. CLI looks good. Documentation in the patch is really nice.

          Tejas Patil added a comment -

          Patch for 2.x. If there are no objections, I will commit it in the coming days.

          Lewis John McGibbney added a comment -

          Nice work Tejas

          Hudson added a comment -

          Integrated in Nutch-trunk #2156 (See https://builds.apache.org/job/Nutch-trunk/2156/)
          NUTCH-1031 Delegate parsing of robots.txt to crawler-commons (Revision 1465159)

          Result = SUCCESS
          tejasp : http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1465159
          Files :

          • /nutch/trunk/CHANGES.txt
          • /nutch/trunk/ivy/ivy.xml
          • /nutch/trunk/src/java/org/apache/nutch/fetcher/Fetcher.java
          • /nutch/trunk/src/java/org/apache/nutch/protocol/EmptyRobotRules.java
          • /nutch/trunk/src/java/org/apache/nutch/protocol/Protocol.java
          • /nutch/trunk/src/java/org/apache/nutch/protocol/RobotRulesParser.java
          • /nutch/trunk/src/plugin/lib-http/src/java/org/apache/nutch/protocol/http/api/HttpBase.java
          • /nutch/trunk/src/plugin/lib-http/src/java/org/apache/nutch/protocol/http/api/HttpRobotRulesParser.java
          • /nutch/trunk/src/plugin/lib-http/src/java/org/apache/nutch/protocol/http/api/RobotRulesParser.java
          • /nutch/trunk/src/plugin/lib-http/src/test/org/apache/nutch/protocol/http/api/TestRobotRulesParser.java
          • /nutch/trunk/src/plugin/protocol-file/src/java/org/apache/nutch/protocol/file/File.java
          • /nutch/trunk/src/plugin/protocol-ftp/src/java/org/apache/nutch/protocol/ftp/Ftp.java
          Tejas Patil added a comment -

          I have removed the @author tag and ported the checks from 2.x to the patch, as per the suggestion from Sebastian Nagel. Will commit the changes shortly to trunk and start work on porting these changes to 2.x.

          Sebastian Nagel added a comment -

          There are differences between trunk and 2.x:

          • in org.apache.nutch.protocol.http.api.RobotRulesParser (lib-http) 2.x does additional plausibility checks for properties http.agent.name and http.robots.agents

          Maybe that's worth taking into trunk as well, also with respect to porting this issue to 2.x.

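          A minimal sketch of what such plausibility checks could look like. This is an assumption for illustration only, not the committed 2.x code, and the class name AgentCheckSketch is made up:

          public class AgentCheckSketch {
            // Reject a missing agent name and warn when http.agent.name is not
            // among the comma-separated names in http.robots.agents.
            static void check(String agentName, String robotsAgents) {
              if (agentName == null || agentName.trim().isEmpty()) {
                throw new IllegalStateException("http.agent.name is not set");
              }
              boolean listed = false;
              for (String name : robotsAgents.split(",")) {
                if (name.trim().equalsIgnoreCase(agentName.trim())) {
                  listed = true;
                }
              }
              if (!listed) {
                System.err.println("WARN: http.agent.name '" + agentName
                    + "' is not listed in http.robots.agents '" + robotsAgents + "'");
              }
            }

            public static void main(String[] args) {
              check("MyBot", "OtherBot,*"); // prints the warning
            }
          }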
          Sebastian Nagel added a comment -

          +1 (nothing to complain about)

          P.S.: see Julien Nioche's comment in NUTCH-1541 about the @author tag (a formality you couldn't know about)

          Tejas Patil added a comment -

          Thanks Lewis, I have corrected the usage message.

          Lewis John McGibbney added a comment - edited

          Hi Tejas. Sorry for taking forever to get around to this.

          • I really like the documentation within the patch. Big +1 for this.
          • Tests all pass flawlessly.
          • I like the retention of the main() method in o.a.n.p.RobotRulesParser.

          I've tested this on several websites, including many directories within sites like bbc.co.uk (check out their robots.txt).
          I am +1 for this Tejas. Good work on this one; it's been a long time in coming to Nutch.
          I am keen to hear from others.

          I have one trivial grudge: there is a typo in the usage message for the main method in RobotRulesParser. It should be

          Usage: RobotRulesParser <robots-file> <url-file> <agent-names>
          

          instead of

          Usage: RobotRulesParser <robots-file> <robots-file> <agent-names>
          
          Tejas Patil added a comment -

          @Dev: Can anyone kindly review the patch?

          Tejas Patil added a comment -

          Hey Lewis, thanks for pointing that out. I have updated the patch.

          Lewis John McGibbney added a comment -

          Hi Tejas. If you search Maven Central you will see the 0.2 release of crawler-commons. You will be able to pull this with Ivy, no bother. @Tejas, I agree with your views on keeping CC in the core ivy.xml, as it is likely that we will use it for the sitemaps at some stage as well. Great work Tejas.

          Tejas Patil added a comment -

          Hi Sebastian Nagel, I have made the suggested changes.

          @lufeng: #1 done. As the newer version of CC isn't released publicly (I cannot see it on the project page or in Maven), I am avoiding #2 for now. For #3, I am not keen on creating a robots plugin, as the robots check is mandatory for every crawler. Hence I have kept the RobotRulesParser class in core. However, the protocol-specific robots implementations (currently HttpRobotRulesParser is added in this patch) live inside the respective protocol plugins.

          lufeng added a comment -

          Hi Tejas

          1. The EmptyRobotRules class is not deleted in the NUTCH-1031-trunk.v2.patch file.
          2. Should we add the CC dependency to the ivy.xml configuration?
          3. Can we create RobotRulesParser as a Nutch plugin and extract the Protocol#getRobotRules method? That way we can move the CC dependency from nutch-core to nutch-plugin.

          Thanks

          Tejas Patil added a comment -

          Hi Sebastian,
          Thanks for your time and for suggesting the changes.
          Regarding the JUnit tests: I would remove those from Nutch, as CC already has its own tests and there is no point in testing it again in Nutch.

          Sebastian Nagel added a comment -

          Hi Tejas, a test of NUTCH-1031-trunk.v2.patch in combination with crawler-commons 0.2 shows:

          • protocol.RobotRulesParser.main does not work properly:
            • robotName is not filled properly from the <agent-name>+ arguments
            • parsed rules are printed in the Object string representation (e.g., SimpleRobotRules@2c2f1921)
          • testRobotsTwoAgents failed. However, the tests are quite complex: shouldn't we rely on the exhaustive tests in crawler-commons? A simple test may be sufficient to cover the basic functionality and, e.g., agent names separated by commas.
          Tejas Patil added a comment -

          @Dev: I am planning to commit this change in the coming days. If anyone has suggestions, please feel free to share your thoughts.

          Tejas Patil added a comment -

          Hi Lewis,

          I should have checked the main page of CC before asking over JIRA. Anyway, thanks for the news.

          Regarding "delegating the functionality": I had already made that change for both 1.x and 2.x last month and was waiting for the release of CC. If possible, can you review the patches?

          Lewis John McGibbney added a comment -

          Hi Tejas. We released it.

          Really sorry for not updating.


          Lewis

          Tejas Patil added a comment -

          Hey Ken, a gentle reminder about releasing CC.

          Ken Krugler added a comment -

          I've rolled this into trunk at crawler-commons. The next step is to roll a release. Not sure when I'll get to that, but it's on my list for this week.

          Ken Krugler added a comment -

          Hi Tejas,

          I've been on the road, but I'll check out your patch when I return to my office tomorrow. Thanks for updating it with a test case!

          – Ken

          Tejas Patil added a comment - edited

          Added a patch for Nutch trunk (NUTCH-1031-trunk.v2.patch). If nobody objects, I will work on the corresponding patch for 2.x.
          Summary of the changes done (see the sketch after this list):

          • Removed the RobotRules class, as CC provides a replacement: BaseRobotRules.
          • Moved RobotRulesParser out of the http plugin on account of NUTCH-1513; other protocols might share it.
          • Added HttpRobotRulesParser, which is responsible for fetching the robots file for the http protocol.
          • Changed references from old Nutch classes to classes from CC.
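          A hedged sketch of what the delegation looks like from the caller's side, assuming the crawler-commons 0.2 API (SimpleRobotRulesParser.parseContent, BaseRobotRules.isAllowed and getCrawlDelay); the URL and rules here are made up:

          import crawlercommons.robots.BaseRobotRules;
          import crawlercommons.robots.SimpleRobotRulesParser;

          public class DelegationSketch {
            public static void main(String[] args) {
              byte[] content = "User-agent: *\nDisallow: /private\nCrawl-delay: 5\n".getBytes();
              // Code that used to walk Nutch's own RobotRules asks a
              // crawler-commons BaseRobotRules instead.
              BaseRobotRules rules = new SimpleRobotRulesParser().parseContent(
                  "http://example.com/robots.txt", content, "text/plain", "mybot");
              System.out.println(rules.isAllowed("http://example.com/private")); // false
              System.out.println(rules.getCrawlDelay()); // crawl delay (assumed to be reported in milliseconds)
            }
          }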
          Tejas Patil added a comment -

          Hi Ken, I have added a test case to CC for the change. (CC.robots.multiple.agents.v2.patch)

          Ken Krugler added a comment -

          Regarding precedence - my guess is that it's not very important, as I haven't seen many (any?) robots.txt files where it would match the same robot, using related names, in rule blocks with different rules.

          This issue of precedence is specific to Nutch users, however (it is not part of the robots.txt RFC), so I'd suggest posting to the Nutch users list to see if anyone thinks it's important.

          As for your review of the CC code: yes, it's correct. There's one additional wrinkle in that the target user agent name is split on spaces, due to what appears to be an implicit expectation that you can use a user agent name with spaces (which, based on the RFC, isn't actually valid) and any piece of the name will match.

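          To make that wrinkle concrete, a standalone sketch of the assumed matching behavior (simplified; this is not the actual crawler-commons implementation):

          public class SpaceSplitSketch {
            // The caller's agent name is split on spaces; any piece matching a
            // User-Agent token in robots.txt counts as a match.
            static boolean matchesAnyPiece(String targetAgentName, String robotsToken) {
              for (String piece : targetAgentName.toLowerCase().split(" ")) {
                if (piece.equals(robotsToken.toLowerCase())) {
                  return true;
                }
              }
              return false;
            }

            public static void main(String[] args) {
              // "my crawler" has a space (not RFC-valid), yet its "crawler" piece matches.
              System.out.println(matchesAnyPiece("my crawler", "crawler")); // true
              System.out.println(matchesAnyPiece("my crawler", "spider"));  // false
            }
          }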
          Tejas Patil added a comment -

          Hi Ken,
          Thanks for reviewing the patch. I will include a test case in the patch. Before that, a bigger question is whether Nutch should adopt the parsing model in CC and forget about the precedence.
          BTW: did you find any error in my understanding of how CC parses robots?

          Ken Krugler added a comment -

          Hi Tejas - I've looked at your patch, and (assuming there's no requirement to support precedence in the user agent name list) it seems like a valid change. Based on the RFC (http://www.robotstxt.org/norobots-rfc.txt), robot names shouldn't contain commas, so splitting on them seems safe. Do you have a unit test to verify proper behavior? If so, I'd be happy to roll that into CC.

          – Ken

          Tejas Patil added a comment -

          I looked at the source code of CC to understand how it works. I have identified the change to be made to CC so that it supports multiple user agents. While testing it, I found that there is a semantic difference between the way CC works and the legacy Nutch parser.

          What CC does:
          It splits http.robots.agents on commas (the change that I made locally).
          It scans the robots file line by line, each time checking whether the current "User-Agent" from the file matches any of the names from http.robots.agents. If a match is found, it takes all the corresponding rules for that agent and stops further parsing.

          robots file
          User-Agent: Agent1 #foo
          Disallow: /a
          
          User-Agent: Agent2 Agent3
          Disallow: /d
          ------------------------------------
          http.robots.agents: "Agent2,Agent1"
          ------------------------------------
          Path: "/a"

          For the example above, as soon as the first line of the robots file is scanned, a match for "Agent1" is found. It scans all the corresponding rules for that agent and stores only this information:

          User-Agent: Agent1
          Disallow: /a

          Everything else is ignored.

          What the Nutch robots parser does:
          It splits http.robots.agents on commas. It scans ALL the lines of the robots file and evaluates the matches according to the precedence of the user agents.
          For the above example, the rules corresponding to both Agent2 and Agent1 have a match in the robots file, but as Agent2 comes first in http.robots.agents, it is given priority and the rules stored are:

          User-Agent: Agent2
          Disallow: /d

          If we want to leave behind the precedence-based behavior and adopt the model in CC, then I have a small patch for crawler-commons (CC.robots.multiple.agents.patch).

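          A runnable sketch of the first-match behavior described above, assuming the patched crawler-commons where parseContent accepts a comma-separated agent string (the robots file and agent names are taken from the example above):

          import crawlercommons.robots.BaseRobotRules;
          import crawlercommons.robots.SimpleRobotRulesParser;

          public class MultiAgentSketch {
            public static void main(String[] args) {
              String robotsTxt =
                  "User-Agent: Agent1 #foo\n" +
                  "Disallow: /a\n" +
                  "\n" +
                  "User-Agent: Agent2 Agent3\n" +
                  "Disallow: /d\n";
              BaseRobotRules rules = new SimpleRobotRulesParser().parseContent(
                  "http://example.com/robots.txt",
                  robotsTxt.getBytes(),
                  "text/plain",
                  "Agent2,Agent1"); // comma-separated, as in http.robots.agents
              // Agent1's block matches first in the file, so only its rules are kept:
              System.out.println(rules.isAllowed("http://example.com/a")); // false
              System.out.println(rules.isAllowed("http://example.com/d")); // true
            }
          }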
          Julien Nioche added a comment -

          1. Continue to have the legacy code for parsing robots file.

          2. As an add-in, crawler-commons can be employed for the parsing. Users can pick based on a config parameter, with a note indicating that #2 won't work with multiple HTTP agents.

          #2 is overkill IMHO. The existing code works fine, and the point of moving to CC was to get rid of some of our code, not make it bigger with yet another configuration option.

          Lewis: donating our code is a good idea, but in the case of the robots parsing it's more about modifying the existing parser in CC. I haven't had time to look at robots parsing in CC and am not familiar with it, but it would be a good thing to improve it. In the meantime let's go for option 1. Thanks!

          Lewis John McGibbney added a comment -

          Is the issue with multiple agents the only downside to using CC just now?
          I think your proposal is great, Tejas. However, if we are looking into supporting CC for more than just robots.txt parsing, then maybe we ought to look into donating this aspect of the Nutch code?
          Wdyt?

          Tejas Patil added a comment -

          After waiting for more than a week, I think there is little chance of getting a fix / change from crawler-commons.
          I propose the following:
          1. Continue to have the legacy code for parsing robots files.
          2. As an add-in, crawler-commons can be employed for the parsing.

          Users can pick based on a config parameter, with a note indicating that #2 won't work with multiple HTTP agents.
          Would this be fine?

          Tejas Patil added a comment -

          The current Nutch robots parsing logic uses the latter approach. Having a new API for passing a list of robot names would be a clean solution.

          Ken Krugler added a comment -

          Based on my reading of the robots.txt RFC ("The robot must obey the first record in /robots.txt that contains a User-Agent line whose value contains the name token of the robot as a substring."), it seems the User-Agent name (what's in the robots.txt file) is searched for a substring that matches the robot name token (what the caller is using).

          So that means in CC we'd either need to assume that a robot name never contains a comma (and we split the caller-provided name), or we add a new API where you pass in a list of robot names. Thoughts?

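          To make the two options concrete, a hypothetical sketch; this interface and the List-based overload do not exist in crawler-commons and are illustration only:

          import java.util.List;
          import crawlercommons.robots.BaseRobotRules;

          // Hypothetical interface sketching the two options above;
          // crawler-commons' real parser is a class, and the second overload is invented.
          public interface RobotRulesParserApi {
            // Option 1: a single caller-provided name; the parser splits it on commas,
            // assuming robot names never contain commas.
            BaseRobotRules parseContent(String url, byte[] content,
                String contentType, String robotNames);

            // Option 2: the caller passes the names explicitly, so no assumption
            // about commas in names is needed.
            BaseRobotRules parseContent(String url, byte[] content,
                String contentType, List<String> robotNames);
          }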
          Markus Jelsma added a comment -

          I think it would be a very good thing to maintain support for multiple user agents, as it gives crawler operators the flexibility to be lenient about how webmasters spell the crawler name in their robots.txt.

          Julien Nioche added a comment -

          Well, we have two separate params: http.agent.name, which is a single value sent to the servers when fetching, and http.robots.agents, which can have multiple values and is used when parsing robots. The value of this parameter SHOULD be split on commas.

          I don't think CC supports multiple values for http.robots.agents, but I'll ask Ken to be sure.

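          A minimal sketch of that split; the two property names are the Nutch configuration keys from this discussion, while the values and class name are made up:

          import java.util.Arrays;
          import java.util.List;

          public class AgentParamsSketch {
            public static void main(String[] args) {
              String httpAgentName = "MyBot";             // single value, sent in the User-Agent header
              String httpRobotsAgents = "MyBot,my-bot,*"; // comma-separated, used for robots.txt matching
              List<String> agents = Arrays.asList(httpRobotsAgents.split(","));
              System.out.println(agents);                          // [MyBot, my-bot, *]
              System.out.println(agents.contains(httpAgentName));  // true
            }
          }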
          Tejas Patil added a comment -

          The changes are done. Please let me know your comments.

          One issue: I am not sure how crawler-commons handles multiple agents. There is one test case (testRobotsTwoAgents) failing because of that, and I am not able to fix it. Can anyone help?

          Julien Nioche added a comment -

          crawler-commons is not super active, and I have been pretty much the only person actively involved. There have been bugfixes since the release, but not necessarily committed, IIRC.
          The robots parsing is working OK in Nutch, and we have loads of other things to work on which are probably more important.

          Lewis John McGibbney added a comment -

          crawler-commons is available within Maven Central. Are we still interested in delegating our parsing code to crawler-commons? What is the community like over at crawler-commons, e.g. if we find bugs in the code, how and when will/could they get fixed?

          Markus Jelsma added a comment -

          20120304-push-1.6

          Lewis John McGibbney added a comment -

          Hi Julien, out of sheer curiosity, how do we currently parse robots.txt? I found some files (which don't do parsing) in o.a.n.protocol, but I've never known what we use for robots.txt.


            People

            • Assignee:
              Tejas Patil
              Reporter:
              Julien Nioche
            • Votes:
              0
              Watchers:
              9

              Dates

              • Created:
                Updated:
                Resolved:

                Development