[NUTCH-1300] Indexer to filter and normalize URL's - ASF JIRA

Details

Type: New Feature
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.6
Component/s: indexer
Labels: None
Patch Info: Patch Available

Description

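Indexers should be able to normalize URLs. This is useful when a new normalizer is applied to the entire CrawlDB. The URLs stored in existing segments are then no longer in the same form as the CrawlDB keys, so without normalization at index time some or all records in a segment cannot be indexed at all.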
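To illustrate the problem, here is a minimal, self-contained Java sketch. It is not Nutch's API: the class `IndexUrlSketch`, the method `prepareForIndex`, the toy normalizer (lower-case scheme and host, drop the fragment) and the exclude pattern are all hypothetical stand-ins for whatever normalizers and filters a real crawl configures. It shows why a segment URL recorded before the new normalizer existed no longer matches the normalized CrawlDB key, and how normalizing (and optionally filtering) at index time restores the match.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.function.UnaryOperator;
import java.util.regex.Pattern;

public class IndexUrlSketch {

    // Hypothetical normalizer: lower-cases scheme and host and drops the fragment.
    static final UnaryOperator<String> NORMALIZER = url -> {
        try {
            URI u = new URI(url);
            String scheme = u.getScheme() == null ? null : u.getScheme().toLowerCase(Locale.ROOT);
            String host = u.getHost() == null ? null : u.getHost().toLowerCase(Locale.ROOT);
            return new URI(scheme, u.getUserInfo(), host, u.getPort(),
                    u.getPath(), u.getQuery(), null).toString();
        } catch (URISyntaxException e) {
            return null; // unparseable URLs are treated as rejected
        }
    };

    // Hypothetical filter rule: do not index image URLs.
    static final List<Pattern> EXCLUDE = List.of(Pattern.compile("\\.(gif|jpe?g|png)$"));

    // Normalize first, then filter; an empty result means "do not index".
    static Optional<String> prepareForIndex(String url) {
        String normalized = NORMALIZER.apply(url);
        if (normalized == null) {
            return Optional.empty();
        }
        for (Pattern p : EXCLUDE) {
            if (p.matcher(normalized).find()) {
                return Optional.empty();
            }
        }
        return Optional.of(normalized);
    }

    public static void main(String[] args) {
        // CrawlDB keys after the new normalizer was applied to the whole CrawlDB.
        Map<String, String> crawlDb = Map.of("http://example.com/page", "db_fetched");
        // URL as recorded in an older segment, before the normalizer existed.
        String segmentUrl = "HTTP://Example.COM/page#section";

        System.out.println("raw lookup: " + crawlDb.get(segmentUrl));
        System.out.println("normalized lookup: "
                + prepareForIndex(segmentUrl).map(crawlDb::get).orElse(null));
        System.out.println("filtered: " + prepareForIndex("http://example.com/logo.png"));
    }
}
```

Running `main` prints `raw lookup: null`, then `normalized lookup: db_fetched`, then `filtered: Optional.empty`: the raw segment URL misses the CrawlDB entry, while the normalized one finds it. Normalizing before filtering is a deliberate choice in this sketch, so filter rules only have to be written against the canonical URL form.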

Attachments

NUTCH-1300-1.5-1.patch (3 kB), Markus Jelsma, 07/Mar/12 10:38

Issue Links

is depended upon by: NUTCH-1323 AjaxNormalizer (Closed)
is duplicated by: NUTCH-1614 Plugin to exclude URLs matching regex list from indexing - to enable crawl but do not index (Open)

People

Assignee: Markus Jelsma
Reporter: Markus Jelsma
Votes: 0
Watchers: 3

Dates

Created: 06/Mar/12 00:42
Updated: 13/Mar/24 14:51
Resolved: 17/Jul/13 18:38