Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.14
    • Component/s: parser
    • Labels: None

      Description

      I recently came across this rather stagnant codebase[0] which is ASL v2.0 licensed and appears to have been used successfully to parse sitemaps as per the discussion here[1].

      [0] http://sourceforge.net/projects/sitemap-parser/
      [1] http://lucene.472066.n3.nabble.com/Support-for-Sitemap-Protocol-and-Canonical-URLs-td630060.html

      1. NUTCH-1465.patch
        27 kB
        Markus Jelsma
      2. NUTCH-1465.patch
        27 kB
        Markus Jelsma
      3. NUTCH-1465.patch
        27 kB
        Markus Jelsma
      4. NUTCH-1465.patch
        27 kB
        Markus Jelsma
      5. NUTCH-1465-sitemapinjector-trunk-v1.patch
        17 kB
        Sebastian Nagel
      6. NUTCH-1465-trunk.v1.patch
        27 kB
        Tejas Patil
      7. NUTCH-1465-trunk.v2.patch
        16 kB
        Tejas Patil
      8. NUTCH-1465-trunk.v3.patch
        19 kB
        Tejas Patil
      9. NUTCH-1465-trunk.v4.patch
        19 kB
        Tejas Patil
      10. NUTCH-1465-trunk.v5.patch
        21 kB
        Tejas Patil

        Issue Links

          Activity

          kkrugler Ken Krugler added a comment -

          The sitemap parsing code referenced in the discussion you note has been placed in crawler-commons. We just finished using it during a crawl (fixed one bug, dealing with sitemaps that have a BOM) and it worked fine for the sites we were crawling.

          lewismc Lewis John McGibbney added a comment -

           I think I can envisage the next comment on this thread... this is yet another reason to use crawler commons :0)
           Ken, I wonder if you would be so kind as to start a thread over on dev@nutch regarding the atmosphere over @ CC... it was my thought that we were flogging a dead horse with this conversation, but the duplication of issues over here that are quite clearly included in CC seems rather ridiculous.

          kkrugler Ken Krugler added a comment -

           Hi Lewis - I could start a thread, but I also don't want to flog a dead horse.

          I'm spending occasional small amounts of time trying to move code from Bixo over to CC, and the plan is for the 0.9 release of Bixo to switch over to using CC where possible.

           But the lack of excitement among Droids, Heritrix, Common Crawl, Nutch, etc. has made it pretty clear that getting widespread adoption would be an uphill battle, one that I don't currently have the time to fight.

          – Ken

          lewismc Lewis John McGibbney added a comment -

          Hi Ken,

           > I could start a thread, but I also don't want to flog a dead horse

           I thought there had been renewed interest over @ CC, but it looks like this is not the case. So I guess that we can progress with moving the sitemap-parser into Nutch. There have been people from the community who would like it, so I see no reason not to. There was also mention of the canonical tag topic again in the thread I cited above (and there are also issues already logged on our Jira for this), so it will be interesting to see what the code contains.

          kkrugler Ken Krugler added a comment -

          Hi Lewis,

          Just to be clear, I think the dead horse is trying to get people interested in porting their code to crawler-commons, and then switching existing functionality to rely on cc.

          For anything new (like sitemap parsing) I think it's a no-brainer to use cc, unless the API is totally borked. E.g. if you didn't, then you wouldn't have picked up our BOM fix.

          – Ken

          lewismc Lewis John McGibbney added a comment -

          So CC it is for sitemap parsing support in Nutch :0)

          tejasp Tejas Patil added a comment -

           This is a work in progress. So far I have done the following:

           • Added a new status named STATUS_SITEMAP to CrawlDatum. I plan to use it to identify the sitemap urls in the update phase.
           • Modified the robots parsing code to extract the links to sitemap pages.
           • Added a new class SitemapProcessor which will cache the links to sitemap pages, use the sitemap parser in CC and ensure that for a given host, sitemaps are processed just once (see the sketch at the end of this comment).

          Attached a patch (NUTCH-1465-trunk.v1.patch) for the changes.
          Things pending:

           • write the sitemap urls (from the Fetcher class) to the segments in the form of CrawlDatum entries
          • modify the update phase to take care of STATUS_SITEMAP and update the crawl frequency.

          If anyone has any suggestions in terms of design and approach, please let me know.
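
           For reference, a minimal sketch of the crawler-commons call that such a SitemapProcessor would wrap; the method names reflect the CC sitemap API of that time and may differ between versions, so treat this as an assumption rather than the patch itself:

             import java.net.URL;
             import java.util.Collection;

             import crawlercommons.sitemaps.AbstractSiteMap;
             import crawlercommons.sitemaps.SiteMap;
             import crawlercommons.sitemaps.SiteMapParser;
             import crawlercommons.sitemaps.SiteMapURL;

             public class SitemapParseExample {
               public static void printUrls(byte[] content, String contentType, String location)
                   throws Exception {
                 SiteMapParser parser = new SiteMapParser();
                 AbstractSiteMap parsed = parser.parseSiteMap(contentType, content, new URL(location));
                 if (!parsed.isIndex()) {
                   // each entry carries the optional lastmod/changefreq/priority metadata
                   Collection<SiteMapURL> urls = ((SiteMap) parsed).getSiteMapUrls();
                   for (SiteMapURL u : urls) {
                     System.out.println(u.getUrl() + " lastmod=" + u.getLastModified()
                         + " priority=" + u.getPriority());
                   }
                 }
               }
             }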

          kkrugler Ken Krugler added a comment -

          Hi Tejas - I thought the current CC robots parsing code was already extracting the sitemap links. Or is the above comment ("modified the robots parsing code to extract the links to sitemap pages") a change to the current Nutch robots parsing code?

          I do remember thinking that the CC version would need to change to support multiple Sitemap links, even though it wasn't clear whether that was actually valid.

          – Ken

          tejasp Tejas Patil added a comment -

          Hi Ken,
           As the CC robots integration jira is not closed, I made this change on the current trunk.

           I did not understand this ("CC version would need to change to support multiple Sitemap links"). Do you mean that CC isn't allowing multiple sitemap links in a robots file (like this) or in a sitemap index file?

          kkrugler Ken Krugler added a comment -

          Hi Tejas - the original code didn't, but I checked and now remember that I added support for multiple sitemap URLs to BaseRobotRules in CC.
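
           (Assuming the crawler-commons robots API of that time, the discovered sitemap links are exposed roughly as below; the wrapper class is purely illustrative:)

             import java.util.List;

             import crawlercommons.robots.BaseRobotRules;

             public class RobotsSitemapsExample {
               // one entry per "Sitemap:" line found while parsing robots.txt
               public static List<String> sitemapsOf(BaseRobotRules rules) {
                 return rules.getSitemaps();
               }
             }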

          wastl-nagel Sebastian Nagel added a comment -

          Hi Tejas,
          thanks and a few comments on the patch:

           “For a given host, sitemaps are processed just once.” But they are not cached over cycles because the cache is bound to the protocol object. Is this correct? So a sitemap is fetched and processed every cycle for every host? If yes, and sitemaps are large (they can be!), this would cause a lot of extra traffic.

           Shouldn't sitemap URLs be handled the same way as any other URL: add them to the CrawlDb, fetch and parse once, add found links to the CrawlDb; cf. Ken's post at CC. There are some complications:

          • due to their size, sitemaps may require larger values regarding size and time limits
          • sitemaps may require more frequent re-fetching (eg. by MimeAdaptiveFetchSchedule)
          • the current Outlink class cannot hold extra information contained in sitemaps (lastmod, changefreq, etc.)

           There is another way, which we use for several customers: a SitemapInjector fetches the sitemaps, extracts the URLs and injects them with all extra information. It's a simple use case for a customized site search: there is a sitemap and it shall be used as the seed list or even the exclusive list of documents to be crawled. Is there any interest in this solution? It's not a general solution and is not adaptable to a large web crawl.

          tejasp Tejas Patil added a comment - - edited

          Hi Sebastian,

           By “for a given host, sitemaps are processed just once” I meant: in the same round, the processing is done just once for a given host. I agree with you that a sitemap is fetched and processed every cycle for every host. The SitemapInjector idea is good.

           The way I see this, "SitemapInjector" will be:

          • Separate map-reduce job
          • Responsible for fetching sitemap location(s) from robots file, getting the sitemap file(s) and adding the urls (along with the crawl freq. etc meta) from sitemap to the crawldb.
           • For large web crawls, we don't want to run this job in every nutch cycle. Also, new hosts will be discovered along the way, for which the sitemaps need to be added to the crawldb. For hosts whose sitemaps were already processed, a new sitemap location might have been added to the robots file. So have a "sitemapFrequency" param in the crawl script, e.g. if sitemapFrequency=10, the sitemap job will be invoked every 10 nutch crawl cycles (1st cycle, 11th cycle, 21st cycle and so on).
          • Users can also run this job in standalone fashion on a crawldb.

          What say ?
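
           (A trivial sketch of that gating check; the names are illustrative only, since the actual crawl script is shell-based:)

             public class SitemapSchedule {
               /** With sitemapFrequency=10 this fires on cycles 1, 11, 21, ... */
               public static boolean runSitemapJob(int cycle, int sitemapFrequency) {
                 return sitemapFrequency > 0 && (cycle - 1) % sitemapFrequency == 0;
               }
             }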

          wastl-nagel Sebastian Nagel added a comment -

          Yes, SitemapInjector is a map-reduce job. The scenario for its use is the following:

          • a small set of sites to be crawled (eg, to feed a site-search index)
          • you can think of sitemaps as "remote seed lists". Because many content management systems can generate sitemaps it is convenient for the site owners to publish seeds. The URLs contained in the sitemap can be also the complete and exclusive set of URLs to be crawled (you can use the plugin scoring-depth to limit the crawl to seed URLs).
           • because you can trust the sitemap's content
             • checks for "cross submissions" are not necessary
             • extra information (lastmod, changefreq, priority) can be used
           That's why we use sitemaps: remote seed lists, maintained by customers, quite convenient if you run a crawler as a service.

          For large web crawls there is also another aspect: detection of sitemaps which is bound to processing of robots.txt. Processing of sitemaps can (and should?) be done the usual Nutch way:

          • detection is done in the protocol plugin (see Tejas' patch)
          • record in CrawlDb: done by Fetcher (cross submission information can be added)
          • fetch (if not yet done), parse (a plugin parse-sitemap based on crawler-commons?) and extract outlinks: sitemaps may require special treatment here because they can be large in size and usually contain many outlinks. Also the Outlink class needs to be extended to deal with the extra info relevant for scheduling
           Using an extra tool (such as the SitemapInjector) for processing the sitemaps has the disadvantage that we first must get all sitemap URLs out of the CrawlDb. On the other hand, special treatment can easily be realized in a separate map-reduce job.

          Comments?!

          Thanks, Tejas: the feature is moving forward thanks to your initiative!

          markus17 Markus Jelsma added a comment -

          Thanks all for your interesting comments.

           It's a complicated issue. On one hand, host data should be stored in NUTCH-1325, but that would require additional logic and sending each segment's output to the hostdb in case a sitemap was crawled. On the other hand, it's ideal to store host data. It's also easy to use in jobs such as the indexer and generator.

          I don't yet favour a specific approach but storing sitemap data in a hostdb may be something to think about.

          Cheers

          tejasp Tejas Patil added a comment -

          Hi Sebastian,

          So we are looking at 2 things here:

          • a standalone utility for injecting sitemaps to crawldb:
            1. User starts off with urls to sitemap pages
            2. SitemapInjector fetches these seeds, parses it (with a parse plugin based on CC)
            3. SitemapInjector updates the crawldb with the sitemap entries.
          • handling of sitemap within the nutch cycle: fetch, parse and update phases
            1. Robots parsing will populate a table of "host": <list of links to sitemap pages>
            2. These will be added to the fetcher queue and will be fetched
            3. A parser plugin based on CC will parse the sitemap page
            4. Outlink class needs to be extended to store the meta obtained from sitemap
            5. Write this into the segment
             6. Update phase needs to update the crawl frequency of already existing urls in the crawldb based on what we got from the sitemap; otherwise just add new entries to the crawldb.

           I am not clear about the extending-Outlink part. The normal outlink extraction need not be done as CC will already do that for us. The sitemap parser plugin must do this and create objects of our specialized sitemap link. While writing, where is the CrawlDatum generated from the outlink?

           The mime type that we get is "text/xml", which can also mean a normal xml file. How will nutch identify that it is a sitemap page and invoke the correct parser plugin? (I know that this magic is done by the feed parser, but I am not sure which part of the code is doing that. Just point me to that code.)
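
           (Not an answer as to how the feed parser does it, but one possible way to recognize a sitemap regardless of the generic "text/xml" mime type is to peek at the root element, which the sitemap protocol fixes as "urlset" or "sitemapindex". A self-contained sketch; gzipped sitemaps would need decompression first:)

             import java.io.ByteArrayInputStream;

             import javax.xml.stream.XMLInputFactory;
             import javax.xml.stream.XMLStreamConstants;
             import javax.xml.stream.XMLStreamReader;

             public class SitemapSniffer {
               public static boolean looksLikeSitemap(byte[] content) {
                 try {
                   XMLStreamReader reader = XMLInputFactory.newInstance()
                       .createXMLStreamReader(new ByteArrayInputStream(content));
                   while (reader.hasNext()) {
                     if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                       // the first start element is the document root
                       String root = reader.getLocalName();
                       return "urlset".equals(root) || "sitemapindex".equals(root);
                     }
                   }
                 } catch (Exception e) {
                   // not well-formed XML, so certainly not a sitemap
                 }
                 return false;
               }
             }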

          brian44 Brian added a comment -

          Is a separate issue needed for support in 2.X?

          tejasp Tejas Patil added a comment -

           Revisited this Jira after a long time and gave some thought to how this can be done cleanly. Two ways of implementing this:

          (A) Do the sitemap stuff in the fetch phase of nutch cycle.
          This was my original approach which the (in-progress) patch addresses. This would involve tweaking core nutch classes at several locations.

          Pros:

          • Sitemaps are nothing but normal pages with several outlinks. Fits well in the 'fetch' cycle.

          Cons:

           • Sitemaps can be huge. Fetching them needs large size and time limits. The fetch code must have a special case to detect that the url is a sitemap url and use custom limits, which leads to a hacky coding style.
           • The Outlink class cannot hold the extra information contained in sitemaps (like lastmod and changefreq). We could modify it to hold this information too, but it is specific to sitemaps and yet we would end up making all outlinks hold this info. Alternatively, we could create a special type of outlink.

          (B) Have separate job for the sitemap stuff and merge its output into the crawldb.
           i. User populates a list of hosts (or uses the HostDB from NUTCH-1325). Now we have all the hosts to be processed.
          ii. Run a map-reduce job: for each host,

          • get the robots page, extract sitemap urls,
          • get xml content of these sitemap pages
           • create crawl datums with the required info and write them to a sitemapDB

          iii. Use CrawlDbMerger utility to merge the sitemapDB and crawldb

          Pros:

          • Cleaner code.
           • Users have control over when to perform sitemap extraction. This is better than (A), wherein sitemap urls sit in the crawldb and get fetched along with normal pages (thus eating up fetch time in every fetch phase). We can have a sitemap_frequency used inside the crawl script so that users can say: run sitemap processing after every 'x' nutch cycles.

          Cons:

           • Additional map-reduce jobs are needed. I think that this is reasonable: running the sitemap job 1-5 times a month on a production-level crawl would work out well.

           I am inclined towards implementing (B).
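
           (A sketch of step ii above: turning one crawler-commons SiteMapURL entry into a CrawlDatum for the temporary sitemapDB. The status and default interval used here are assumptions for illustration, not necessarily what the final patch does:)

             import org.apache.nutch.crawl.CrawlDatum;

             import crawlercommons.sitemaps.SiteMapURL;

             public class SitemapDatumExample {
               public static CrawlDatum toDatum(SiteMapURL entry, int defaultIntervalSecs) {
                 CrawlDatum datum = new CrawlDatum(CrawlDatum.STATUS_INJECTED, defaultIntervalSecs);
                 if (entry.getLastModified() != null) {
                   datum.setModifiedTime(entry.getLastModified().getTime());
                 }
                 // sitemap <priority>; crawler-commons defaults this to 0.5
                 datum.setScore((float) entry.getPriority());
                 return datum;
               }
             }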

          lewismc Lewis John McGibbney added a comment -

          Hi Tejas Patil... nice logic.
          Some notes here from my observations of the crawler commons code (and possibly sitemap standards as well)

              /** According to the specs, 50K URLs per Sitemap is the max */
              private static final int MAX_URLS = 50000;
          
              /** Sitemap docs must be limited to 10MB (10,485,760 bytes) */
              public static int MAX_BYTES_ALLOWED = 10485760;
          

           I would be inclined to agree with you on your preference to introduce the new MR SiteMapMRJob as in B above. It generally sounds much cleaner, with the changes being less scattered and hence affecting fewer areas of the existing codebase.
           Also, given that the HostDB has been coming along nicely in 1.X, I think this would be an excellent use of the CC SiteMap code.

          wastl-nagel Sebastian Nagel added a comment -

          Hi Tejas,
           attached you'll find a patch for a sitemap injector. Originally written by Hannes Schwarz, it has been used by us for some time. The patch contains a revised and improved version which, however, needs some more work (see TODOs in the code).
           The use case is somewhat different from way B: the sitemap injector takes URLs of sitemaps (not via robots.txt) and injects them directly into the CrawlDb (no extra sitemapDB - do we really need an extra DB?). Robots.txt is not used as an intermediate step/hop because experience has shown that customers often prepare a special sitemap for the site-search crawler which differs from the sitemap propagated in robots.txt.
          Btw., NUTCH-1622 would enable solution A: outlinks now can hold extra info.

          tejasp Tejas Patil added a comment - - edited

          Hi Sebastian Nagel,

           Nice share. The only grudge I have with that approach is that users will have to pick up sitemap urls for hosts manually and feed them to the sitemap injector. It would fit well where users are performing targeted crawling.
          For a large scale, open web crawl use case:
           i) the number of initial hosts can be large: a one-time burden for users
           ii) the crawler discovers new hosts over time: a constant pain for users to look out for the newly discovered hosts and then get sitemaps from robots.txt manually. With the HostDB from NUTCH-1325 and B, users won't suffer here.

          > do we really need an extra DB?
           I should have been clearer in the explanation. "sitemapDB" is a temporary location where all crawl datums of sitemap entries would be written. It can be deleted after merging with the main crawlDB. Quite analogous to what the inject operation does.

          > NUTCH-1622 would enable solution A: outlinks now can hold extra info.
           I didn't know that. Still, I would go in favor of B as it is clean, and A would involve messing around with the existing codebase in several places.

          wastl-nagel Sebastian Nagel added a comment -

          Let's add use case C:
          (C) inject URLs from given sitemap(s)
          i. user configures list of known and trusted sitemaps
          ii. URLs are extracted from sitemaps and injected into CrawlDb
          Use case: small/medium size customized crawls

           Is C a common use case, worth being integrated?

          tejasp Tejas Patil added a comment -

          Hi Sebastian Nagel,
           Yes, I think that it should be there too. I will be working on the patch this weekend and will post an update. Thanks for your inputs and suggestions so far; they were super helpful in chalking out the right specs for this feature.

          tejasp Tejas Patil added a comment -

          Attaching NUTCH-1465-trunk.v2.patch which has implementation of option (B) Have separate job for the sitemap stuff and merge its output into the crawldb

           I have tied both use cases together in this patch:
          1. users with targeted crawl who want to get sitemaps injected from a list of sitemap urls - the use case which Sebastian Nagel had pointed out.
          2. large open web crawls where users cannot afford to generate sitemap seeds for all the hosts and want nutch to inject sitemaps automatically.

          To try out this patch:
          1. Apply the patch for HostDb feature (https://issues.apache.org/jira/secure/attachment/12624178/NUTCH-1325-trunk-v4.patch)
          2. Apply this patch (NUTCH-1465-trunk.v2.patch)
          3. (optional) Add this to conf/log4j.properties at line 11:

          log4j.logger.org.apache.nutch.util.SitemapProcessor=INFO,cmdstdout
          

           4. Run using

          bin/nutch org.apache.nutch.util.SitemapProcessor
          

          I have started working on a wiki page describing this feature: https://wiki.apache.org/nutch/SitemapFeature

          Any suggestion and comments are welcome.

          tejasp Tejas Patil added a comment -

           Now that HostDb (NUTCH-1325) is in trunk, I updated the patch (v3).
          Also,

          • included job counters
          • more documentation
          • added sitemap references in log4j.properties and bin/nutch script.

          For usage, see https://wiki.apache.org/nutch/SitemapFeature

          lewismc Lewis John McGibbney added a comment - - edited

          Hey Tejas Patil. Again, great work! Some minor comments

           • Class-level Javadoc in SitemapProcessor would be more legible if it used a format similar to
            SitemapProcessor.java
            /**
             * <p>Performs Sitemap processing by fetching sitemap links, parsing the content and merging
             * the urls from Sitemap (with the metadata) with the existing crawldb.</p>
             *
             * <p>There are two use cases supported in Nutch's Sitemap processing:</p>
             * <ol>
             *  <li>Sitemaps are considered as "remote seed lists". Crawl administrators can prepare a
             *     list of sitemap links and get only those sitemap pages. This suits well for targeted
             *     crawl of specific hosts.</li>
             *  <li>For open web crawl, it is not possible to track each host and get the sitemap links
             *     manually. Nutch would automatically get the sitemaps for all the hosts seen in the
             *     crawls and inject the urls from sitemap to the crawldb.</li>
             * </ol>
             * <p>For more details see:
              *      https://wiki.apache.org/nutch/SitemapFeature </p>
             */
            
          • I think that the following logging line should be changed to WARN or ERROR
            SitemapProcessor.java
            } catch (Exception e) {
            +          LOG.info("Exception for url " + key.toString() + " : " + StringUtils.stringifyException(e)); 
            
          • This is merely a suggestion, but in SitemapProcessor#filterNormalize(String u), could we not use one of the methods from URLUtil.java instead?
            SitemapProcessor.java
                  if(!u.startsWith("http://") && !u.startsWith("https://")) {
                    // We received a hostname here so let's make a URL
                    url = "http://" + u + "/";
                    isHost = true;
                  }
            

           That's about it from me, mate. This looks like an excellent addition to Nutch again. I made a trivial update to the wiki page to drop in some links and background to your work on this one.

           I should probably add that on local tests this works fine for me, e.g. injecting from a sitemap file and from the HostDb.

          tejasp Tejas Patil added a comment -

          Hi Lewis John McGibbney,
           +1 for the first two suggestions. For #3: I skimmed through the methods inside URLUtil.java and nothing came to my notice that I could use in the sitemap code you pointed to. Can you please confirm?

          A big thanks mate for trying out the feature. Hopefully we get this into 1.8 release.
          Cheers !!

          lewismc Lewis John McGibbney added a comment -

           hey Tejas Patil no probs. RE: #3, I was just curious to see if we could reuse some of the methods we have in URLUtil. Now that I've looked, I feel you're right.
           This patch reminds me of pushing filtering and normalization out to crawler commons anyway, but that is another can of worms.
           I'll let others comment here. Right now I am +1 on this patch.

          tejasp Tejas Patil added a comment -

          Attaching v4 patch with the suggestions #1 and #2 from Lewis John McGibbney.

          wastl-nagel Sebastian Nagel added a comment -

           Great, looks good and is really compact while providing a lot of functionality. I've just started to test SitemapProcessor; here are my first comments:

          • SitemapProcessor.java has no Apache license header
          • would be nice to see counters in log output
           • regarding Lewis' point #3: doesn't a comment like "a hacky way" mean "try to avoid that"? Why not set isHost inside map(...) via isHost = (value instanceof HostDatum) and pass it as a parameter to filterNormalize()? This would avoid errors due to incomplete heuristics, for example when testing with sitemaps accessed via the file protocol:
            INFO  api.HttpRobotRulesParser - Couldn't get robots.txt for http://file:/tmp/sitemap1.xml/: java.net.UnknownHostException: file
            
           • concurrency: "returning" the value of isHost from filterNormalize() to map() via a member variable is not thread-safe and will cause problems in combination with MultithreadedMapper. One more argument for passing it from map() to filterNormalize() as a parameter.
          tejasp Tejas Patil added a comment -

          Hi Sebastian Nagel,
           Thanks a lot for your comments. The first two were straightforward and I agree with them.

           Re "hacky way": for hosts from the HostDb, we don't know which protocol they belong to. In the code I was checking if http:// is a match and, if that was a bad guess, then trying with https://. I didn't handle the ftp:// and file:/ schemes. By "hacky" I meant this approach of trial and error until a suitable match is found and we create a homepage url for the host. I have thought about your comment and will have a better (yet still hacky) way in the coming patch.

          Re "concurrency": I had thought of this and had searched over internet for internals of MultithreadedMapper. All I could get is that it has an internal thread pool and each input record to handed over to a thread in this pool to run map() over it. I wrote this code to check if thread safety was ensured in MultithreadedMapper:

            private static class SitemapMapper extends Mapper<Text, Writable, Text, CrawlDatum> {
              private String myurl = null;
          
              public void map(Text key, Writable value, Context context) throws IOException, InterruptedException {
                if (value instanceof Text) {
                  String url = key.toString();
                  if(foo(url).compareTo(url) != 0) {
                    LOG.warn("Race condition found !!!");
                  }
                }
              }
          
              private String foo(String url) {
                myurl = url;
                if(Thread.currentThread().getId() % 2 == 1) {
                  try {
                    Thread.sleep(10000);
                  } catch(InterruptedException e) {
                    LOG.warn(e.getMessage());
                  }
                }
                return myurl;
              }
             }

           I ran it multiple times with threads set to 10, 100, 1000 and 2000 but never hit the race condition. Is the code snippet above a good way to reveal a race condition? It won't be a formal conclusion, more of an experimental one. How do I get a concrete conclusion on whether MultithreadedMapper is thread-safe or not?

          wastl-nagel Sebastian Nagel added a comment -

          Sorry, you're right: the comment "hacky way" applies to trying http and https to check which host-URL would pass the filters. That's ok, there is no better solution for that.
          But what about the decision whether a string passed to filterNormalize() is a host from HostDb or a URL from a list of sitemaps? This decision could be made without any heuristics: inside map() we know the type (host or sitemap Url) from the class of the value:

          boolean isHost = (value instanceof HostDatum);
          String url = filterNormalize(key.toString(), isHost);
          

           The method filterNormalize() could then be simplified and the member variable isHost would be obsolete.
           Regarding concurrency: the javadoc of MultithreadedMapper states that "Mapper implementations using this MapRunnable must be thread-safe." If in doubt, it may be better to follow this advice and not rely on the (current) implementation. If SiteMapParser is thread-safe (at first glance, it is), it should be easy to make SitemapMapper thread-safe.

          tejasp Tejas Patil added a comment -

          Adding new patch 'v5' with below changes:
          1. Added Apache license header as per review comment by Sebastian Nagel
          2. Added counters in log output as per review comment by Sebastian Nagel
           3. Implemented the change suggested by Sebastian Nagel for 'isHost' and 'filterNormalize'. I could do more refactoring and make it cleaner.
          4. Added a new parameter "-noStrict" to control the checking done by sitemap parser

          wastl-nagel Sebastian Nagel added a comment -

          Thanks, Tejas Patil for the improvements! Testings continued...

           Sitemaps are treated the same as ordinary URLs/docs, but there are some differences. Shouldn't we relax the default limits and filters and trust the restrictions specified in the sitemap protocol?

          • URL filters and normalizers: maybe you want to exclude .gz docs per suffix filter but still fetch gzipped sitemaps. That's not possible. Is it really necessary to normalize/filter sitemap URLs? If yes, this should be optional.
           • the default content limits ({http,ftp,file}.content.limit, 64 kB) are quite small even for mid-size sitemaps. Ok, you could set it per -D... but why not increase it to SiteMapParser.MAX_BYTES_ALLOWED?

           • maybe we also want to increase the fetch timeout

           Processing sitemap indexes fails:

          • the check sitemap.isIndex() skips all referenced sitemaps
          • protocol for sitemap index and referenced sub-sitemaps may be different (eg., one sub-sitemap could be https while others are http)
          • if processing one of the referenced sitemaps fails, the remaining sub-sitemaps are not processed

           Fetch intervals are taken unchecked from <changefreq>. Should we limit them to reasonable values (db.fetch.schedule.adaptive.min_interval <= interval <= db.fetch.interval.max)? Fetch intervals of 1 second or 1 hour may cause trouble. [1] explicitly says that <changefreq> "is considered a hint and not a command".
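
           (A minimal sketch of such a clamp; the property names are taken from the comment above, while the helper itself and the fallback values are assumptions for illustration only:)

             import org.apache.hadoop.conf.Configuration;

             public class SitemapIntervalClamp {
               public static int clamp(Configuration conf, int sitemapIntervalSecs) {
                 // fallbacks are illustrative, not necessarily the Nutch defaults
                 int min = conf.getInt("db.fetch.schedule.adaptive.min_interval", 60);
                 int max = conf.getInt("db.fetch.interval.max", 90 * 24 * 3600);
                 return Math.max(min, Math.min(sitemapIntervalSecs, max));
               }
             }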

          wastl-nagel Sebastian Nagel added a comment -

          SitemapReducer overwrites the score, modified time, and fetch interval of existing CrawlDb entries with the values from the sitemap. Is this the desired behavior? What about a forgotten, hopelessly outdated sitemap? Or bogus values (last modification in the future)?
          If a sitemap does not specify one of score, modified time, or fetch interval, this value is set to zero. In this case, we should definitely not overwrite existing values. Newly added entries should get assigned db.fetch.interval.default and a reasonable score, e.g. 0.5 as recommended by [2]. But that may depend on scoring plugins. Comments?

          tejasp Tejas Patil added a comment -

          Interesting comments, Sebastian Nagel.

          Re "filters and normalizers" : By default I have kept those ON but can be disabled by using "-noFilter" and "-noNormalize".
          Re "default content limits" and "fetch timeout": +1. Agree with you.
          Re "Processing sitemap indexes fails" : +1. Nice catch.
          Re "Fetch intervals of 1 second or 1 hour may cause troubles" : Currently, Injector allows users to provide a custom fetch interval with any value eg. 1 sec. It makes sense not the correct it as user wants Nutch use that custom fetch interval. If we view sitemaps as custom seed list given by a content owner, then it would make sense to follow the intervals. But as you said that sitemaps can be wrongly set or outdated, the intervals might be incorrect. The question bolis down to: We are blindly accepting user's custom information in inject. Should we blindly assume that sitemaps are correct or not ? I have no strong opinion about either side of the argument.

          (PS : The default 'db.fetch.schedule.adaptive.min_interval' is 1 min, so 1 hr would be allowed as per db.fetch.schedule.adaptive.min_interval <= interval)

          Re "SitemapReducer overwriting" :
          >> "If a sitemap does not specify one of score, modified time, or fetch interval this values is set to zero. "
          Nope. See SiteMapURL.java

          (a) score : Crawler commons assigns a default score of 0.5 if there was none provided in the sitemap.
          We can do this: if an old entry has a score other than 0.5, preserve it, else update it. For a new entry, use scoring plugins when the score equals 0.5, else keep the sitemap value.
          Limitation: It's not possible to distinguish whether the score of 0.5 comes from the sitemap or is the default assigned when <priority> was absent.
          (b) fetch interval : Crawler commons does NOT set the fetch interval if there was none provided in the sitemap. So we are sure that whatever value is used is coming from <changefreq>. Validation might be needed as per the comments above.
          (c) modified time : Same as fetch interval; unless parsed from the sitemap file, the modified time is set to NULL. The only possible validation is to drop values greater than the current time.

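          For reference, a small sketch of how these three values surface from crawler-commons' SiteMapURL; the getter names match current crawler-commons but should be treated as an assumption if another version is used:

          import java.util.Date;
          import crawlercommons.sitemaps.SiteMapURL;

          public class SitemapValueCheck {
            /** True if <lastmod> was present in the sitemap and is not in the future. */
            public static boolean hasUsableLastMod(SiteMapURL u) {
              Date lastMod = u.getLastModified();           // null unless <lastmod> was given
              return lastMod != null && lastMod.getTime() <= System.currentTimeMillis();
            }

            public static void inspect(SiteMapURL u) {
              System.out.println(u.getUrl()
                  + " priority=" + u.getPriority()          // 0.5 by default if <priority> is absent
                  + " changefreq=" + u.getChangeFrequency() // null if <changefreq> is absent
                  + " usableLastMod=" + hasUsableLastMod(u));
            }
          }
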
          wastl-nagel Sebastian Nagel added a comment -

          "filters and normalizers": -noFilter is not really an option if sitemaps are used and gzipped documents (eg. software packages) shall be excluded. In customized crawls URL filter rules are often complex, and I want to avoid to have to sets of rules in the end. Sitemaps are different from normal docs/URLs (robots.txt is also different): they are not stored in CrawlDb and may require other filter rules. What about an option "-noFilterSitemap"?

          "Fetch intervals of 1 second or 1 hour may cause troubles":
          > We are blindly accepting the user's custom information in inject.
          Yes, because the user (crawl administrator) can change the seed list (it's a file/directory on local disk or HDFS). Sitemaps are not necessarily under the control of the user. If we (optionally) adjust the fetch interval by (configurable) min/max limits, that would help to avoid unreasonable values and, e.g., re-fetching a bunch of pages every cycle.

          "SitemapReducer overwriting" :
          In a continuous crawl we know when pages are modified and have heuristics to estimate the change frequency of a page (AdaptiveFetchSchedule). The question is whether we trust those values obtained from crawling or prefer the (possibly bogus) values from sitemaps. Using the sitemap values for new URLs found in sitemaps is less critical.

          > (a) score : Crawler commons assigns a default score of 0.5 if there was none provided in the sitemap.
          Needs an upgrade of crawler-commons (0.2 is still used, which sets the priority to 0).

          tejasp Tejas Patil added a comment -

          Re "filters and normalizers": +1.

          Re "fetch intervals" and "reducer overwriting": I have never encountered bogus sitemaps but that was for a intranet crawl and it would be better to take care of that in this jira. Here is what I conclude from the discussion till now:
          (1) fetch interval: For old entries, don't use the value from sitemap. For new ones, use the value from sitemap provided (db.fetch.schedule.adaptive.min_interval <= interval <= db.fetch.interval.max)
          (2) score: Never use value from sitemap. For new ones, use scoring filters. Keep the value of old entries as it is.
          (3) modified time: Always use the value from sitemap provided its not a date in future.

          Did I get it right?

          Re "score": I missed that the jar is old. Would file a jira to upgrade CC to v0.3 in Nutch.

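          A rough sketch of rules (1)-(3) as they might look when merging entries; the names (old, fromSitemap) are placeholders and this is not the actual SitemapReducer code:

          import org.apache.hadoop.conf.Configuration;
          import org.apache.nutch.crawl.CrawlDatum;

          public class SitemapMergeRules {
            /** Merges a sitemap-derived datum into an (optional) existing CrawlDb entry. */
            public static CrawlDatum merge(Configuration conf, CrawlDatum old, CrawlDatum fromSitemap) {
              int minInterval = conf.getInt("db.fetch.schedule.adaptive.min_interval", 60);
              int maxInterval = conf.getInt("db.fetch.interval.max", 7776000);

              if (old != null) {
                // (1) + (2): keep the fetch interval and score of the existing entry untouched
                long lastMod = fromSitemap.getModifiedTime();
                if (lastMod > 0 && lastMod <= System.currentTimeMillis())
                  old.setModifiedTime(lastMod);               // (3) accept lastmod unless it is in the future
                return old;
              }

              // new URL discovered via the sitemap
              int interval = fromSitemap.getFetchInterval();
              if (interval < minInterval || interval > maxInterval)
                fromSitemap.setFetchInterval(conf.getInt("db.fetch.interval.default", 2592000)); // (1)
              fromSitemap.setScore(0.0f);                     // (2) leave scoring to ScoringFilter.initialScore(...)
              if (fromSitemap.getModifiedTime() > System.currentTimeMillis())
                fromSitemap.setModifiedTime(0L);              // (3) drop dates in the future
              return fromSitemap;
            }
          }
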
          wastl-nagel Sebastian Nagel added a comment -

          (1) fetch interval: ...
          +1, sounds plausible.

          (2) score: Never use the value from the sitemap. For new ones, use scoring filters. Keep the value of old entries as it is.
          That means use ScoringFilter.initialScore(...) for new ones?
          Why not use the priority for newly found URLs? If the site owner takes it seriously, the score can be useful. We could make it configurable, e.g. by a factor sitemap.priority.factor. If it's 0.0, the priority is not used. Usually, the factor should be low to avoid that the total score in the web graph (cf. FixingOpicScoring) gets too high when "injecting" 50,000 URLs from sitemaps, each with priority 1.0. Alternatively, we could just put the values from the sitemap in CrawlDatum's metadata and "delegate" any actions to set the score to scoring filters or FetchSchedule implementations. Users can then more easily adapt any sitemap logic to their needs (cf. below).

          (3) modified time: Always use the value from the sitemap provided it's not a date in the future.
          Um, this way seems conceptually wrong (and was also the case in SitemapInjector).
          The modified time in CrawlDb must indicate the time of the last fetch or the modified time sent by the server when a page was fetched. If we overwrite the modified time, the server may just answer not-modified on an if-modified-since request and we'll never get the current version of a page. So we must not touch the modified time, even for newly discovered pages, where it must be 0. If it's not zero, an if-modified-since header field is sent although the page has never been fetched, cf. HttpResponse.java.
          If we can trust the sitemap, the desired behaviour would be to set the fetch time (in CrawlDb = time when the next fetch should happen) to now (or the sitemap modified time) if (and only if) sitemap.modif > crawldb.modif. This would make sure that changed pages are fetched asap. If the sitemap is not 100% trustworthy, we should be more careful.
          Could we again delegate this decision (trustworthy or not) to scoring filter or FetchSchedule implementations? Whether we can trust a sitemap may depend on the concrete crawler config/project and should be configurable. Would this require a new method in the scoring/schedule interfaces?

          More open questions than before!? Comments are welcome!

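          To make the idea concrete, a minimal sketch of the suggested sitemap.priority.factor; the property name and its semantics are only a proposal here, not an existing Nutch option:

          import org.apache.hadoop.conf.Configuration;
          import crawlercommons.sitemaps.SiteMapURL;

          public class SitemapPriorityScore {
            /**
             * Score for a newly discovered sitemap URL, or -1 if the priority should be
             * ignored (factor == 0.0) and ScoringFilter.initialScore(...) used instead.
             */
            public static float priorityScore(Configuration conf, SiteMapURL u) {
              float factor = conf.getFloat("sitemap.priority.factor", 0.0f); // proposed property, off by default
              if (factor == 0.0f)
                return -1.0f;
              return (float) (factor * u.getPriority()); // <priority> is in the range [0.0, 1.0]
            }
          }
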
          lewismc Lewis John McGibbney added a comment -

          I'm going to take this on. We want full sitemap support in our current crawlers, so I am making this my priority. I'll submit a pull request for the current patches, then we can take it from there.

          wastl-nagel Sebastian Nagel added a comment - edited

          Hi Lewis, a couple of months ago I applied the latest patch here (NUTCH-1465-trunk.v5.patch) to master, see https://github.com/sebastian-nagel/nutch/tree/NUTCH-1465. But I had to port this to the Common Crawl fork of Nutch (https://github.com/commoncrawl/nutch), so I chose the SitemapInjector from an older patch which was still based on the old mapred API.

          githubbot ASF GitHub Bot added a comment -

          lewismc opened a new pull request #189: NUTCH-1465 Support sitemaps in Nutch
          URL: https://github.com/apache/nutch/pull/189

          Hi folks, this issue addresses NUTCH-1465 (https://issues.apache.org/jira/browse/NUTCH-1465). I have an issue with some code which I will point out separately.

          githubbot ASF GitHub Bot added a comment -

          lewismc commented on a change in pull request #189: NUTCH-1465 Support sitemaps in Nutch
          URL: https://github.com/apache/nutch/pull/189#discussion_r113578491

          ##########
          File path: src/java/org/apache/nutch/util/SitemapProcessor.java
          ##########
          @@ -0,0 +1,436 @@
          +/**
          + * Licensed to the Apache Software Foundation (ASF) under one or more
          + * contributor license agreements. See the NOTICE file distributed with
          + * this work for additional information regarding copyright ownership.
          + * The ASF licenses this file to You under the Apache License, Version 2.0
          + * (the "License"); you may not use this file except in compliance with
          + * the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.nutch.util;
          +
          +import java.io.IOException;
          +import java.net.URL;
          +import java.text.SimpleDateFormat;
          +import java.util.Collection;
          +import java.util.LinkedList;
          +import java.util.List;
          +import java.util.Random;
          +
          +import org.apache.hadoop.conf.Configuration;
          +import org.apache.hadoop.conf.Configured;
          +import org.apache.hadoop.fs.FileSystem;
          +import org.apache.hadoop.fs.Path;
          +import org.apache.hadoop.io.Text;
          +import org.apache.hadoop.io.Writable;
          +import org.apache.hadoop.mapreduce.Job;
          +import org.apache.hadoop.mapreduce.Mapper;
          +import org.apache.hadoop.mapreduce.Reducer;
          +import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
          +import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
          +import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
          +import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;
          +import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
          +import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
          +import org.apache.hadoop.util.StringUtils;
          +import org.apache.hadoop.util.Tool;
          +import org.apache.hadoop.util.ToolRunner;
          +
          +import org.apache.nutch.crawl.CrawlDatum;
          +import org.apache.nutch.hostdb.HostDatum;
          +import org.apache.nutch.net.URLFilters;
          +import org.apache.nutch.net.URLNormalizers;
          +import org.apache.nutch.protocol.Content;
          +import org.apache.nutch.protocol.Protocol;
          +import org.apache.nutch.protocol.ProtocolFactory;
          +import org.apache.nutch.protocol.ProtocolOutput;
          +import org.apache.nutch.protocol.ProtocolStatus;
          +
          +import org.slf4j.Logger;
          +import org.slf4j.LoggerFactory;
          +
          +import crawlercommons.robots.BaseRobotRules;
          +import crawlercommons.sitemaps.AbstractSiteMap;
          +import crawlercommons.sitemaps.SiteMap;
          +import crawlercommons.sitemaps.SiteMapIndex;
          +import crawlercommons.sitemaps.SiteMapParser;
          +import crawlercommons.sitemaps.SiteMapURL;
          +
          +/**
          + * <p>Performs Sitemap processing by fetching sitemap links, parsing the content and merging
          + * the urls from Sitemap (with the metadata) with the existing crawldb.</p>
          + *
          + * <p>There are two use cases supported in Nutch's Sitemap processing:</p>
          + * <ol>
          + * <li>Sitemaps are considered as "remote seed lists". Crawl administrators can prepare a
          + * list of sitemap links and get only those sitemap pages. This suits well for targeted
          + * crawl of specific hosts.</li>
          + * <li>For open web crawl, it is not possible to track each host and get the sitemap links
          + * manually. Nutch would automatically get the sitemaps for all the hosts seen in the
          + * crawls and inject the urls from sitemap to the crawldb.</li>
          + * </ol>
          + *
          + * <p>For more details see:
          + * https://wiki.apache.org/nutch/SitemapFeature </p>
          + */
          +public class SitemapProcessor extends Configured implements Tool {
          + public static final Logger LOG = LoggerFactory.getLogger(SitemapProcessor.class);
          + public static final SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
          +
          + public static final String CURRENT_NAME = "current";
          + public static final String LOCK_NAME = ".locked";
          + public static final String SITEMAP_STRICT_PARSING = "sitemap.strict.parsing";
          + public static final String SITEMAP_URL_FILTERING = "sitemap.url.filter";
          + public static final String SITEMAP_URL_NORMALIZING = "sitemap.url.normalize";
          +
          + private static class SitemapMapper extends Mapper<Text, Writable, Text, CrawlDatum> {
          + private ProtocolFactory protocolFactory = null;
          + private boolean strict = true;
          + private boolean filter = true;
          + private boolean normalize = true;
          + private URLFilters filters = null;
          + private URLNormalizers normalizers = null;
          + private CrawlDatum datum = new CrawlDatum();
          + private SiteMapParser parser = null;
          +
          + public void setup(Context context) {
          + Configuration conf = context.getConfiguration();
          + this.protocolFactory = new ProtocolFactory(conf);
          + this.filter = conf.getBoolean(SITEMAP_URL_FILTERING, true);
          + this.normalize = conf.getBoolean(SITEMAP_URL_NORMALIZING, true);
          + this.strict = conf.getBoolean(SITEMAP_STRICT_PARSING, true);
          + this.parser = new SiteMapParser(strict);
          +
          + if (filter)
          + filters = new URLFilters(conf);
          + if (normalize)
          + normalizers = new URLNormalizers(conf, URLNormalizers.SCOPE_DEFAULT);
          + }
          +
          + public void map(Text key, Writable value, Context context) throws IOException, InterruptedException {
          + String url;
          +
          + try {
          + if (value instanceof CrawlDatum) {
          + // If its an entry from CrawlDb, emit it. It will be merged in the reducer
          + context.write(key, (CrawlDatum) value);
          + }
          + else if (value instanceof HostDatum) {
          + // For entry from hostdb, get sitemap url(s) from robots.txt, fetch the sitemap,
          + // extract urls and emit those
          +
          + // try different combinations of schemes one by one till we get rejection in all cases
          + String host = key.toString();
          + if((url = filterNormalize("http://" + host + "/")) == null &&
          + (url = filterNormalize("https://" + host + "/")) == null &&
          + (url = filterNormalize("ftp://" + host + "/")) == null &&
          + (url = filterNormalize("file:/" + host + "/")) == null)

          { + context.getCounter("Sitemap", "filtered_records").increment(1); + return; + }

          +
          + BaseRobotRules rules = protocolFactory.getProtocol(url).getRobotRules(new Text(url), datum, new LinkedList<>());

          Review comment:
          Always passing a new LinkedList as the third parameter to the [.getRobotRules](https://builds.apache.org/job/nutch-trunk/javadoc/org/apache/nutch/protocol/Protocol.html#getRobotRules-org.apache.hadoop.io.Text-org.apache.nutch.crawl.CrawlDatum-java.util.List-) method call may not be preferable. I've looked at the code and we have the option to pass null. This needs to be tested.
          I have seen elsewhere in the codebase that use of this signature aligns with use of the fetcher.store.robotstxt configuration property... so we may wish to do the same here and align it.

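          A possible alignment, sketched as a fragment of the mapper above (it assumes the Configuration from setup() is kept in a field; fetcher.store.robotstxt is the property referred to):

          // Only collect robots.txt responses when they are to be stored, mirroring the
          // fetcher's use of fetcher.store.robotstxt; otherwise pass null instead of
          // allocating a fresh LinkedList per host.
          List<Content> robotsTxtContent = null;
          if (conf.getBoolean("fetcher.store.robotstxt", false)) {
            robotsTxtContent = new LinkedList<>();
          }
          BaseRobotRules rules = protocolFactory.getProtocol(url)
              .getRobotRules(new Text(url), datum, robotsTxtContent);
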
          githubbot ASF GitHub Bot added a comment -

          lewismc commented on a change in pull request #189: NUTCH-1465 Support sitemaps in Nutch
          URL: https://github.com/apache/nutch/pull/189#discussion_r113578673

          ##########
          File path: src/java/org/apache/nutch/util/SitemapProcessor.java
          ##########
          @@ -0,0 +1,436 @@
          +/**
          + * Licensed to the Apache Software Foundation (ASF) under one or more
          + * contributor license agreements. See the NOTICE file distributed with
          + * this work for additional information regarding copyright ownership.
          + * The ASF licenses this file to You under the Apache License, Version 2.0
          + * (the "License"); you may not use this file except in compliance with
          + * the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.nutch.util;
          +
          +import java.io.IOException;
          +import java.net.URL;
          +import java.text.SimpleDateFormat;
          +import java.util.Collection;
          +import java.util.LinkedList;
          +import java.util.List;
          +import java.util.Random;
          +
          +import org.apache.hadoop.conf.Configuration;
          +import org.apache.hadoop.conf.Configured;
          +import org.apache.hadoop.fs.FileSystem;
          +import org.apache.hadoop.fs.Path;
          +import org.apache.hadoop.io.Text;
          +import org.apache.hadoop.io.Writable;
          +import org.apache.hadoop.mapreduce.Job;
          +import org.apache.hadoop.mapreduce.Mapper;
          +import org.apache.hadoop.mapreduce.Reducer;
          +import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
          +import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
          +import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
          +import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;
          +import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
          +import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
          +import org.apache.hadoop.util.StringUtils;
          +import org.apache.hadoop.util.Tool;
          +import org.apache.hadoop.util.ToolRunner;
          +
          +import org.apache.nutch.crawl.CrawlDatum;
          +import org.apache.nutch.hostdb.HostDatum;
          +import org.apache.nutch.net.URLFilters;
          +import org.apache.nutch.net.URLNormalizers;
          +import org.apache.nutch.protocol.Content;
          +import org.apache.nutch.protocol.Protocol;
          +import org.apache.nutch.protocol.ProtocolFactory;
          +import org.apache.nutch.protocol.ProtocolOutput;
          +import org.apache.nutch.protocol.ProtocolStatus;
          +
          +import org.slf4j.Logger;
          +import org.slf4j.LoggerFactory;
          +
          +import crawlercommons.robots.BaseRobotRules;
          +import crawlercommons.sitemaps.AbstractSiteMap;
          +import crawlercommons.sitemaps.SiteMap;
          +import crawlercommons.sitemaps.SiteMapIndex;
          +import crawlercommons.sitemaps.SiteMapParser;
          +import crawlercommons.sitemaps.SiteMapURL;
          +
          +/**
          + * <p>Performs Sitemap processing by fetching sitemap links, parsing the content and merging
          + * the urls from Sitemap (with the metadata) with the existing crawldb.</p>
          + *
          + * <p>There are two use cases supported in Nutch's Sitemap processing:</p>
          + * <ol>
          + * <li>Sitemaps are considered as "remote seed lists". Crawl administrators can prepare a
          + * list of sitemap links and get only those sitemap pages. This suits well for targeted
          + * crawl of specific hosts.</li>
          + * <li>For open web crawl, it is not possible to track each host and get the sitemap links
          + * manually. Nutch would automatically get the sitemaps for all the hosts seen in the
          + * crawls and inject the urls from sitemap to the crawldb.</li>
          + * </ol>
          + *
          + * <p>For more details see:
          + * https://wiki.apache.org/nutch/SitemapFeature </p>
          + */
          +public class SitemapProcessor extends Configured implements Tool {
          + public static final Logger LOG = LoggerFactory.getLogger(SitemapProcessor.class);
          + public static final SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
          +
          + public static final String CURRENT_NAME = "current";

          Review comment:
          I also introduced this constant to mimic what is done in the CrawlDb and LinkDb classes. It is meant to represent the current HostDb... of course we don't have a HostDb class in the codebase right now, so this constant has been introduced.

          githubbot ASF GitHub Bot added a comment -

          lewismc commented on issue #189: NUTCH-1465 Support sitemaps in Nutch
          URL: https://github.com/apache/nutch/pull/189#issuecomment-297560764

          We could also improve with parameterized logging in due course. I wanted to post this patch as a mechanism for reigniting interest in sitemap parsing on the master branch.

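          For instance, parameterized SLF4J logging avoids string concatenation when the log level is disabled (the variable names here are illustrative):

          // Instead of: LOG.info("Filtered " + filtered + " sitemap records for " + host);
          LOG.info("Filtered {} sitemap records for {}", filtered, host);
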
          githubbot ASF GitHub Bot added a comment -

          sebastian-nagel commented on a change in pull request #189: NUTCH-1465 Support sitemaps in Nutch
          URL: https://github.com/apache/nutch/pull/189#discussion_r113689082

          ##########
          File path: src/java/org/apache/nutch/util/SitemapProcessor.java
          ##########
          @@ -0,0 +1,436 @@
          +/**
          + * Licensed to the Apache Software Foundation (ASF) under one or more
          + * contributor license agreements. See the NOTICE file distributed with
          + * this work for additional information regarding copyright ownership.
          + * The ASF licenses this file to You under the Apache License, Version 2.0
          + * (the "License"); you may not use this file except in compliance with
          + * the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.nutch.util;
          +
          +import java.io.IOException;
          +import java.net.URL;
          +import java.text.SimpleDateFormat;
          +import java.util.Collection;
          +import java.util.LinkedList;
          +import java.util.List;
          +import java.util.Random;
          +
          +import org.apache.hadoop.conf.Configuration;
          +import org.apache.hadoop.conf.Configured;
          +import org.apache.hadoop.fs.FileSystem;
          +import org.apache.hadoop.fs.Path;
          +import org.apache.hadoop.io.Text;
          +import org.apache.hadoop.io.Writable;
          +import org.apache.hadoop.mapreduce.Job;
          +import org.apache.hadoop.mapreduce.Mapper;
          +import org.apache.hadoop.mapreduce.Reducer;
          +import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
          +import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
          +import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
          +import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;
          +import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
          +import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
          +import org.apache.hadoop.util.StringUtils;
          +import org.apache.hadoop.util.Tool;
          +import org.apache.hadoop.util.ToolRunner;
          +
          +import org.apache.nutch.crawl.CrawlDatum;
          +import org.apache.nutch.hostdb.HostDatum;
          +import org.apache.nutch.net.URLFilters;
          +import org.apache.nutch.net.URLNormalizers;
          +import org.apache.nutch.protocol.Content;
          +import org.apache.nutch.protocol.Protocol;
          +import org.apache.nutch.protocol.ProtocolFactory;
          +import org.apache.nutch.protocol.ProtocolOutput;
          +import org.apache.nutch.protocol.ProtocolStatus;
          +
          +import org.slf4j.Logger;
          +import org.slf4j.LoggerFactory;
          +
          +import crawlercommons.robots.BaseRobotRules;
          +import crawlercommons.sitemaps.AbstractSiteMap;
          +import crawlercommons.sitemaps.SiteMap;
          +import crawlercommons.sitemaps.SiteMapIndex;
          +import crawlercommons.sitemaps.SiteMapParser;
          +import crawlercommons.sitemaps.SiteMapURL;
          +
          +/**
          + * <p>Performs Sitemap processing by fetching sitemap links, parsing the content and merging
          + * the urls from Sitemap (with the metadata) with the existing crawldb.</p>
          + *
          + * <p>There are two use cases supported in Nutch's Sitemap processing:</p>
          + * <ol>
          + * <li>Sitemaps are considered as "remote seed lists". Crawl administrators can prepare a
          + * list of sitemap links and get only those sitemap pages. This suits well for targeted
          + * crawl of specific hosts.</li>
          + * <li>For open web crawl, it is not possible to track each host and get the sitemap links
          + * manually. Nutch would automatically get the sitemaps for all the hosts seen in the
          + * crawls and inject the urls from sitemap to the crawldb.</li>
          + * </ol>
          + *
          + * <p>For more details see:
          + * https://wiki.apache.org/nutch/SitemapFeature </p>
          + */
          +public class SitemapProcessor extends Configured implements Tool {
          + public static final Logger LOG = LoggerFactory.getLogger(SitemapProcessor.class);
          + public static final SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
          +
          + public static final String CURRENT_NAME = "current";

          Review comment:
          But in [ReadHostDb](https://github.com/apache/nutch/blob/master/src/java/org/apache/nutch/hostdb/ReadHostDb.java#L182) and [UpdateHostDb](https://github.com/apache/nutch/blob/master/src/java/org/apache/nutch/hostdb/UpdateHostDb.java#L107) a String literal `"current"` is still used.

          githubbot ASF GitHub Bot added a comment -

          sebastian-nagel commented on a change in pull request #189: NUTCH-1465 Support sitemaps in Nutch
          URL: https://github.com/apache/nutch/pull/189#discussion_r113689552

          ##########
          File path: src/java/org/apache/nutch/util/SitemapProcessor.java
          ##########
          @@ -0,0 +1,436 @@
          +/**
          + * Licensed to the Apache Software Foundation (ASF) under one or more
          + * contributor license agreements. See the NOTICE file distributed with
          + * this work for additional information regarding copyright ownership.
          + * The ASF licenses this file to You under the Apache License, Version 2.0
          + * (the "License"); you may not use this file except in compliance with
          + * the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.nutch.util;
          +
          +import java.io.IOException;
          +import java.net.URL;
          +import java.text.SimpleDateFormat;
          +import java.util.Collection;
          +import java.util.LinkedList;
          +import java.util.List;
          +import java.util.Random;
          +
          +import org.apache.hadoop.conf.Configuration;
          +import org.apache.hadoop.conf.Configured;
          +import org.apache.hadoop.fs.FileSystem;
          +import org.apache.hadoop.fs.Path;
          +import org.apache.hadoop.io.Text;
          +import org.apache.hadoop.io.Writable;
          +import org.apache.hadoop.mapreduce.Job;
          +import org.apache.hadoop.mapreduce.Mapper;
          +import org.apache.hadoop.mapreduce.Reducer;
          +import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
          +import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
          +import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
          +import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;
          +import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
          +import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
          +import org.apache.hadoop.util.StringUtils;
          +import org.apache.hadoop.util.Tool;
          +import org.apache.hadoop.util.ToolRunner;
          +
          +import org.apache.nutch.crawl.CrawlDatum;
          +import org.apache.nutch.hostdb.HostDatum;
          +import org.apache.nutch.net.URLFilters;
          +import org.apache.nutch.net.URLNormalizers;
          +import org.apache.nutch.protocol.Content;
          +import org.apache.nutch.protocol.Protocol;
          +import org.apache.nutch.protocol.ProtocolFactory;
          +import org.apache.nutch.protocol.ProtocolOutput;
          +import org.apache.nutch.protocol.ProtocolStatus;
          +
          +import org.slf4j.Logger;
          +import org.slf4j.LoggerFactory;
          +
          +import crawlercommons.robots.BaseRobotRules;
          +import crawlercommons.sitemaps.AbstractSiteMap;
          +import crawlercommons.sitemaps.SiteMap;
          +import crawlercommons.sitemaps.SiteMapIndex;
          +import crawlercommons.sitemaps.SiteMapParser;
          +import crawlercommons.sitemaps.SiteMapURL;
          +
          +/**
          + * <p>Performs Sitemap processing by fetching sitemap links, parsing the content and merging
          + * the urls from Sitemap (with the metadata) with the existing crawldb.</p>
          + *
          + * <p>There are two use cases supported in Nutch's Sitemap processing:</p>
          + * <ol>
          + * <li>Sitemaps are considered as "remote seed lists". Crawl administrators can prepare a
          + * list of sitemap links and get only those sitemap pages. This suits well for targeted
          + * crawl of specific hosts.</li>
          + * <li>For open web crawl, it is not possible to track each host and get the sitemap links
          + * manually. Nutch would automatically get the sitemaps for all the hosts seen in the
          + * crawls and inject the urls from sitemap to the crawldb.</li>
          + * </ol>
          + *
          + * <p>For more details see:
          + * https://wiki.apache.org/nutch/SitemapFeature </p>
          + */
          +public class SitemapProcessor extends Configured implements Tool {
          + public static final Logger LOG = LoggerFactory.getLogger(SitemapProcessor.class);
          + public static final SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
          +
          + public static final String CURRENT_NAME = "current";
          + public static final String LOCK_NAME = ".locked";
          + public static final String SITEMAP_STRICT_PARSING = "sitemap.strict.parsing";
          + public static final String SITEMAP_URL_FILTERING = "sitemap.url.filter";
          + public static final String SITEMAP_URL_NORMALIZING = "sitemap.url.normalize";
          +
          + private static class SitemapMapper extends Mapper<Text, Writable, Text, CrawlDatum> {
          + private ProtocolFactory protocolFactory = null;
          + private boolean strict = true;
          + private boolean filter = true;
          + private boolean normalize = true;
          + private URLFilters filters = null;
          + private URLNormalizers normalizers = null;
          + private CrawlDatum datum = new CrawlDatum();
          + private SiteMapParser parser = null;
          +
+ public void setup(Context context) {
+ Configuration conf = context.getConfiguration();
+ this.protocolFactory = new ProtocolFactory(conf);
+ this.filter = conf.getBoolean(SITEMAP_URL_FILTERING, true);
+ this.normalize = conf.getBoolean(SITEMAP_URL_NORMALIZING, true);
+ this.strict = conf.getBoolean(SITEMAP_STRICT_PARSING, true);
+ this.parser = new SiteMapParser(strict);
+
+ if (filter)
+ filters = new URLFilters(conf);
+ if (normalize)
+ normalizers = new URLNormalizers(conf, URLNormalizers.SCOPE_DEFAULT);
+ }
          +
          + public void map(Text key, Writable value, Context context) throws IOException, InterruptedException {
          + String url;
          +
          + try {
+ if (value instanceof CrawlDatum) {
+ // If its an entry from CrawlDb, emit it. It will be merged in the reducer
+ context.write(key, (CrawlDatum) value);
+ }
          + else if (value instanceof HostDatum) {
          + // For entry from hostdb, get sitemap url(s) from robots.txt, fetch the sitemap,
          + // extract urls and emit those
          +
          + // try different combinations of schemes one by one till we get rejection in all cases
          + String host = key.toString();
          + if((url = filterNormalize("http://" + host + "/")) == null &&
          + (url = filterNormalize("https://" + host + "/")) == null &&
          + (url = filterNormalize("ftp://" + host + "/")) == null &&
+ (url = filterNormalize("file:/" + host + "/")) == null) {
+ context.getCounter("Sitemap", "filtered_records").increment(1);
+ return;
+ }
          +
          + BaseRobotRules rules = protocolFactory.getProtocol(url).getRobotRules(new Text(url), datum, new LinkedList<>());

          Review comment:
          It's safe to pass null unless you want to use the robots.txt content.
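          To make the suggestion concrete, here is a minimal sketch (assuming, as the comment implies, that the third argument of getRobotRules() is only an optional sink for the raw robots.txt content):
          ```
          // As in the PR: a throwaway list is allocated for every HostDatum record,
          // although the robots.txt content collected into it is never read.
          // BaseRobotRules rules = protocolFactory.getProtocol(url)
          //     .getRobotRules(new Text(url), datum, new LinkedList<>());

          // Per the review comment, passing null should be fine when the robots.txt
          // content itself is not needed:
          BaseRobotRules rules = protocolFactory.getProtocol(url)
              .getRobotRules(new Text(url), datum, null);
          ```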

          ----------------------------------------------------------------
          This is an automated message from the Apache Git Service.
          To respond to the message, please log on GitHub and use the
          URL above to go to the specific comment.

          For queries about this service, please contact Infrastructure at:
          users@infra.apache.org

          Hide
          githubbot ASF GitHub Bot added a comment -

          sebastian-nagel commented on a change in pull request #189: NUTCH-1465 Support sitemaps in Nutch
          URL: https://github.com/apache/nutch/pull/189#discussion_r113693977

          ##########
          File path: src/java/org/apache/nutch/util/SitemapProcessor.java
          ##########
          @@ -0,0 +1,436 @@
          +/**
          + * Licensed to the Apache Software Foundation (ASF) under one or more
          + * contributor license agreements. See the NOTICE file distributed with
          + * this work for additional information regarding copyright ownership.
          + * The ASF licenses this file to You under the Apache License, Version 2.0
          + * (the "License"); you may not use this file except in compliance with
          + * the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.nutch.util;
          +
          +import java.io.IOException;
          +import java.net.URL;
          +import java.text.SimpleDateFormat;
          +import java.util.Collection;
          +import java.util.LinkedList;
          +import java.util.List;
          +import java.util.Random;
          +
          +import org.apache.hadoop.conf.Configuration;
          +import org.apache.hadoop.conf.Configured;
          +import org.apache.hadoop.fs.FileSystem;
          +import org.apache.hadoop.fs.Path;
          +import org.apache.hadoop.io.Text;
          +import org.apache.hadoop.io.Writable;
          +import org.apache.hadoop.mapreduce.Job;
          +import org.apache.hadoop.mapreduce.Mapper;
          +import org.apache.hadoop.mapreduce.Reducer;
          +import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
          +import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
          +import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
          +import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;
          +import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
          +import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
          +import org.apache.hadoop.util.StringUtils;
          +import org.apache.hadoop.util.Tool;
          +import org.apache.hadoop.util.ToolRunner;
          +
          +import org.apache.nutch.crawl.CrawlDatum;
          +import org.apache.nutch.hostdb.HostDatum;
          +import org.apache.nutch.net.URLFilters;
          +import org.apache.nutch.net.URLNormalizers;
          +import org.apache.nutch.protocol.Content;
          +import org.apache.nutch.protocol.Protocol;
          +import org.apache.nutch.protocol.ProtocolFactory;
          +import org.apache.nutch.protocol.ProtocolOutput;
          +import org.apache.nutch.protocol.ProtocolStatus;
          +
          +import org.slf4j.Logger;
          +import org.slf4j.LoggerFactory;
          +
          +import crawlercommons.robots.BaseRobotRules;
          +import crawlercommons.sitemaps.AbstractSiteMap;
          +import crawlercommons.sitemaps.SiteMap;
          +import crawlercommons.sitemaps.SiteMapIndex;
          +import crawlercommons.sitemaps.SiteMapParser;
          +import crawlercommons.sitemaps.SiteMapURL;
          +
          +/**
          + * <p>Performs Sitemap processing by fetching sitemap links, parsing the content and merging
          + * the urls from Sitemap (with the metadata) with the existing crawldb.</p>
          + *
          + * <p>There are two use cases supported in Nutch's Sitemap processing:</p>
          + * <ol>
          + * <li>Sitemaps are considered as "remote seed lists". Crawl administrators can prepare a
          + * list of sitemap links and get only those sitemap pages. This suits well for targeted
          + * crawl of specific hosts.</li>
          + * <li>For open web crawl, it is not possible to track each host and get the sitemap links
          + * manually. Nutch would automatically get the sitemaps for all the hosts seen in the
          + * crawls and inject the urls from sitemap to the crawldb.</li>
          + * </ol>
          + *
          + * <p>For more details see:
          + * https://wiki.apache.org/nutch/SitemapFeature </p>
          + */
          +public class SitemapProcessor extends Configured implements Tool {
          + public static final Logger LOG = LoggerFactory.getLogger(SitemapProcessor.class);
          + public static final SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
          +
          + public static final String CURRENT_NAME = "current";
          + public static final String LOCK_NAME = ".locked";
          + public static final String SITEMAP_STRICT_PARSING = "sitemap.strict.parsing";
          + public static final String SITEMAP_URL_FILTERING = "sitemap.url.filter";
          + public static final String SITEMAP_URL_NORMALIZING = "sitemap.url.normalize";
          +
          + private static class SitemapMapper extends Mapper<Text, Writable, Text, CrawlDatum> {
          + private ProtocolFactory protocolFactory = null;
          + private boolean strict = true;
          + private boolean filter = true;
          + private boolean normalize = true;
          + private URLFilters filters = null;
          + private URLNormalizers normalizers = null;
          + private CrawlDatum datum = new CrawlDatum();
          + private SiteMapParser parser = null;
          +
+ public void setup(Context context) {
+ Configuration conf = context.getConfiguration();
+ this.protocolFactory = new ProtocolFactory(conf);
+ this.filter = conf.getBoolean(SITEMAP_URL_FILTERING, true);
+ this.normalize = conf.getBoolean(SITEMAP_URL_NORMALIZING, true);
+ this.strict = conf.getBoolean(SITEMAP_STRICT_PARSING, true);
+ this.parser = new SiteMapParser(strict);
+
+ if (filter)
+ filters = new URLFilters(conf);
+ if (normalize)
+ normalizers = new URLNormalizers(conf, URLNormalizers.SCOPE_DEFAULT);
+ }
          +
          + public void map(Text key, Writable value, Context context) throws IOException, InterruptedException {
          + String url;
          +
          + try {
+ if (value instanceof CrawlDatum) {
+ // If its an entry from CrawlDb, emit it. It will be merged in the reducer
+ context.write(key, (CrawlDatum) value);
+ }
          + else if (value instanceof HostDatum) {
          + // For entry from hostdb, get sitemap url(s) from robots.txt, fetch the sitemap,
          + // extract urls and emit those
          +
          + // try different combinations of schemes one by one till we get rejection in all cases
          + String host = key.toString();
          + if((url = filterNormalize("http://" + host + "/")) == null &&
          + (url = filterNormalize("https://" + host + "/")) == null &&
          + (url = filterNormalize("ftp://" + host + "/")) == null &&
+ (url = filterNormalize("file:/" + host + "/")) == null) {
+ context.getCounter("Sitemap", "filtered_records").increment(1);
+ return;
+ }
          +
          + BaseRobotRules rules = protocolFactory.getProtocol(url).getRobotRules(new Text(url), datum, new LinkedList<>());
          + List<String> sitemaps = rules.getSitemaps();
+ for(String sitemap: sitemaps) {
+ context.getCounter("Sitemap", "sitemaps_from_hostdb").increment(1);
+ generateSitemapUrlDatum(protocolFactory.getProtocol(sitemap), sitemap, context);
+ }
          + }
          + else if (value instanceof Text) {
          + // For entry from sitemap urls file, fetch the sitemap, extract urls and emit those
+ if((url = filterNormalize(key.toString())) == null) {
+ context.getCounter("Sitemap", "filtered_records").increment(1);
+ return;
+ }
          +
          + context.getCounter("Sitemap", "sitemap_seeds").increment(1);
          + generateSitemapUrlDatum(protocolFactory.getProtocol(url), url, context);
          + }
+ } catch (Exception e) {
+ LOG.warn("Exception for record " + key.toString() + " : " + StringUtils.stringifyException(e));
+ }
          + }
          +
          + /* Filters and or normalizes the input URL */
          + private String filterNormalize(String url) {
+ try {
+ if (normalizers != null)
+ url = normalizers.normalize(url, URLNormalizers.SCOPE_DEFAULT);
+
+ if (filters != null)
+ url = filters.filter(url);
+ } catch (Exception e) {
+ return null;
+ }
          + return url;
          + }
          +
          + private void generateSitemapUrlDatum(Protocol protocol, String url, Context context) throws Exception {
          + ProtocolOutput output = protocol.getProtocolOutput(new Text(url), datum);
          + ProtocolStatus status = output.getStatus();
          + Content content = output.getContent();
          +
+ if(status.getCode() != ProtocolStatus.SUCCESS) {
+ // If there were any problems fetching the sitemap, log the error and let it go. Not sure how often
+ // sitemaps are redirected. In future we might have to handle redirects.
+ context.getCounter("Sitemap", "failed_fetches").increment(1);
+ LOG.error("Error while fetching the sitemap. Status code: " + status.getCode() + " for " + url);
+ return;
+ }
          +
          + AbstractSiteMap asm = parser.parseSiteMap(content.getContentType(), content.getContent(), new URL(url));
          + if(asm instanceof SiteMap) {
          + SiteMap sm = (SiteMap) asm;
          + Collection<SiteMapURL> sitemapUrls = sm.getSiteMapUrls();
          +
          + for(SiteMapURL sitemapUrl: sitemapUrls) {
          + // If 'strict' is ON, only allow valid urls. Else allow all urls
          + if(!strict || sitemapUrl.isValid()) {
          + String key = filterNormalize(sitemapUrl.getUrl().toString());
          + if (key != null) {
          + CrawlDatum sitemapUrlDatum = new CrawlDatum();
          + sitemapUrlDatum.setStatus(CrawlDatum.STATUS_SITEMAP);
          + sitemapUrlDatum.setScore((float) sitemapUrl.getPriority());
          +
          + if(sitemapUrl.getChangeFrequency() != null) {
          + int fetchInterval = -1;
+ switch(sitemapUrl.getChangeFrequency()) {
+ case ALWAYS: fetchInterval = 1; break;
+ case HOURLY: fetchInterval = 3600; break; // 60*60
+ case DAILY: fetchInterval = 86400; break; // 60*60*24
+ case WEEKLY: fetchInterval = 604800; break; // 60*60*24*7
+ case MONTHLY: fetchInterval = 2592000; break; // 60*60*24*30
+ case YEARLY: fetchInterval = 31536000; break; // 60*60*24*365
+ case NEVER: fetchInterval = Integer.MAX_VALUE; break; // Loose "NEVER" contract
+ }
          + sitemapUrlDatum.setFetchInterval(fetchInterval);
          + }
          +
          + if(sitemapUrl.getLastModified() != null)
          + sitemapUrlDatum.setModifiedTime(sitemapUrl.getLastModified().getTime());
          +
          + context.write(new Text(key), sitemapUrlDatum);
          + }
          + }
          + }
          + }
          + else if (asm instanceof SiteMapIndex) {
          + SiteMapIndex index = (SiteMapIndex) asm;
          + Collection<AbstractSiteMap> sitemapUrls = index.getSitemaps();
          +
          + for(AbstractSiteMap sitemap: sitemapUrls) {
+ if(sitemap.isIndex()) {
+ generateSitemapUrlDatum(protocol, sitemap.getUrl().toString(), context);
+ }
          + }
          + }
          + }
          + }
          +
          + private static class SitemapReducer extends Reducer<Text, CrawlDatum, Text, CrawlDatum> {
          + CrawlDatum sitemapDatum = null;
          + CrawlDatum originalDatum = null;
          +
          + public void reduce(Text key, Iterable<CrawlDatum> values, Context context)
          + throws IOException, InterruptedException {
          + sitemapDatum = null;
          + originalDatum = null;
          +
          + for (CrawlDatum curr: values) {
+ if(curr.getStatus() == CrawlDatum.STATUS_SITEMAP && sitemapDatum == null) {
+ sitemapDatum = new CrawlDatum();
+ sitemapDatum.set(curr);
+ }
+ else {
+ originalDatum = new CrawlDatum();
+ originalDatum.set(curr);
+ }
          + }
          +
          + if(originalDatum != null) {
          + // The url was already present in crawldb. If we got the same url from sitemap too, save
          + // the information from sitemap to the original datum. Emit the original crawl datum
          + if(sitemapDatum != null) {
          + context.getCounter("Sitemap", "existing_sitemap_entries").increment(1);
          + originalDatum.setScore(sitemapDatum.getScore());

          Review comment:

          The [sitemap spec](https://www.sitemaps.org/protocol.html#xmlTagDefinitions) defines the priority as "the priority of this URL relative to other URLs on your site". That's different from a global score as calculated by OPIC, page rank, etc. Overwriting the fetchInterval will make [calculateLastFetchTime()](https://github.com/apache/nutch/blob/master/src/java/org/apache/nutch/crawl/AbstractFetchSchedule.java#L158) return the wrong time once a page has been fetched, and overwriting the modified time breaks any if-modified-since handling. See the discussion in NUTCH-1465 (https://issues.apache.org/jira/browse/NUTCH-1465) on Jan 30-31, 2014. I was also wrong about how to map these "concepts" from sitemaps to Nutch-internal ones.
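          As a purely illustrative sketch (not what the current patch does), a more conservative merge in SitemapReducer could keep the existing CrawlDb values and record the sitemap hint as metadata; the metadata key below is hypothetical:
          ```
          // Sketch: leave score, fetchInterval and modifiedTime of the existing entry
          // untouched and stash the sitemap priority as metadata, so a scoring or
          // fetch-schedule plugin can decide whether to use it.
          if (originalDatum != null) {
            if (sitemapDatum != null) {
              context.getCounter("Sitemap", "existing_sitemap_entries").increment(1);
              originalDatum.getMetaData().put(new Text("sitemap.priority"),
                  new org.apache.hadoop.io.FloatWritable(sitemapDatum.getScore()));
            }
            context.write(key, originalDatum);
          }
          ```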

          ----------------------------------------------------------------
          This is an automated message from the Apache Git Service.
          To respond to the message, please log on GitHub and use the
          URL above to go to the specific comment.

          For queries about this service, please contact Infrastructure at:
          users@infra.apache.org

          Hide
          githubbot ASF GitHub Bot added a comment -

          sebastian-nagel commented on a change in pull request #189: NUTCH-1465 Support sitemaps in Nutch
          URL: https://github.com/apache/nutch/pull/189#discussion_r113687939

          ##########
          File path: src/java/org/apache/nutch/crawl/CrawlDatum.java
          ##########
          @@ -90,6 +90,8 @@
          public static final byte STATUS_LINKED = 0x43;
          /** Page got metadata from a parser */
          public static final byte STATUS_PARSE_META = 0x44;
          + /** Page was discovered from sitemap */
          + public static final byte STATUS_SITEMAP = 0x45;

          Review comment:
          Do we really need a new status? STATUS_INJECTED could be also used: both are assigned in the mapper (SitemapMapper resp. InjectMapper) and replaced by STATUS_DB_UNFETCHED in the reducer (SitemapReducer/InjectReducer).
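          A rough sketch of that reuse, mirroring what Injector does (variable names are placeholders; this is not the patch as currently posted):
          ```
          // SitemapMapper: mark sitemap-discovered URLs with the existing injected status.
          CrawlDatum sitemapUrlDatum = new CrawlDatum();
          sitemapUrlDatum.setStatus(CrawlDatum.STATUS_INJECTED);

          // SitemapReducer: convert to the regular unfetched status before writing to
          // the CrawlDb, the same way the inject reducer does ("datum" is the merged entry).
          if (datum.getStatus() == CrawlDatum.STATUS_INJECTED) {
            datum.setStatus(CrawlDatum.STATUS_DB_UNFETCHED);
          }
          ```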

          ----------------------------------------------------------------
          This is an automated message from the Apache Git Service.
          To respond to the message, please log on GitHub and use the
          URL above to go to the specific comment.

          For queries about this service, please contact Infrastructure at:
          users@infra.apache.org

          Hide
          githubbot ASF GitHub Bot added a comment -

          sebastian-nagel commented on a change in pull request #189: NUTCH-1465 Support sitemaps in Nutch
          URL: https://github.com/apache/nutch/pull/189#discussion_r113684593

          ##########
          File path: conf/nutch-default.xml
          ##########
          @@ -2529,7 +2529,33 @@ visit https://wiki.apache.org/nutch/SimilarityScoringFilter-->
          <value></value>
          <description>
          Default is 'fanout.key'

          - The routingKey used by publisher to publish messages to specific queues. If the exchange type is "fanout", then this property is ignored.
          + The routingKey used by publisher to publish messages to specific queues.
          + If the exchange type is "fanout", then this property is ignored.
          + </description>
          +</property>
          +
          +<property>

          Review comment:
          These 3 properties are used to transfer command-line options from Hadoop client to tasks. The values are always overwritten, it doesn't make sense to set them here or in nutch-site.xml.
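          For context, a sketch of the pattern described here, assuming the three properties are the sitemap.* keys defined in SitemapProcessor (the boolean locals stand for parsed command-line options):
          ```
          // The job driver copies command-line options into the job Configuration, so
          // any value set in nutch-default.xml or nutch-site.xml is overwritten before
          // the mapper's setup() reads it back.
          Configuration conf = getConf();
          conf.setBoolean(SitemapProcessor.SITEMAP_STRICT_PARSING, strict);
          conf.setBoolean(SitemapProcessor.SITEMAP_URL_FILTERING, filter);
          conf.setBoolean(SitemapProcessor.SITEMAP_URL_NORMALIZING, normalize);
          Job job = Job.getInstance(conf, "SitemapProcessor");
          ```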

          ----------------------------------------------------------------
          This is an automated message from the Apache Git Service.
          To respond to the message, please log on GitHub and use the
          URL above to go to the specific comment.

          For queries about this service, please contact Infrastructure at:
          users@infra.apache.org

          Hide
          githubbot ASF GitHub Bot added a comment -

          lewismc commented on a change in pull request #189: NUTCH-1465 Support sitemaps in Nutch
          URL: https://github.com/apache/nutch/pull/189#discussion_r117406027

          ##########
          File path: src/java/org/apache/nutch/util/SitemapProcessor.java
          ##########
          @@ -0,0 +1,436 @@
          +/**
          + * Licensed to the Apache Software Foundation (ASF) under one or more
          + * contributor license agreements. See the NOTICE file distributed with
          + * this work for additional information regarding copyright ownership.
          + * The ASF licenses this file to You under the Apache License, Version 2.0
          + * (the "License"); you may not use this file except in compliance with
          + * the License. You may obtain a copy of the License at
          + *
          + * http://www.apache.org/licenses/LICENSE-2.0
          + *
          + * Unless required by applicable law or agreed to in writing, software
          + * distributed under the License is distributed on an "AS IS" BASIS,
          + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
          + * See the License for the specific language governing permissions and
          + * limitations under the License.
          + */
          +
          +package org.apache.nutch.util;
          +
          +import java.io.IOException;
          +import java.net.URL;
          +import java.text.SimpleDateFormat;
          +import java.util.Collection;
          +import java.util.LinkedList;
          +import java.util.List;
          +import java.util.Random;
          +
          +import org.apache.hadoop.conf.Configuration;
          +import org.apache.hadoop.conf.Configured;
          +import org.apache.hadoop.fs.FileSystem;
          +import org.apache.hadoop.fs.Path;
          +import org.apache.hadoop.io.Text;
          +import org.apache.hadoop.io.Writable;
          +import org.apache.hadoop.mapreduce.Job;
          +import org.apache.hadoop.mapreduce.Mapper;
          +import org.apache.hadoop.mapreduce.Reducer;
          +import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
          +import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
          +import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
          +import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;
          +import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
          +import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
          +import org.apache.hadoop.util.StringUtils;
          +import org.apache.hadoop.util.Tool;
          +import org.apache.hadoop.util.ToolRunner;
          +
          +import org.apache.nutch.crawl.CrawlDatum;
          +import org.apache.nutch.hostdb.HostDatum;
          +import org.apache.nutch.net.URLFilters;
          +import org.apache.nutch.net.URLNormalizers;
          +import org.apache.nutch.protocol.Content;
          +import org.apache.nutch.protocol.Protocol;
          +import org.apache.nutch.protocol.ProtocolFactory;
          +import org.apache.nutch.protocol.ProtocolOutput;
          +import org.apache.nutch.protocol.ProtocolStatus;
          +
          +import org.slf4j.Logger;
          +import org.slf4j.LoggerFactory;
          +
          +import crawlercommons.robots.BaseRobotRules;
          +import crawlercommons.sitemaps.AbstractSiteMap;
          +import crawlercommons.sitemaps.SiteMap;
          +import crawlercommons.sitemaps.SiteMapIndex;
          +import crawlercommons.sitemaps.SiteMapParser;
          +import crawlercommons.sitemaps.SiteMapURL;
          +
          +/**
          + * <p>Performs Sitemap processing by fetching sitemap links, parsing the content and merging
          + * the urls from Sitemap (with the metadata) with the existing crawldb.</p>
          + *
          + * <p>There are two use cases supported in Nutch's Sitemap processing:</p>
          + * <ol>
          + * <li>Sitemaps are considered as "remote seed lists". Crawl administrators can prepare a
          + * list of sitemap links and get only those sitemap pages. This suits well for targeted
          + * crawl of specific hosts.</li>
          + * <li>For open web crawl, it is not possible to track each host and get the sitemap links
          + * manually. Nutch would automatically get the sitemaps for all the hosts seen in the
          + * crawls and inject the urls from sitemap to the crawldb.</li>
          + * </ol>
          + *
          + * <p>For more details see:
          + * https://wiki.apache.org/nutch/SitemapFeature </p>
          + */
          +public class SitemapProcessor extends Configured implements Tool {
          + public static final Logger LOG = LoggerFactory.getLogger(SitemapProcessor.class);
          + public static final SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
          +
          + public static final String CURRENT_NAME = "current";

          Review comment:
          What is your suggestion here @sebastian-nagel ?

          ----------------------------------------------------------------
          This is an automated message from the Apache Git Service.
          To respond to the message, please log on GitHub and use the
          URL above to go to the specific comment.

          For queries about this service, please contact Infrastructure at:
          users@infra.apache.org

          Hide
          githubbot ASF GitHub Bot added a comment -

          lewismc commented on issue #189: NUTCH-1465 Support sitemaps in Nutch
          URL: https://github.com/apache/nutch/pull/189#issuecomment-302617703

          @sebastian-nagel I've addressed all but two of your comments and responded. I've also implemented parameterized logging. In addition, I've dropped STATUS_SITEMAP, replacing instances with STATUS_INJECTED.
          N.B. When I run this as follows, I am currently not able to inject any URLs into the CrawlDb:
          ```
          //First I inject a random URL to create a CrawlDB

          lmcgibbn@LMC-056430 /usr/local/nutch(NUTCH-1465) $ ./runtime/local/bin/nutch inject crawl urls/
          Injector: starting at 2017-05-18 23:01:14
          Injector: crawlDb: crawl
          Injector: urlDir: urls
          Injector: Converting injected urls to crawl db entries.
          Injector: overwrite: false
          Injector: update: false
          Injector: Total urls rejected by filters: 0
          Injector: Total urls injected after normalization and filtering: 1
          Injector: Total urls injected but already in CrawlDb: 0
          Injector: Total new urls injected: 1
          Injector: finished at 2017-05-18 23:01:15, elapsed: 00:00:01

          // I then, attempt to process a sitemap at http://www.autotrader.com/sitemap.xml which I've added to a file located in a 'sitemaps' directory

          lmcgibbn@LMC-056430 /usr/local/nutch(NUTCH-1465) $ ./runtime/local/bin/nutch sitemap crawl -sitemapUrls sitemaps
          SitemapProcessor: sitemap urls dir: sitemaps
          SitemapProcessor: Starting at 2017-05-18 23:06:38
          robots.txt whitelist not configured.
          SitemapProcessor: Total records rejected by filters: 0
          SitemapProcessor: Total sitemaps from HostDb: 0
          SitemapProcessor: Total sitemaps from seed urls: 1
          SitemapProcessor: Total failed sitemap fetches: 0
          SitemapProcessor: Total new sitemap entries added: 0
          SitemapProcessor: Finished at 2017-05-18 23:06:48, elapsed: 00:00:10

          // Lets read the DB

          lmcgibbn@LMC-056430 /usr/local/nutch(NUTCH-1465) $ ./runtime/local/bin/nutch readdb crawl -stats
          CrawlDb statistics start: crawl
          Statistics for CrawlDb: crawl
          TOTAL urls: 1
          shortest fetch interval: 30 days, 00:00:00
          avg fetch interval: 30 days, 00:00:00
          longest fetch interval: 30 days, 00:00:00
          earliest fetch time: Thu May 18 23:01:00 PDT 2017
          avg of fetch times: Thu May 18 23:01:00 PDT 2017
          latest fetch time: Thu May 18 23:01:00 PDT 2017
          retry 0: 1
          min score: 1.0
          avg score: 1.0
          max score: 1.0
          status 1 (db_unfetched): 1
          CrawlDb statistics: done
          ```
          As you can see, no URLs appear to have been processed: the number of new sitemap entries added is zero, which the readdb output confirms.
          I need to do some more debugging to find where the bug(s) are. If anyone is able to try this patch out and has an interest in sitemap support in Nutch master, it would be highly appreciated.

          markus17 Markus Jelsma added a comment -

          Updated patch for trunk:

          • added some curly braces to if statements; that kind of formatting always trips me up at some point;
          • added support for redirects: in hostdb mode a URL is built for URL filtering, but the actual protocol can be https instead, so the redirect is followed;
          • added support for defaulting to /sitemap.xml, since some robots.txt files do not properly point to the sitemap (see the discovery sketch below);
          • added support for NOT overwriting existing CrawlDatum information and made it the default option; letting an external sitemap overwrite the fetch interval is a very bad idea.
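
          For readers following along, here is a minimal, standalone sketch of the /sitemap.xml fallback described in the third bullet above. The class and method names are hypothetical, not the patch's actual code; only crawler-commons' BaseRobotRules.getSitemaps() is taken from the real API.

          ```java
          import java.util.LinkedList;
          import java.util.List;

          import crawlercommons.robots.BaseRobotRules;

          /** Hypothetical helper: decide which sitemap URLs to fetch for one host. */
          class SitemapDiscovery {
            static List<String> sitemapUrlsFor(String hostUrl, BaseRobotRules robotRules) {
              List<String> sitemaps = new LinkedList<>();
              if (robotRules != null && !robotRules.getSitemaps().isEmpty()) {
                // robots.txt explicitly lists one or more sitemap locations
                sitemaps.addAll(robotRules.getSitemaps());
              } else {
                // fall back to the conventional /sitemap.xml when robots.txt is silent or broken
                sitemaps.add(hostUrl.replaceAll("/+$", "") + "/sitemap.xml");
              }
              return sitemaps;
            }
          }
          ```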
          markus17 Markus Jelsma added a comment -

          Updated patch:

          • corrected the implementation for not overwriting existing entries;
          • the CrawlDb is now emitted via MapOutputFormat instead of SequenceFileOutputFormat (a job-setup sketch follows below).
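
          A hedged sketch of what that job wiring could look like with the new MapReduce API. MapFileOutputFormat is an assumption on my part, based on the MapOutputFormat mentioned above and on the CrawlDb being stored as a MapFile elsewhere in Nutch; the class and job names are illustrative, not the patch's actual code.

          ```java
          import java.io.IOException;

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.io.Text;
          import org.apache.hadoop.mapreduce.Job;
          import org.apache.hadoop.mapreduce.lib.output.MapFileOutputFormat;

          import org.apache.nutch.crawl.CrawlDatum;

          class SitemapJobSetup {
            /** Illustrative job setup: write the merged CrawlDb as a MapFile, not a plain SequenceFile. */
            static Job newSitemapJob(Configuration conf) throws IOException {
              Job job = Job.getInstance(conf, "SitemapProcessor");
              job.setOutputFormatClass(MapFileOutputFormat.class);
              job.setOutputKeyClass(Text.class);
              job.setOutputValueClass(CrawlDatum.class);
              return job;
            }
          }
          ```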
          markus17 Markus Jelsma added a comment -

          There is an oddity going on when a sitemap.xml entry is listed twice. It then assumes the db_status INJECTED and overwrites existing CrawlDatum completely.

          markus17 Markus Jelsma added a comment - - edited

          Ah, removing the NULL check in the reducer solves the problem. The existing entries are no longer overwritten. This was visible with readdb -stats, which showed a number of records with status INJECTED.
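
          To make the NULL-check discussion concrete, here is an illustrative reducer showing the intended merge rule: an entry already present in the CrawlDb always wins over one discovered via a sitemap, so a URL listed twice in sitemap.xml can never replace existing CrawlDatum state. This is a sketch of the behaviour being described, not the patch's actual reducer.

          ```java
          import java.io.IOException;

          import org.apache.hadoop.io.Text;
          import org.apache.hadoop.mapreduce.Reducer;

          import org.apache.nutch.crawl.CrawlDatum;

          /** Illustrative merge rule: prefer the existing CrawlDb entry over sitemap-injected ones. */
          class SitemapMergeReducer extends Reducer<Text, CrawlDatum, Text, CrawlDatum> {
            @Override
            protected void reduce(Text key, Iterable<CrawlDatum> values, Context context)
                throws IOException, InterruptedException {
              CrawlDatum existing = null;
              CrawlDatum fromSitemap = null;
              for (CrawlDatum datum : values) {
                if (datum.getStatus() == CrawlDatum.STATUS_INJECTED) {
                  if (fromSitemap == null) fromSitemap = new CrawlDatum();
                  fromSitemap.set(datum); // copy: Hadoop reuses the value instance between iterations
                } else {
                  if (existing == null) existing = new CrawlDatum();
                  existing.set(datum);
                }
              }
              // Only fall back to the sitemap-discovered datum when the URL is genuinely new.
              CrawlDatum out = (existing != null) ? existing : fromSitemap;
              if (out != null) {
                context.write(key, out);
              }
            }
          }
          ```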

          lewismc Lewis John McGibbney added a comment -

          Fantastic Markus Jelsma, is this working well for you? I am going to try this out. Out of curiosity, is this based off the GitHub PR or the various patches associated with this issue? I ask because I've seen quite a lot of variability in the implementations.

          markus17 Markus Jelsma added a comment -

          Hi Lewis!

          It appears to be working fine now, and bug-free, because the input no longer overwrites the fetch interval and modified time of existing CrawlDb entries:

          • that is messy in Nutch
          • websites almost always set bad values, e.g. 100k-page websites signaling to refetch everything daily.

          We have it deployed but not yet activated; that's the plan for early next week.

          The patch is based on the latest comments in this thread and the most recent scraps I found on GitHub. It should include the most recent contributions you all added.

          githubbot ASF GitHub Bot added a comment -

          lewismc opened a new pull request #195: NUTCH-1465 Support sitemaps in Nutch
          URL: https://github.com/apache/nutch/pull/195

          Hi folks, this PR is a mirror of Markus' latest patch over on https://issues.apache.org/jira/browse/NUTCH-1465; it exists merely for improved review.

          lewismc Lewis John McGibbney added a comment -

          Hi Markus Jelsma, I went ahead and generated a PR for others to review over at https://github.com/apache/nutch/pull/195

          lewismc Lewis John McGibbney added a comment -

          Markus Jelsma, when attempting to process the following sitemap - http://www.autotrader.com/sitemap.xml - it appears the new processor is not able to process anything: although the CrawlDb data structures are produced, no entries are added. Can you please rescope the patch and ensure it is the most up-to-date one you are working with? Thanks

          ```
          2017-07-03 15:32:09,213 INFO  util.SitemapProcessor - SitemapProcessor: Total records rejected by filters: 0
          2017-07-03 15:32:09,213 INFO  util.SitemapProcessor - SitemapProcessor: Total sitemaps from HostDb: 0
          2017-07-03 15:32:09,213 INFO  util.SitemapProcessor - SitemapProcessor: Total sitemaps from seed urls: 1
          2017-07-03 15:32:09,213 INFO  util.SitemapProcessor - SitemapProcessor: Total failed sitemap fetches: 0
          2017-07-03 15:32:09,213 INFO  util.SitemapProcessor - SitemapProcessor: Total new sitemap entries added: 0
          2017-07-03 15:32:09,213 INFO  util.SitemapProcessor - SitemapProcessor: Finished at 2017-07-03 15:32:09, elapsed: 00:00:19
          ```
          markus17 Markus Jelsma added a comment -

          Hello Lewis, I am positive I took the latest pieces. And checking the GH page, that problem wasn't solved in the first place, right? Or am I missing something? https://github.com/apache/nutch/pull/189#discussion_r113578491

          markus17 Markus Jelsma added a comment -

          Ah, I see. The autotrader sitemap points to an index of sitemaps. Everything is fine except that it does not pass if(sitemap.isIndex()). When printing its getType() I get null. So something is wrong with either the sitemap index, crawler-commons, or my code.
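
          For anyone debugging this, here is a small standalone sketch of how crawler-commons 0.8 distinguishes a sitemap index from a plain sitemap; fetching the raw bytes is assumed to happen elsewhere, and the class and method names of the wrapper are illustrative, not the patch's code.

          ```java
          import java.io.IOException;
          import java.net.URL;
          import java.util.Collection;

          import crawlercommons.sitemaps.AbstractSiteMap;
          import crawlercommons.sitemaps.SiteMap;
          import crawlercommons.sitemaps.SiteMapIndex;
          import crawlercommons.sitemaps.SiteMapParser;
          import crawlercommons.sitemaps.SiteMapURL;
          import crawlercommons.sitemaps.UnknownFormatException;

          class SitemapIndexCheck {
            /** Parse raw sitemap bytes and report whether an index or a plain sitemap came back. */
            static void inspect(String contentType, byte[] content, URL url)
                throws UnknownFormatException, IOException {
              SiteMapParser parser = new SiteMapParser(false); // false = lenient (non-strict) parsing
              AbstractSiteMap sm = parser.parseSiteMap(contentType, content, url);
              if (sm.isIndex()) {
                // A sitemap index: each child sitemap must be fetched and parsed in turn.
                Collection<AbstractSiteMap> children = ((SiteMapIndex) sm).getSitemaps();
                System.out.println("index, type=" + sm.getType() + ", children=" + children.size());
              } else {
                Collection<SiteMapURL> urls = ((SiteMap) sm).getSiteMapUrls();
                System.out.println("sitemap, type=" + sm.getType() + ", urls=" + urls.size());
              }
            }
          }
          ```

          If getType() comes back null for the autotrader index, the Content-Type passed into parseSiteMap() may be worth checking, since format detection depends on it.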

          lewismc Lewis John McGibbney added a comment -

          Markus Jelsma, can we also update the version of crawler-commons to 0.8, which is the latest version available in Maven Central? I'll take a look at the processing logic once the update has been made. Thanks Markus.

          markus17 Markus Jelsma added a comment -

          Hi Lewis, 0.8 doesn't deal with this sitemap at autotrader either.

          markus17 Markus Jelsma added a comment -

          Anyway, here's the patch with crawler-commons 0.8.

          markus17 Markus Jelsma added a comment -

          I think this is committable; does anyone disagree? If not, I'll get this in early next week.

          wastl-nagel Sebastian Nagel added a comment -

          Thanks, Markus Jelsma! Tested on a small set of sitemaps. Looks good to me; I've only improved the descriptions of the properties and did some code clean-up (patch / pull request to follow). Please go ahead and commit it! We can later improve it to make it more robust or to make more sophisticated use of the last-modified times and priorities provided in sitemaps. Thanks!

          githubbot ASF GitHub Bot added a comment -

          sebastian-nagel opened a new pull request #202: NUTCH-1465 Support for sitemaps
          URL: https://github.com/apache/nutch/pull/202

          (applied Markus' patch as of 2017-07-05)

          • add SitemapProcessor
          • upgrade dependency crawler-commons to 0.8

          markus17 Markus Jelsma added a comment -

          Thanks! Will grab 202.patch and see if it fits tomorrow!

          markus17 Markus Jelsma added a comment -

          Sebastian, your patch also touches CrawlDatum and IndexingFilterChecker, just for the newline at the tail. No problem, but I do miss your updated description of the properties. I cannot find it in https://github.com/apache/nutch/pull/202.patch

          wastl-nagel Sebastian Nagel added a comment -

          I've modified the descriptions of the properties sitemap.strict.parsing and sitemap.url.overwrite.existing. But feel free to add your modifications/additions. I just tried to make them understandable for anyone who does not know the gory details.
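
          On the Java side these two properties would typically be read from the Hadoop Configuration roughly as below; the default values shown are assumptions, and the authoritative ones live in conf/nutch-default.xml.

          ```java
          import org.apache.hadoop.conf.Configuration;

          class SitemapConfig {
            /** Whether the crawler-commons parser runs in strict mode (assumed default: true). */
            static boolean strictParsing(Configuration conf) {
              return conf.getBoolean("sitemap.strict.parsing", true);
            }

            /** Whether sitemap entries may overwrite existing CrawlDb records (assumed default: false). */
            static boolean overwriteExisting(Configuration conf) {
              return conf.getBoolean("sitemap.url.overwrite.existing", false);
            }
          }
          ```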

          markus17 Markus Jelsma added a comment -

          Crap! I was probably looking without seeing! Got it!

          markus17 Markus Jelsma added a comment -

          remote: 2dc7472..8f556f4 8f556f4a87d87edb96fb575fa4b579e39d9dfdb4 -> master

          Thanks Tejas, Sebastian, Lewis, Ken!

          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Nutch-trunk #3435 (See https://builds.apache.org/job/Nutch-trunk/3435/)
          NUTCH-1465 (markus: https://github.com/apache/nutch/commit/b58d6cd9111b2d25b8f6f009015ac214bac4006d)

          • (edit) conf/log4j.properties
          • (add) src/java/org/apache/nutch/util/SitemapProcessor.java
          • (edit) ivy/ivy.xml
          • (edit) conf/nutch-default.xml
          • (edit) src/bin/nutch

            People

            • Assignee: markus17 Markus Jelsma
            • Reporter: lewismc Lewis John McGibbney
            • Votes: 1
            • Watchers: 11