Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.7
    • Component/s: indexer
    • Labels:
    • Patch Info:
      Patch Available

      Description

      One possible feature would be to add a new endpoint for indexing backends and make the indexing pluggable. At the moment we are hardwired to SOLR, which is OK, but as other resources like ElasticSearch become more popular it would be better to handle this via plugins. I'm not sure about the name of the endpoint, though: we already have indexing-plugins (which are about generating the fields sent to the backends), and moreover the backends are not necessarily for indexing/searching but could be just an external storage, e.g. CouchDB. The term 'backend' on its own would be confusing in 2.0, as it could be taken to refer to the storage in GORA. 'indexing-backend' is the best name that has come to my mind so far; please suggest better ones.

      We should come up with generic map/reduce jobs for indexing, deduplicating and cleaning and maybe add a Nutch extension point there so we can easily hook up indexing, cleaning and deduplicating for various backends.
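The pluggable-backend idea could be sketched roughly as follows. This is illustrative only, not Nutch's actual API; the names IndexBackend and InMemoryBackend are made up. The point is that the indexer talks to a small interface and each backend (SOLR, ES, CouchDB, ...) lives behind it as a plugin.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the pluggable-backend idea described above.
// All names here are illustrative, not Nutch's actual API.
interface IndexBackend {
    void add(Map<String, String> doc); // send a document to the backend
    void delete(String id);            // remove a document by id
}

// A toy backend that just keeps documents in memory,
// standing in for a real SOLR or ElasticSearch client.
class InMemoryBackend implements IndexBackend {
    final Map<String, Map<String, String>> store = new HashMap<>();
    public void add(Map<String, String> doc) { store.put(doc.get("id"), doc); }
    public void delete(String id) { store.remove(id); }
}
```

The generic indexing, deduplicating and cleaning jobs would then only depend on the interface, and each concrete backend could ship as its own plugin.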

      1. NUTCH-1047-1.x-final.patch
        90 kB
        Julien Nioche
      2. NUTCH-1047-1.x-v1.patch
        80 kB
        Julien Nioche
      3. NUTCH-1047-1.x-v2.patch
        68 kB
        Julien Nioche
      4. NUTCH-1047-1.x-v3.patch
        121 kB
        Julien Nioche
      5. NUTCH-1047-1.x-v4.patch
        85 kB
        Julien Nioche
      6. NUTCH-1047-1.x-v5.patch
        86 kB
        Julien Nioche

        Issue Links

          Activity

          lewismc Lewis John McGibbney added a comment -

          I think the suggestion of generic/example/template map/reduce jobs would be an excellent addition. This is a great idea. In my opinion it would reduce the barrier for entry to users inexperienced in setting up jobs.

          My interest in your last point is a question which I suppose is wide open to discussion. What end-points (generally speaking) are we going to support and formally represent as pluggable entities? What criteria do we make decisions based on?

          jnioche Julien Nioche added a comment -

          My interest in your last point is a question which I suppose is wide open to discussion. What end-points (generally speaking) are we going to support and formally represent as pluggable entities? What criteria do we make decisions based on?

          We'll simply port the existing SOLR indexing to the plugin-based architecture so that people can easily add the backends they need. If there is a widespread need for a specific backend then I suppose someone will contribute patches and it might get committed. It's not like we need to define which backends (not same as endpoints BTW) would be added etc... we are just giving people the possibility of simply adding theirs without having to do a dirty hack of the indexer.

          There is currently growing interest in ElasticSearch, and I know of at least one person who's modified the SOLR indexer to get it to work for ES. This would be a good candidate for inclusion; apart from that, let's see what people contribute.

          jnioche Julien Nioche added a comment -

          It would be nice to have a plugin implementing this endpoint to generate WARC files. There seem to be two different situations though: one where we send docs to servers (SOLR, ES) and one where we generate files. Do we need to handle deletions for the latter? I don't think so, but we would need to for the former.

          Any thoughts on this? Would it make sense to have 2 different endpoints or not?

          markus17 Markus Jelsma added a comment -

          Hi Julien,

          I'm not sure I get your point exactly, but if we don't generate WARC files we:

          • don't have to think about the problem you state
          • don't create an additional process between Nutch and a search engine

          If you'd need WARC files, for some reason, I'd rather have an endpoint for it just like for ES and Solr instead of using WARC files as an intermediate format.

          Does your suggestion imply: segment+crawldb > warc files > search engine?

          jnioche Julien Nioche added a comment -

          If you'd need WARC files, for some reason, I'd rather have an endpoint for it just like for ES and Solr instead of using WARC files as an intermediate format.

          Does your suggestion imply: segment+crawldb > warc files > search engine?

          Nope, let's start again.
          We mentioned in this issue that we'd like to make the indexing backends pluggable in order to simplify the code and make it easier for others to implement alternative backends. We currently have only SOLR, ES is clearly a good candidate, and you've rightly pointed out that we could have an XML dump of the docs. I would add that we could plug in JDBC or HBase etc... WARC is just another example of something we could have as a plugin.

          The question was: is there a functional difference between say [XML|WARC] and [SOLR|ES]? For instance the plugin endpoint for SOLR|ES would need to handle deletions, but not the XML or WARC one. Are there any more such differences? Is it an index vs dump issue? A remote vs local one? Would it make sense to have on one hand an indexer with plugins supporting deletions and expecting a URL, and on the other a separate job for converting segments and crawldb to XML, WARC etc...?

          Does it make more sense?

          markus17 Markus Jelsma added a comment -

          Ah yes it makes sense now!

          If you look at the patch for NUTCH-1139 you can see that the endpoint, Solr in this case, implements the delete method as called from NutchIndexAction. Another endpoint could simply ignore it and do nothing but write out WARC or Solr XML files.
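The server-vs-file distinction being discussed could look roughly like this. None of these are real Nutch classes; it's a sketch of the idea that a Solr/ES-style writer honours delete(), while a file-dump writer (WARC or Solr XML style) implements it as a no-op because its output is append-only.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative only: two hypothetical endpoint implementations.
interface DocWriter {
    void write(String doc);
    void delete(String id);
}

// A server-style writer (think Solr/ES): deletions are honoured.
class ServerWriter implements DocWriter {
    final Set<String> index = new HashSet<>();
    public void write(String doc) { index.add(doc); }
    public void delete(String id) { index.remove(id); }
}

// A file-dump writer (think WARC/XML): output is append-only,
// so delete() is simply a no-op.
class FileDumpWriter implements DocWriter {
    final StringBuilder out = new StringBuilder();
    public void write(String doc) { out.append(doc).append('\n'); }
    public void delete(String id) { /* no-op: append-only output */ }
}
```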

          jnioche Julien Nioche added a comment -

          The class NutchIndexWriter and NutchIndexWriterFactory already provide us with the type of abstraction we need. We could turn the interface NutchIndexWriter into an endpoint and add the methods we need (e.g. delete). What is not clear yet is what IndexerOutputFormat is used for and whether we will be able to use implementations of NutchIndexWriter from within a plugin.

          markus17 Markus Jelsma added a comment -

          20120304-push-1.6

          ferdy.g Ferdy Galema added a comment -

          Changing NutchIndexWriter into an endpoint looks like the best solution to have a pluggable indexing backend.

          > "What is not clear yet is what IndexerOutputFormat is used for"
          More or less what it is used for now? (A bridge for mapreduce code to write documents to indexwriters.) What I've changed in Nutch2.x is that IndexerOutputFormat does not extend from FileOutputFormat anymore. (Many indexers do not use the filesystem at all, and the temporary files that were written anyway are unnecessary.) When there is a file-based implementation again (like the above-mentioned XML output indexer), it is always possible to introduce an abstract indexwriter that is used as a base for backends that use the filesystem, i.e. FileIndexWriter or something like that. Open for discussion.
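The abstract file-based base class suggested here could be sketched as follows. FileIndexWriter is the name proposed in the comment; everything else (serialize, XmlIndexWriter, the buffer) is made up for illustration. Subclasses only decide how a document is serialised; the base class accumulates the output.

```java
import java.util.Map;

// Sketch of an abstract base for file-based index writers, assuming
// the FileIndexWriter idea from the comment. Names are hypothetical.
abstract class FileIndexWriter {
    protected final StringBuilder buffer = new StringBuilder();

    // Concrete writers decide the serialisation (XML, WARC, ...).
    abstract String serialize(Map<String, String> doc);

    void write(Map<String, String> doc) {
        buffer.append(serialize(doc)).append('\n');
    }

    String contents() { return buffer.toString(); }
}

// A minimal XML-dump writer built on the base class.
class XmlIndexWriter extends FileIndexWriter {
    String serialize(Map<String, String> doc) {
        return "<doc id=\"" + doc.get("id") + "\"/>";
    }
}
```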

          One thing I noticed is that Nutch trunk still uses the old mapreduce API. (Note NUTCH-1219). It is not really a blocker, but since Nutchgora is using the new API, it will cause some differences in implementation for trunk and Nutch2. For now I think it would be okay to ignore Nutch2 and make an implementation for trunk first. (I'm happy to make a port to Nutch2 afterwards).

          > "whether we will be able to use implementations of NutchIndexWriter from within a plugin"
          What do you mean with this?

          ferdy.g Ferdy Galema added a comment -

          I did not mean to confuse people by using Nutchgora and Nutch2 in the same context. Of course they are just the same thing.

          jnioche Julien Nioche added a comment -

          Thanks for your comments, Ferdy.

          What I've changed in Nutch2.x is that IndexerOutputFormat does not extend from FileOutputFormat anymore.

          would be good to do the same for 1.x

          "whether we will be able to use implementations of NutchIndexWriter from within a plugin"

          What do you mean with this?

          I meant that we need to check whether we can have the NutchIndexWriter implementations available in a plugin, which would be nice as we'd have our generic commands + the indexing endpoints implementations in their respective plugins (e.g. indexer-SOLR, indexer-ES) etc...

          ferdy.g Ferdy Galema added a comment -

          Ah yes, I think that is what we should aim for. This works well with how users most often add functionality: simply copy an existing plugin and change it to suit their custom needs.

          jnioche Julien Nioche added a comment -

          This is work in progress.
          This patch creates a new endpoint (IndexWriter) that plugins can implement. It comes with one such plugin (indexer-solr) and generic code for replacing the index and delete jobs. I haven't tested it very much. The main difference is that the SOLR URL must be passed as a Hadoop param, e.g. -D solr.server.url. It could also be put in the nutch-site.xml once and for all.
          There will be some cleaning to do once this is stable to remove the SOLR stuff in the core code etc...
          Please have a look and let me know your thoughts on this.
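As the comment notes, the SOLR URL could also be set once in nutch-site.xml rather than passed on every run. A minimal sketch (the property name solr.server.url comes from this thread; the value is a placeholder):

```xml
<!-- nutch-site.xml: set the Solr URL once instead of passing -D on every run -->
<property>
  <name>solr.server.url</name>
  <value>http://localhost:8983/solr/</value>
</property>
```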

          lewismc Lewis John McGibbney added a comment -

          Nice one Julien

          jnioche Julien Nioche added a comment -

          New version of the patch, which removes all SOLR related stuff from the core.
          The Crawl class assumes that SOLR is used (but this can be changed) and does not do the SOLR dedup anymore. We'll need a better mechanism for the dedup, as the existing one is SOLR-centric and not very scalable.
          It's quite a drastic modification of the code, but should be for the best.
          Please give it a try and let me know your thoughts.
          PS: you might need to delete the index.solr package by hand.

          jnioche Julien Nioche added a comment -

          Cleaner version of the patch, which removes the content from the solr package, adds the dependencies to the indexer-solr plugin in the plugin.xml definition, and changes the nutch script so that the SOLR related commands work in the same way but use the plugin under the bonnet. A few more things to do, e.g. management of the commits when indexing, but we are getting there.

          markus17 Markus Jelsma added a comment -

          Very nice Julien! Can you also add update() to the writer interface? See NUTCH-1506. Some impls can do this, such as recent Solr commits. Other impls can defer to add() if applicable or throw UnsupportedOperationException.
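The suggestion could look roughly like this (hypothetical names, not the real NUTCH-1506 interface): update() delegates to add() by default, so a backend without native update support still works, while backends that can update in place override it or throw UnsupportedOperationException instead.

```java
import java.util.Map;

// Sketch of the update() suggestion above; all names are illustrative.
interface Writer {
    void add(Map<String, String> doc);

    // Default: fall back to a plain add when in-place updates
    // aren't supported by the backend.
    default void update(Map<String, String> doc) {
        add(doc);
    }
}

// A trivial implementation used only to show the default in action.
class CountingWriter implements Writer {
    int adds = 0;
    public void add(Map<String, String> doc) { adds++; }
}
```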

          jnioche Julien Nioche added a comment -

          Good point Markus, thanks.
          The main issue I am struggling with at the moment is what to do with the SOLR deduplication. I don't think we can run a MapReduce job from a plugin, so it's not going to work. One (temporary) option would be to leave it as is, so that the crawl command, the crawl script and the nutch command all work as expected, and then get rid of it when we have a generic deduplication job.

          markus17 Markus Jelsma added a comment -

          I had an issue with dedup too in NUTCH-1480; unless we do something about it I cannot commit that. Personally I'd prefer to never touch that class again but keep it as legacy. What do you think?

          jnioche Julien Nioche added a comment -

          We definitely need a better mechanism for deduplication. +1 to leave it as is for now until we have a better option. Slightly annoying for this issue is that it means adding it back to the main classes, as well as SOLR as a dependency, but not a big deal.

          markus17 Markus Jelsma added a comment -

          Alright, I'll skip dedup for NUTCH-1480, see if I can send it in, and work on NUTCH-1377.

          Are you sure you cannot run a MapReduce program from within a plugin? I think it's worth trying.

          jnioche Julien Nioche added a comment -

          Tried, failed.
          Re: other issues, wouldn't it make sense to do NUTCH-1047 first before you improve the SOLR backends?

          markus17 Markus Jelsma added a comment -

          Too bad.

          I'm not sure; at least 1480 is ready, but fine by me. Too bad I'll have to rewrite the patches then.

          jnioche Julien Nioche added a comment -

          It should not be a big deal, as the classes affected by NUTCH-1480 are not modified that much by NUTCH-1047, and it also means that you'll get to look at the code for this issue, which is a good way of reviewing it.

          markus17 Markus Jelsma added a comment -

          which is a good way of reviewing it

          Cheers! Looking forward to your new patch.

          jnioche Julien Nioche added a comment -

          My suggestion was that you give NUTCH-1047 a try, wait until it is committed, then commit your changes on top of it; not that I'd patch it to include your changes.

          BTW I have commented on NUTCH-1480.

          thanks

          Julien

          markus17 Markus Jelsma added a comment -

          No, I understood correctly.

          jnioche Julien Nioche added a comment -

          First working patch!
          Added the SOLRDedup back into the core classes, as it does not seem to be possible to run a MapReduce class from within a plugin.
          Added 2 new methods to the IndexWriter interface (commit, update) + fixed CleaningJob and the nutch script.
          Tried it on a small crawl with the crawl script and it worked as expected.

          markus17 Markus Jelsma added a comment -

          Excellent work my friend! I'll be sure to test this next week! Hopefully it all works out fine and I can rewrite the other indexing patches with ease.

          Cheers!

          amuseme.lu lufeng added a comment -

          Hi, I applied the patch, but I could not find how to set the Solr URI, and the class SolrUtils is duplicated in two places. Maybe later the DeleteDuplicates will be made pluggable in the backends too.

          jnioche Julien Nioche added a comment -

          Hi Lufeng.

          The solrindex command in the nutch script works just as before. You can also invoke the IndexingJob command and pass it the SOLR URL as a Hadoop parameter e.g. -D solr.server.url=xxxxxx

          SolrUtils is duplicated indeed because of DeleteDuplicates, which is a SOLR-specific implementation. We need to build a generic deduplicator at some point and it will use the pluggable backends. I decided to leave the SOLR-based one in for now, but if most people don't use it then we should probably shelve it. This is a separate issue though.

          Thanks for your comments

          lewismc Lewis John McGibbney added a comment -

          Hi Julien, it will be early next week until I can try this patch out. There are numerous hurdles to get over regarding the network security here and I do not quite know the configuration yet. It's top of my Jira TODO though.

          tejasp Tejas Patil added a comment -

          Hi Julien,
          I am trying out the patch and facing an issue. Maybe I am using it the wrong way. Here is what I did:
          After setting up nuch+solr and changing schema.xml as per wiki, I applied the patch. If I dont pass the -D option in crawl command, it throws an exception indicating "Missing SOLR URL". I believe that -solr option along with the url also needs to be provided else it wont perform the indexing part. To run a test crawl, I use this command:

          bin/nutch crawl -D solr.server.url=http://localhost:8983/solr/ urls  -solr http://localhost:8983/solr/  -depth 5 -topN 5000

          It gives me an exception saying: "ERROR: [doc=http://searchhub.org/2009/03/09/nutch-solr/] unknown field 'content'" . I have no clue about this. Can you kindly point out where I went wrong ?

          Also, the crawl command above needs the solr url to be specified twice. Is there a way to run it with the solr url being specified just once ?

          amuseme.lu lufeng added a comment -

          Hi Tejas,

          Maybe you didn't add the -D option to the bin/nutch crawl command; both are used to set the solr.server.url parameter. And the cause of the "unknown field 'content'" error is probably that the Solr schema.xml is not configured correctly. Did you copy the conf/schema.xml from the Nutch conf directory to the example/solr/conf directory?

          amuseme.lu lufeng added a comment -

          Hi Julien,

          I found that in bin/nutch there is a line like this: CLASS="org.apache.nutch.indexer.IndexingJob -D solr.server.url=$1". But I don't know why we don't add an option to set the indexer URL, such as bin/nutch solrindex -indexurl http://localhost:8983/solr/.

          But now I have found that the correct command to invoke the IndexingJob is "bin/nutch solrindex http://localhost:8983/solr/ crawldb/ segments/20130121115214/ -filter".

          tejasp Tejas Patil added a comment -

          Hi Lufeng,

          You are right. There was a problem with my schema.xml file. I corrected it and now things are working. Thanks !!

          jnioche Julien Nioche added a comment -

@tejasp I can reproduce the issue and am looking into it. Somehow the configuration does not get passed on properly when using the crawl command. Thanks.

          Lufeng

          But i don't know why not add an option to set IndexerUrl such as bin/nutch solrindex -indexurl http://localhost:8983/solr/.

Whether it is passed as a parameter or via configuration should not make much of a difference. Your suggestion also assumes that the indexing backend can be reached via a single URL, which is not necessarily the case: it might not need a URL at all or, conversely, might need several. Better to leave that logic in the configuration and assume that the backends will find whatever they need there.

          the corrent command to invoke the IndexingJob command is "bin/nutch solrindex http://localhost:8983/solr/ crawldb/ segments/20130121115214/ -filter".

As explained above, we want to keep compatibility with the existing solrindex command and not change its syntax. Underneath it uses the new plugin-based code but sets the value of the SOLR config. There is no shortcut for the generic indexing job command in the nutch script yet, but we could add one. For now it has to be called in full, e.g. bin/nutch org.apache.nutch.indexer.IndexingJob ..., which will make sense once we have other indexing backends and not just SOLR.

          Think about 'nutch solrindex' as a shortcut for the generic command.
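To make the discussion concrete, here is a hypothetical, heavily simplified sketch of what a pluggable index-writer extension point could look like. All class and method names here (Doc, IndexWriter, InMemoryIndexWriter, open/write/delete/close) are illustrative stand-ins, not the actual API from the patch:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A minimal stand-in for a Nutch document: a URL plus named fields.
class Doc {
    final String url;
    final Map<String, String> fields = new HashMap<>();
    Doc(String url) { this.url = url; }
}

// The extension point: each backend (SOLR, ElasticSearch, CSV, ...)
// would implement this and be selected via plugin configuration.
interface IndexWriter {
    void open(Map<String, String> conf); // read backend settings from config
    void write(Doc doc);
    void delete(String url);
    void close();
}

// Example backend: buffers documents in memory. A real plugin would
// send them to SOLR, ElasticSearch, a CSV file, CouchDB, etc.
class InMemoryIndexWriter implements IndexWriter {
    final List<Doc> buffer = new ArrayList<>();
    public void open(Map<String, String> conf) { /* nothing to do */ }
    public void write(Doc doc) { buffer.add(doc); }
    public void delete(String url) { buffer.removeIf(d -> d.url.equals(url)); }
    public void close() { /* a real backend would flush/commit here */ }
}

public class PluggableIndexingSketch {
    public static void main(String[] args) {
        // The generic indexing job only talks to the interface,
        // so swapping backends requires no change to the job code.
        InMemoryIndexWriter writer = new InMemoryIndexWriter();
        writer.open(new HashMap<>());
        Doc d = new Doc("http://example.com/");
        d.fields.put("title", "Example");
        writer.write(d);
        System.out.println("buffered=" + writer.buffer.size());
        writer.close();
    }
}
```

The point of the sketch is the design choice being discussed: the generic job depends only on the interface, while each backend reads whatever settings it needs from the configuration rather than from positional command-line arguments.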

          tejasp Tejas Patil added a comment -

          Hi Julien,
After @lufeng's reply, I was able to perform indexing with the crawl command. Here is a summary of what I have observed:

          "solr.server.url" in nutch-site.xml "-D" in crawl command Works ?
          no no RuntimeException: Missing SOLR URL
          no yes yes
          yes no yes
          yes yes yes

Note that I had to pass "-solr" and the Solr URL every time; otherwise indexing was not invoked.

          jnioche Julien Nioche added a comment -

          Hi Tejas

It will work every time you set it in nutch-site.xml. As for setting it with -D in the crawl command: you definitely should not have to do that, and this is where the bug is. The problem is that the value taken from the crawl command is correctly set in the configuration object, but for some reason the latter is reloaded or overridden during the call to JobClient.runJob(job) (IndexingJob line 120).

          BTW the crawl command is deprecated and should be removed at some point as we have the crawl script. Could you try using the SOLRIndex command as well as the crawl script while I try and solve the problem with the crawl command?

          Thanks

          Julien

          tejasp Tejas Patil added a comment -

Hi Julien, The solrindex command and the crawl script work fine after setting "solr.server.url" in nutch-site.xml. I did not use the "-D" option during these runs.

          jnioche Julien Nioche added a comment -

          Tejas

The crawl script and the solrindex command should work without setting "solr.server.url" in nutch-site.xml or using -D, as this is handled for you in the nutch script. Can you please test without specifying "solr.server.url" in nutch-site.xml?

          Thanks

          wastl-nagel Sebastian Nagel added a comment -

As a test for the interface I started to implement a CSV indexer - useful for exporting crawled data or for quick analysis. The first working version (a draft, still a lot to do) took just over 100 lines of code: +1 for the interface / extension point.

          Some concerns about the usability of IndexingJob as a "daily" tool:

• it's not really transparent which indexer is run (solr, elastic, etc.): you have to look into the plugin.includes property
• options must be passed to indexer plugins as properties: this is complicated, and there is no help listing the available properties
          tejasp Tejas Patil added a comment -

          Hi Julien,

          As you suggested, I tried to run solrindex command without setting "solr.server.url" in nutch-site.xml or "-D".

          Command used:

          bin/nutch solrindex http://localhost:8983/solr mycrawl/crawldb/ mycrawl/segments/201301280439/

          It says:

          Usage: Indexer <crawldb> [-linkdb <linkdb>] [-params k1=v1&k2=v2...] (<segment> ... | -dir <segments>) [-noCommit] [-deleteGone] [-filter] [-normalize]

          The check for number of args is causing this. I corrected it locally and it worked fine after that.
As per the usage above, the user needs to provide just the crawldb and segment. But the user must also pass the Solr URL, which is consumed by the bin/nutch script. The usage message must be changed to hide this mechanism from the user.

          jnioche Julien Nioche added a comment -

Sebastian Nagel a text-based indexer is a good idea. Having one that generates data in the format used by CloudSearch (see NUTCH-1517) would be cool as well. As for your concerns: most people currently use the SOLR indexer, which will still be the one activated by default. I expect a minority of people will try to use something else, and if they do then checking which one is activated is no big deal, either via the config file or from the logs. Passing the options via the config with -D is not very different from using a standard parameter, with the added benefit that it gives us the possibility to set things in nutch-site.xml once and for all and hence make the commands much simpler. As for the list of properties, they would vary from backend to backend anyway. Each plugin could have a README describing what its options are; compared to having everything in nutch-default.xml, at least the descriptions will be contained within the related plugin.
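For reference, setting the property once in nutch-site.xml avoids repeating it on every command. The snippet below uses the standard Nutch property format; the URL value is just an example:

```xml
<!-- nutch-site.xml: set once, picked up by the indexing commands -->
<property>
  <name>solr.server.url</name>
  <value>http://localhost:8983/solr/</value>
  <description>URL of the SOLR server used by the SOLR indexer plugin.</description>
</property>
```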

Tejas Patil good catch on the number of args, I will fix it. Re the usage message: we could add a getUsage() method to each backend that the generic command will call for all the active indexing plugins. I think the solrindex shortcut is just a temporary measure, though, until the documentation is up to scratch and the user base has got used to the generic commands.

          Thanks for taking the time to share your thoughts, guys.

          jnioche Julien Nioche added a comment -

• Fixed the bug with the check on the number of arguments for the index command.
• Fixed the issue with the solr param not being passed on when using the all-in-one crawl command.
• Added a describe() method to IndexWriter, which is called by the IndexingJob and dumps to the log a list of all the active index writers as well as the parameters they take.

All the issues mentioned previously should now be fixed. Basically, the crawl and solrindex commands should work in exactly the same way as before, so there is no change from a user point of view, but we also gain the possibility of plugging in new backends.

          Please give it a try, would be nice to commit that soon.
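The describe() idea mentioned above could look something like the following minimal sketch. The class names, the property list, and the output layout are illustrative only and may differ from what the patch actually commits; solr.server.url and solr.commit.size are shown as examples of the kind of parameters a backend would report:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: each active index writer reports its name and
// the configuration properties it understands, for use in usage/help output.
interface DescribableWriter {
    String describe();
}

class SolrWriterInfo implements DescribableWriter {
    public String describe() {
        // Illustrative parameter list; a real backend would list its own.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("solr.server.url", "URL of the SOLR instance (mandatory)");
        params.put("solr.commit.size", "buffer size when sending to SOLR");
        StringBuilder sb = new StringBuilder("SOLRIndexWriter\n");
        for (Map.Entry<String, String> e : params.entrySet())
            sb.append('\t').append(e.getKey()).append(" : ")
              .append(e.getValue()).append('\n');
        return sb.toString();
    }
}

public class DescribeSketch {
    public static void main(String[] args) {
        // The generic job would loop over all active writers and
        // print each description as part of the usage message.
        System.out.print(new SolrWriterInfo().describe());
    }
}
```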

          amuseme.lu lufeng added a comment -

Patch v5 works correctly with Nutch 1.6 and Solr 3.6 and 4.1. Note that the Solr 4.1 configuration file schema-solr4.xml needs the patch from NUTCH-1486.

It would be better if the indexer could report progress.

Good job, thanks Julien.

          tejasp Tejas Patil added a comment -

          Hi Julien,

The crawl command (with the solr option) and the solrindex command are working properly now. Is there anything else that you think should be verified?

          jnioche Julien Nioche added a comment -

          Hi Tejas

Thank you for taking the time to have a look. The SolrClean command has also been modified to use the plugin architecture; that should be the last thing, I think.

          Thanks

          Julien

          tejasp Tejas Patil added a comment -

          Hey Julien,

While running the solrclean command, I followed the old usage given here [0]. It gave an exception. Then I checked the usage message, which gave:

          $ bin/nutch solrclean 
          Usage: CleaningJob <crawldb> [-noCommit]

That did not work either. It just prints the usage if only the crawldb is passed as an argument. I went through the patch and realized that the bin/nutch script treats the first argument as the Solr URL and then passes the remainder, i.e. the crawldb, to the Java code. This is what worked for me:

          bin/nutch solrclean <solrurl> <crawldb>

This is different from the old usage given at [0]. We can avoid changing the order of the arguments and preserve the old usage. This can be used in the bin/nutch script:

          CLASS="org.apache.nutch.indexer.CleaningJob -D solr.server.url=$2"

and not perform a "shift" after that. The corresponding usage message must be modified in the Java code too.

          [0] : http://wiki.apache.org/nutch/bin/nutch%20solrclean

          tejasp Tejas Patil added a comment -

Hey Julien, one question: why does this change not affect the "solrdedup" command (i.e. the SolrDeleteDuplicates class)?

          jnioche Julien Nioche added a comment -

          Hi Tejas

          Good catch, could do


          CLASS="org.apache.nutch.indexer.CleaningJob -D solr.server.url=$2 $1"
          shift; shift

There is no change needed in the Java code, as it expects only one argument, which is the crawldb. We could also get the CleaningJob to log which indexers are available.

Re solrdedup: the explanation was given earlier in this thread. It is a SOLR-specific approach, and we can't run a job located in a plugin; the main job file has to be in the core code. We need a better deduplicator anyway.

          tejasp Tejas Patil added a comment -

          Hi Julien,

One small change in the Java class would be to display this usage message to the user:

          $ bin/nutch solrclean 
          Usage: CleaningJob <crawldb> <solrurl> [-noCommit]

The current patch doesn't display "solrurl" in the usage.

          jnioche Julien Nioche added a comment -

          Tejas,

The CleaningJob is backend-neutral and as such should not expect <solrurl> as a parameter, same as with the IndexingJob really.

          wastl-nagel Sebastian Nagel added a comment -

          Hi Julien,

Overall, all looks good. A first version of the CSV indexer is ready (NUTCH-1541) and works well with the last v5 patch.

One point we should improve is the command-line help. I agree with Tejas that the help should list all required arguments. Of course, you are right that the index/cleaning jobs are "backend-neutral", but then it would be preferable to have new commands "index" and "indexclean". These are also required if other indexer back-ends are used. We can keep the "solr*" commands for legacy reasons and because they are handy. A few additional lines to generate the old help text are tolerable and could avoid unnecessary user requests on the mailing list.

          The describe() method is a good idea. The new commands will then show sufficient help but IndexingJob/CleaningJob should also call describe() when help is shown!

          Some trivialities to get the Java docs right:

          • default.properties - need to add the new "plugins.indexer" group with indexer-solr as member
          • build.xml - add group referring to "plugins.indexer", add Java doc targets for indexer-solr
          jnioche Julien Nioche added a comment -

          Final patch for the records before committing. Have added generic 'index' and 'clean' commands, which call the describe() method of the IndexWriters as part of the usage message.
Have added the Javadoc as suggested by Seb, as well as the fix for SOLRClean from Tejas.

          jnioche Julien Nioche added a comment -

          Committed revision 1453776.

Thanks everyone for the comments and reviews. Let's add some indexing backends now.

          hudson Hudson added a comment -

          Integrated in Nutch-trunk-Windows #57 (See https://builds.apache.org/job/Nutch-trunk-Windows/57/)
          NUTCH-1047 Pluggable indexing backends (Revision 1453776)

          Result = FAILURE
          jnioche : http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1453776
          Files :

          • /nutch/trunk/CHANGES.txt
          • /nutch/trunk/build.xml
          • /nutch/trunk/conf/nutch-default.xml
          • /nutch/trunk/default.properties
          • /nutch/trunk/ivy/ivy.xml
          • /nutch/trunk/src/bin/nutch
          • /nutch/trunk/src/java/org/apache/nutch/crawl/Crawl.java
          • /nutch/trunk/src/java/org/apache/nutch/indexer/CleaningJob.java
          • /nutch/trunk/src/java/org/apache/nutch/indexer/IndexWriter.java
          • /nutch/trunk/src/java/org/apache/nutch/indexer/IndexWriters.java
          • /nutch/trunk/src/java/org/apache/nutch/indexer/IndexerMapReduce.java
          • /nutch/trunk/src/java/org/apache/nutch/indexer/IndexerOutputFormat.java
          • /nutch/trunk/src/java/org/apache/nutch/indexer/IndexingJob.java
          • /nutch/trunk/src/java/org/apache/nutch/indexer/NutchIndexWriter.java
          • /nutch/trunk/src/java/org/apache/nutch/indexer/NutchIndexWriterFactory.java
          • /nutch/trunk/src/java/org/apache/nutch/indexer/solr/SolrClean.java
          • /nutch/trunk/src/java/org/apache/nutch/indexer/solr/SolrIndexer.java
          • /nutch/trunk/src/java/org/apache/nutch/indexer/solr/SolrMappingReader.java
          • /nutch/trunk/src/java/org/apache/nutch/indexer/solr/SolrUtils.java
          • /nutch/trunk/src/java/org/apache/nutch/indexer/solr/SolrWriter.java
          • /nutch/trunk/src/plugin/build.xml
          • /nutch/trunk/src/plugin/indexer-solr
          • /nutch/trunk/src/plugin/indexer-solr/build.xml
          • /nutch/trunk/src/plugin/indexer-solr/ivy.xml
          • /nutch/trunk/src/plugin/indexer-solr/plugin.xml
          • /nutch/trunk/src/plugin/indexer-solr/src
          • /nutch/trunk/src/plugin/indexer-solr/src/java
          • /nutch/trunk/src/plugin/indexer-solr/src/java/org
          • /nutch/trunk/src/plugin/indexer-solr/src/java/org/apache
          • /nutch/trunk/src/plugin/indexer-solr/src/java/org/apache/nutch
          • /nutch/trunk/src/plugin/indexer-solr/src/java/org/apache/nutch/indexwriter
          • /nutch/trunk/src/plugin/indexer-solr/src/java/org/apache/nutch/indexwriter/solr
          • /nutch/trunk/src/plugin/indexer-solr/src/java/org/apache/nutch/indexwriter/solr/SolrConstants.java
          • /nutch/trunk/src/plugin/indexer-solr/src/java/org/apache/nutch/indexwriter/solr/SolrIndexWriter.java
          • /nutch/trunk/src/plugin/indexer-solr/src/java/org/apache/nutch/indexwriter/solr/SolrMappingReader.java
          • /nutch/trunk/src/plugin/indexer-solr/src/java/org/apache/nutch/indexwriter/solr/SolrUtils.java
          • /nutch/trunk/src/plugin/nutch-extensionpoints/plugin.xml
          hudson Hudson added a comment -

          Integrated in Nutch-trunk #2144 (See https://builds.apache.org/job/Nutch-trunk/2144/)
          NUTCH-1047 Pluggable indexing backends (Revision 1453776)

          Result = SUCCESS
          jnioche : http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1453776
          Files :

          • /nutch/trunk/CHANGES.txt
          • /nutch/trunk/build.xml
          • /nutch/trunk/conf/nutch-default.xml
          • /nutch/trunk/default.properties
          • /nutch/trunk/ivy/ivy.xml
          • /nutch/trunk/src/bin/nutch
          • /nutch/trunk/src/java/org/apache/nutch/crawl/Crawl.java
          • /nutch/trunk/src/java/org/apache/nutch/indexer/CleaningJob.java
          • /nutch/trunk/src/java/org/apache/nutch/indexer/IndexWriter.java
          • /nutch/trunk/src/java/org/apache/nutch/indexer/IndexWriters.java
          • /nutch/trunk/src/java/org/apache/nutch/indexer/IndexerMapReduce.java
          • /nutch/trunk/src/java/org/apache/nutch/indexer/IndexerOutputFormat.java
          • /nutch/trunk/src/java/org/apache/nutch/indexer/IndexingJob.java
          • /nutch/trunk/src/java/org/apache/nutch/indexer/NutchIndexWriter.java
          • /nutch/trunk/src/java/org/apache/nutch/indexer/NutchIndexWriterFactory.java
          • /nutch/trunk/src/java/org/apache/nutch/indexer/solr/SolrClean.java
          • /nutch/trunk/src/java/org/apache/nutch/indexer/solr/SolrIndexer.java
          • /nutch/trunk/src/java/org/apache/nutch/indexer/solr/SolrMappingReader.java
          • /nutch/trunk/src/java/org/apache/nutch/indexer/solr/SolrUtils.java
          • /nutch/trunk/src/java/org/apache/nutch/indexer/solr/SolrWriter.java
          • /nutch/trunk/src/plugin/build.xml
          • /nutch/trunk/src/plugin/indexer-solr
          • /nutch/trunk/src/plugin/indexer-solr/build.xml
          • /nutch/trunk/src/plugin/indexer-solr/ivy.xml
          • /nutch/trunk/src/plugin/indexer-solr/plugin.xml
          • /nutch/trunk/src/plugin/indexer-solr/src
          • /nutch/trunk/src/plugin/indexer-solr/src/java
          • /nutch/trunk/src/plugin/indexer-solr/src/java/org
          • /nutch/trunk/src/plugin/indexer-solr/src/java/org/apache
          • /nutch/trunk/src/plugin/indexer-solr/src/java/org/apache/nutch
          • /nutch/trunk/src/plugin/indexer-solr/src/java/org/apache/nutch/indexwriter
          • /nutch/trunk/src/plugin/indexer-solr/src/java/org/apache/nutch/indexwriter/solr
          • /nutch/trunk/src/plugin/indexer-solr/src/java/org/apache/nutch/indexwriter/solr/SolrConstants.java
          • /nutch/trunk/src/plugin/indexer-solr/src/java/org/apache/nutch/indexwriter/solr/SolrIndexWriter.java
          • /nutch/trunk/src/plugin/indexer-solr/src/java/org/apache/nutch/indexwriter/solr/SolrMappingReader.java
          • /nutch/trunk/src/plugin/indexer-solr/src/java/org/apache/nutch/indexwriter/solr/SolrUtils.java
          • /nutch/trunk/src/plugin/nutch-extensionpoints/plugin.xml
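The commit above introduces an IndexWriter extension point and an IndexWriters dispatcher, with the Solr writer moved into an indexer-solr plugin. The fan-out pattern behind it can be sketched roughly as below; the interface, class names, and method signatures here are simplified illustrations for this issue, not the actual Nutch API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal shape of a pluggable index-writer endpoint.
interface IndexWriter {
    void write(Map<String, String> doc);
    void delete(String key);
    void commit();
}

// An in-memory backend standing in for Solr, ElasticSearch, CouchDB, etc.
class InMemoryIndexWriter implements IndexWriter {
    final Map<String, Map<String, String>> index = new HashMap<>();
    final Map<String, Map<String, String>> pending = new HashMap<>();

    public void write(Map<String, String> doc) {
        pending.put(doc.get("id"), doc);
    }
    public void delete(String key) {
        pending.remove(key);
        index.remove(key);
    }
    public void commit() {
        index.putAll(pending);
        pending.clear();
    }
}

// Dispatches every call to all activated backends, the way a dispatcher
// class could fan out indexing, cleaning and deduplicating to each
// indexer-* plugin without the jobs knowing which backends are in use.
class IndexWriters implements IndexWriter {
    final List<IndexWriter> writers = new ArrayList<>();
    IndexWriters(List<IndexWriter> backends) { writers.addAll(backends); }
    public void write(Map<String, String> doc) { for (IndexWriter w : writers) w.write(doc); }
    public void delete(String key)             { for (IndexWriter w : writers) w.delete(key); }
    public void commit()                        { for (IndexWriter w : writers) w.commit(); }
}
```

With this shape, a generic indexing job only ever talks to the dispatcher, and adding a new backend means implementing the interface in a new plugin rather than touching the job code.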
          lewismc Lewis John McGibbney added a comment -

          Nice work Julien.


            People

            • Assignee:
              jnioche Julien Nioche
            • Reporter:
              jnioche Julien Nioche
            • Votes:
              3
            • Watchers:
              10