[NUTCH-1286] Refactoring/reimplementing crawling API (NutchApp) - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Won't Fix
Affects Version/s: None
Fix Version/s: 2.3.1
Component/s: administration gui, REST_api, web gui
Labels:
- gsoc2014

Description

This issue tracks the changes we (Mathijs and I) have planned for the API and webapp in Nutchgora. We have a pretty good idea of how we want to use the crawl API. It may involve some major refactoring, or perhaps a side implementation next to the current NutchApp functionality, depending on how much of the existing components we can reuse. The bottom line is that there will be a strictly defined Java API that provides everything from crawling and indexing to job control (listing jobs, tracking progress and aborting jobs). There will be no server or service for tracking crawl state; all state will be persisted one way or another and be queryable through the API. The REST server will be a very thin layer on top of the Java implementation. A rich web interface will be an easy layer to add too, once we have a cleanly (but extensively) defined API. But we will start by making the API usable from a simple command-line interface.

More details will follow. Feel free to comment if you have suggestions or questions.
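
To make the intended shape a bit more concrete, here is a minimal sketch of how such an API could look. It is only an illustration under assumptions: every name in it (CrawlApiSketch, CrawlApi, JobStore, JobStatus, InMemoryJobStore) is hypothetical and none of it is taken from the Nutch code base. The in-memory store merely stands in for whatever persistent storage ends up being used. The point it tries to show is that job listing, progress and aborting all go through persisted state, with no tracking server in between.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: none of these types exist in Nutch.
public class CrawlApiSketch {

    enum JobState { RUNNING, FINISHED, ABORTED }

    /** Persisted record describing one crawl/index job. */
    static final class JobStatus {
        final String id;
        final String type;      // e.g. "inject", "fetch", "index" (illustrative values)
        JobState state = JobState.RUNNING;
        double progress = 0.0;  // fraction between 0.0 and 1.0

        JobStatus(String id, String type) {
            this.id = id;
            this.type = type;
        }
    }

    /** Assumed persistence abstraction; a real one would write to durable storage. */
    interface JobStore {
        void save(JobStatus status);
        JobStatus load(String id);
        List<JobStatus> loadAll();
    }

    /** In-memory stand-in so the sketch runs without external dependencies. */
    static final class InMemoryJobStore implements JobStore {
        private final Map<String, JobStatus> jobs = new LinkedHashMap<>();

        public void save(JobStatus status) { jobs.put(status.id, status); }
        public JobStatus load(String id) { return jobs.get(id); }
        public List<JobStatus> loadAll() { return new ArrayList<>(jobs.values()); }
    }

    /** The API keeps no job state of its own; it only reads and writes the store. */
    static final class CrawlApi {
        private final JobStore store;
        private int nextId = 1;

        CrawlApi(JobStore store) { this.store = store; }

        /** Registers a new job and returns its id. */
        String submit(String type) {
            JobStatus status = new JobStatus("job-" + nextId++, type);
            store.save(status);
            return status.id;
        }

        List<JobStatus> listJobs() { return store.loadAll(); }

        double progress(String id) {
            JobStatus status = store.load(id);
            if (status == null) {
                throw new IllegalArgumentException("Unknown job: " + id);
            }
            return status.progress;
        }

        /** Marks a running job as aborted; returns false if there is nothing to abort. */
        boolean abort(String id) {
            JobStatus status = store.load(id);
            if (status == null || status.state != JobState.RUNNING) {
                return false;
            }
            status.state = JobState.ABORTED;
            store.save(status);
            return true;
        }
    }

    public static void main(String[] args) {
        CrawlApi api = new CrawlApi(new InMemoryJobStore());
        String id = api.submit("fetch");
        for (JobStatus job : api.listJobs()) {
            System.out.println(job.id + " " + job.type + " " + job.state);
        }
        System.out.println("aborted: " + api.abort(id));
    }
}
```

A REST server or command-line tool would then only need to translate requests into calls on an object like the hypothetical CrawlApi above, which is what keeps those layers thin.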

Issue Links

relates to

NUTCH-841 Create a Wicket-based Web Application for Nutch

Closed

People

Assignee: Unassigned
Reporter: Ferdy
Votes: 0
Watchers: 2

Dates

Created: 20/Feb/12 14:55
Updated: 13/Mar/24 14:51
Resolved: 20/Sep/15 12:52