This issue tracks the changes we (Mathijs and I) have planned for the API and webapp in Nutchgora. We have a pretty good idea of how we want to use the crawl API. It may involve some major refactoring, or perhaps a side implementation next to the current NutchApp functionality, depending on how much of the existing components we can reuse. The bottom line is that there will be a strictly defined Java API that provides everything from crawling/indexing to job control (listing jobs, tracking progress and aborting jobs being part of it). There will be no server or service for tracking crawl state; all state will be persisted one way or another and will be queryable from the API. The REST server shall be a very thin layer on top of the Java implementation. A rich web interface will be a very easy layer to add too, once we have a cleanly (but extensively) defined API. But we will start by making the API usable from a simple command-line interface.
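To make the job-control part a bit more concrete, here is a rough sketch of what such an API could look like. None of these names exist in Nutch: `JobControl`, `JobInfo`, `JobState` and `InMemoryJobControl` are hypothetical, and the in-memory map only stands in for whatever persistent store ends up holding the job state. It is meant to illustrate the idea that callers query persisted state through the API instead of talking to a tracking service, not to propose the final design.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CrawlApiSketch {

    /** Hypothetical job states; the issue does not define them. */
    enum JobState { RUNNING, FINISHED, ABORTED }

    /** Hypothetical immutable snapshot of a job as it would be persisted. */
    static final class JobInfo {
        final String id;
        final String type;      // e.g. "fetch" or "index" (assumed labels)
        final JobState state;
        final double progress;  // 0.0 to 1.0

        JobInfo(String id, String type, JobState state, double progress) {
            this.id = id;
            this.type = type;
            this.state = state;
            this.progress = progress;
        }
    }

    /** Hypothetical job-control API: list jobs, track progress, abort. */
    interface JobControl {
        String submit(String type);
        List<JobInfo> listJobs();
        JobInfo getJob(String id);
        boolean abort(String id);
    }

    /** Assumption: a map stands in for the persistent store. */
    static final class InMemoryJobControl implements JobControl {
        private final Map<String, JobInfo> store = new LinkedHashMap<>();
        private int nextId = 1;

        public String submit(String type) {
            String id = "job-" + nextId++;
            store.put(id, new JobInfo(id, type, JobState.RUNNING, 0.0));
            return id;
        }

        public List<JobInfo> listJobs() {
            // Return a copy so callers cannot modify the stored state.
            return new ArrayList<>(store.values());
        }

        public JobInfo getJob(String id) {
            return store.get(id);
        }

        public boolean abort(String id) {
            JobInfo job = store.get(id);
            if (job == null || job.state != JobState.RUNNING) {
                return false; // unknown job, or nothing left to abort
            }
            // Persist a new snapshot rather than mutating the old one.
            store.put(id, new JobInfo(id, job.type, JobState.ABORTED, job.progress));
            return true;
        }
    }

    public static void main(String[] args) {
        JobControl api = new InMemoryJobControl();
        String id = api.submit("fetch");
        System.out.println(api.listJobs().size() + " job(s), state=" + api.getJob(id).state);
        api.abort(id);
        System.out.println("after abort: " + api.getJob(id).state);
    }
}
```

A REST layer or a command-line tool would then only need to translate requests into calls on an interface like `JobControl`, which is what would keep those layers thin.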
More details will be provided later on. Feel free to comment if you have suggestions/questions.