  1. Nutch
  2. NUTCH-2931 Improvements to 1.x REST API
  3. NUTCH-2926

Implement persistent storage for Nutch Webserver resources


Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: nutch server, storage
    • Labels: None

    Description

      The Nutch webserver caches its resources (seed lists, configuration, jobs, etc.) in memory, so they are lost whenever the server stops. This is neither reliable nor resilient enough for users who want to run the Nutch server as a persistent enterprise service.
      I therefore propose adding a persistence mechanism to address this problem.
      I intend to use jOOQ as a thin layer on top of JDBC, which gives us the flexibility to deploy against a wide variety of RDBMS backends.
      H2 is a popular, very lightweight (~2.5 MB), appropriately licensed database that we could use as the initial backend. I intend to run it in embedded mode with persistence enabled so that the data is written to disk. If the Nutch server is stopped, it can then be restarted and its resources restored from disk.
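
      To make the proposal concrete, here is a minimal sketch of persisting seed URLs in a file-backed, embedded H2 database. It uses plain JDBC, the layer jOOQ would sit on top of, so that it compiles against the JDK alone. The class name, table name, column names and data directory are all hypothetical and not part of any existing Nutch code. Running it requires the H2 driver on the classpath.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names exist in Nutch today.
class SeedListStoreSketch {

  // Builds an H2 URL for an embedded, file-backed (persistent) database.
  static String embeddedUrl(Path dataDir, String dbName) {
    return "jdbc:h2:file:" + dataDir.toAbsolutePath().resolve(dbName);
  }

  private final String url;

  SeedListStoreSketch(String url) {
    this.url = url;
  }

  // Creates the (assumed) table on first start; a no-op on later restarts.
  void createSchema() throws SQLException {
    try (Connection c = DriverManager.getConnection(url);
         Statement s = c.createStatement()) {
      s.execute("CREATE TABLE IF NOT EXISTS seed_list_entry ("
          + "list_name VARCHAR(255) NOT NULL, "
          + "seed_url VARCHAR(2048) NOT NULL)");
    }
  }

  // Appends one seed URL to the named seed list.
  void addSeed(String listName, String seedUrl) throws SQLException {
    String sql = "INSERT INTO seed_list_entry (list_name, seed_url) VALUES (?, ?)";
    try (Connection c = DriverManager.getConnection(url);
         PreparedStatement ps = c.prepareStatement(sql)) {
      ps.setString(1, listName);
      ps.setString(2, seedUrl);
      ps.executeUpdate();
    }
  }

  // Reads back every seed URL stored for the named seed list.
  List<String> loadSeeds(String listName) throws SQLException {
    String sql = "SELECT seed_url FROM seed_list_entry WHERE list_name = ?";
    List<String> seeds = new ArrayList<>();
    try (Connection c = DriverManager.getConnection(url);
         PreparedStatement ps = c.prepareStatement(sql)) {
      ps.setString(1, listName);
      try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
          seeds.add(rs.getString(1));
        }
      }
    }
    return seeds;
  }

  public static void main(String[] args) throws SQLException {
    // "data" and "nutchserver" are placeholder names for this sketch.
    SeedListStoreSketch store =
        new SeedListStoreSketch(embeddedUrl(Paths.get("data"), "nutchserver"));
    store.createSchema();
    store.addSeed("demo", "https://nutch.apache.org/");
    // Rows written by earlier runs are still present after a restart.
    System.out.println(store.loadSeeds("demo"));
  }
}
```

      Because the database lives in a file under the data directory, rows written before a shutdown remain readable after the next start. With jOOQ, the hand-written SQL strings could be replaced by its DSL, for example via a context obtained from DSL.using(connection, SQLDialect.H2), without changing the JDBC URL.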

      Some resources
      https://www.jooq.org/
      https://h2database.com/html/download.html
      http://www.h2database.com/html/tutorial.html#using_jooq

          People

            Assignee: Lewis John McGibbney (lewismc)
            Reporter: Lewis John McGibbney (lewismc)