  1. Nutch
  2. NUTCH-3034

Overhaul the legacy Nutch plugin framework and replace it with PF4J

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • None
    • None
    • Labels: pf4j, plugin

    Description

      Motivation

      Plugins provide a large part of the functionality of Nutch. Although the legacy plugin framework continues to offer a lot of value, e.g.,

      1. some aspects, e.g. examples, are [fairly well documented|https://cwiki.apache.org/confluence/display/NUTCH/PluginCentral],
      2. it is generally stable, and
      3. offers reasonable test coverage (on a plugin-by-plugin basis)
      4. … probably loads more positives which I am overlooking...

      … there are also several aspects which could be improved:

      1. the [core framework is sparsely documented|https://cwiki.apache.org/confluence/display/NUTCH/WhichTechnicalConceptsAreBehindTheNutchPluginSystem]. This extends to very important aspects such as the plugin lifecycle, classloading, packaging, thread safety, and many other topics which are of intrinsic value to developers and maintainers.
      2. the core framework is somewhat [sparsely tested|https://github.com/apache/nutch/blob/master/src/test/org/apache/nutch/plugin/TestPluginSystem.java], with 7 tests at the time of writing. Traditionally, developers have focused on providing unit tests at the plugin level rather than for the legacy plugin framework itself.
      3. it sees very little maintenance/attention. It is my gut feeling (and I may be totally wrong here) that not many people know much about the core legacy plugin framework.
      4. writing plugins is clunky. This largely has to do with the legacy Ant + Ivy build and dependency management system but, that being said, it is clunky nonetheless.
      5. generally speaking, any reduction of code in the Nutch codebase through careful selection of, and dependence on, well-maintained, well-tested 3rd-party libraries would be a good thing.

      This issue therefore proposes to overhaul the legacy Nutch plugin framework and replace it with Plugin Framework for Java (PF4J).

      Task Breakdown

      The following is a proposed breakdown of this overall initiative into Epics. These Epics should likely be decomposed further, but that will be left to the implementer(s).

      1. document the legacy Nutch plugin lifecycle; taking inspiration from [PF4J’s plugin lifecycle documentation|https://pf4j.org/doc/plugin-lifecycle.html], provide both documentation and a diagram which clearly outline how the legacy plugin lifecycle works. It might also be a good idea to contribute a diagram to PF4J to accompany their documentation. Generally speaking, familiarize oneself with the legacy plugin framework and understand where the gaps are.
      2. study the PF4J framework and perform a feasibility study; this will provide an opportunity to identify gaps between what the legacy plugin framework does (and what Nutch needs) vs. what PF4J provides. Touch base with the PF4J community and describe the intention to replace the legacy Nutch plugin framework with PF4J. Obtain guidance on how to proceed. Document all of this in the Nutch wiki. Create a mapping of [legacy classes|https://github.com/apache/nutch/tree/master/src/java/org/apache/nutch/plugin] to [PF4J equivalents|https://github.com/pf4j/pf4j/tree/master/pf4j/src/main/java/org/pf4j].
      3. Restructure the legacy Nutch plugin package: https://github.com/apache/nutch/tree/master/src/java/org/apache/nutch/plugin
      4. Restructure each plugin in the plugins directory: https://github.com/apache/nutch/tree/master/src/plugin
      5. Update Nutch plugin documentation 
      6. Create/propose plugin utility tooling: #4 in the motivation section states that developing plugins is clunky. A utility which streamlines the creation of new plugins would be ideal. For example, this could take the form of a [new bash script|https://github.com/apache/nutch/tree/master/src/bin] which prompts the developer for input and then generates the plugin skeleton. This is a nice-to-have.
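
To make the plugin skeleton idea in item 6 more concrete, here is a minimal sketch of what such a generator might do. It is written in Java rather than bash purely for illustration, and it takes its input as arguments instead of prompting. Everything in it is an assumption and not existing Nutch tooling: the class name `PluginSkeletonGenerator`, the id and package validation rules, the `0.0.1` version, and the placeholder file contents are all hypothetical. The file set (`plugin.xml`, `build.xml`, `ivy.xml`, `src/java/...`) simply mirrors the current legacy layout; a PF4J-based layout would probably look different once the feasibility study is done.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of Nutch: writes an empty plugin skeleton to disk. */
final class PluginSkeletonGenerator {

    private PluginSkeletonGenerator() {}

    /** Assumed rule: lowercase, hyphen-separated ids, since the id doubles as a directory name. */
    static boolean isValidPluginId(String id) {
        return id != null && id.matches("[a-z][a-z0-9]*(-[a-z0-9]+)*");
    }

    /** Accepts dotted lowercase package names only, which also rules out path separators. */
    static boolean isValidPackage(String pkg) {
        return pkg != null && pkg.matches("[a-z_][a-z0-9_]*(\\.[a-z_][a-z0-9_]*)*");
    }

    /** Returns relative file paths mapped to placeholder contents, in creation order. */
    static Map<String, String> skeletonFiles(String pluginId, String javaPackage) {
        if (!isValidPluginId(pluginId)) {
            throw new IllegalArgumentException("Invalid plugin id: " + pluginId);
        }
        if (!isValidPackage(javaPackage)) {
            throw new IllegalArgumentException("Invalid Java package: " + javaPackage);
        }
        Map<String, String> files = new LinkedHashMap<>();
        // Placeholder descriptor only; the real content is left to the developer.
        files.put("plugin.xml",
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<plugin id=\"" + pluginId + "\" name=\"" + pluginId + "\" version=\"0.0.1\">\n"
            + "  <!-- TODO: declare runtime libraries, required plugins and extensions -->\n"
            + "</plugin>\n");
        files.put("build.xml", "<!-- TODO: Ant build for " + pluginId + " -->\n");
        files.put("ivy.xml", "<!-- TODO: Ivy dependencies for " + pluginId + " -->\n");
        files.put("src/java/" + javaPackage.replace('.', '/') + "/package-info.java",
            "/** TODO: describe the " + pluginId + " plugin. */\npackage " + javaPackage + ";\n");
        return files;
    }

    /** Creates pluginsDir/pluginId and writes the skeleton; refuses to overwrite an existing plugin. */
    static Path generate(Path pluginsDir, String pluginId, String javaPackage) throws IOException {
        Map<String, String> files = skeletonFiles(pluginId, javaPackage); // validates input first
        Path root = pluginsDir.resolve(pluginId);
        if (Files.exists(root)) {
            throw new IOException("Plugin directory already exists: " + root);
        }
        for (Map.Entry<String, String> file : files.entrySet()) {
            Path target = root.resolve(file.getKey());
            Files.createDirectories(target.getParent());
            Files.writeString(target, file.getValue());
        }
        return root;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: PluginSkeletonGenerator <plugins-dir> <plugin-id> <java-package>");
            System.exit(1);
        }
        System.out.println("Created " + generate(Path.of(args[0]), args[1], args[2]));
    }
}
```

With Java 11 or later this could be tried as `java PluginSkeletonGenerator.java src/plugin index-example org.example.indexer`, where the plugin id and package name are made-up examples. Keeping the file list in `skeletonFiles` separate from the disk writes in `generate` means the layout can be changed or unit tested without touching the filesystem.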

      Google Summer of Code Details

      This initiative is being proposed as a GSoC 2024 project. 

      Proposed Mentor: lewismc 

      Proposed Co-Mentor:

       

          People

            Assignee: Unassigned
            Reporter: Lewis John McGibbney (lewismc)
            Votes: 0
            Watchers: 1