Nutch / NUTCH-1047

Pluggable indexing backends

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.7
    • Component/s: indexer
    • Patch Info: Patch Available

    Description

      One possible feature would be to add a new endpoint for indexing backends and make indexing pluggable. At the moment we are hardwired to Solr, which is OK, but as other resources like ElasticSearch are becoming more popular it would be better to handle this as plugins. I am not sure about the name of the endpoint, though: we already have indexing plugins (which generate the fields sent to the backends), and moreover the backends are not necessarily for indexing or searching but could be just external storage, e.g. CouchDB. The term "backend" on its own would be confusing in 2.0, as it could refer to the storage in GORA. 'indexing-backend' is the best name that has come to my mind so far; please suggest better ones.

      We should come up with generic map/reduce jobs for indexing, deduplicating and cleaning, and maybe add a Nutch extension point there so that we can easily hook up indexing, cleaning and deduplicating for various backends.
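
      As a rough illustration of the idea, a pluggable backend could be modelled as a small interface that a backend-agnostic job drives. This is only a sketch under stated assumptions: IndexingBackend, InMemoryBackend, PageRecord and GenericIndexer are hypothetical names invented for this example, not the API introduced by the attached patches, and the map/reduce machinery is reduced to a plain loop. Deduplication is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical extension point; every name here is illustrative, not Nutch's API. */
interface IndexingBackend {
    void open(Map<String, String> config);
    void write(String id, Map<String, Object> fields);
    void delete(String id);
    void commit();
    void close();
}

/** Toy backend that keeps documents in memory, standing in for a real external store. */
final class InMemoryBackend implements IndexingBackend {
    private final Map<String, Map<String, Object>> committed = new LinkedHashMap<>();
    private final List<Runnable> pending = new ArrayList<>();

    @Override public void open(Map<String, String> config) { /* nothing to connect to */ }

    @Override public void write(String id, Map<String, Object> fields) {
        Map<String, Object> copy = new LinkedHashMap<>(fields);
        pending.add(() -> committed.put(id, copy)); // buffered until commit()
    }

    @Override public void delete(String id) {
        pending.add(() -> committed.remove(id)); // buffered until commit()
    }

    @Override public void commit() {
        for (Runnable op : pending) op.run(); // apply buffered operations in order
        pending.clear();
    }

    @Override public void close() { pending.clear(); } // drop anything uncommitted

    Map<String, Map<String, Object>> documents() { return committed; }
}

/** One crawled page; 'gone' marks pages that should be cleaned from the backend. */
final class PageRecord {
    final String url;
    final Map<String, Object> fields;
    final boolean gone;

    PageRecord(String url, Map<String, Object> fields, boolean gone) {
        this.url = url;
        this.fields = fields;
        this.gone = gone;
    }
}

/** Backend-agnostic driver, standing in for the generic indexing and cleaning jobs. */
final class GenericIndexer {
    static void run(List<PageRecord> records, List<IndexingBackend> backends,
                    Map<String, String> config) {
        for (IndexingBackend b : backends) b.open(config);
        try {
            for (PageRecord r : records) {
                for (IndexingBackend b : backends) {
                    if (r.gone) b.delete(r.url); else b.write(r.url, r.fields);
                }
            }
            for (IndexingBackend b : backends) b.commit();
        } finally {
            for (IndexingBackend b : backends) b.close();
        }
    }
}

class IndexingBackendDemo {
    public static void main(String[] args) {
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("title", "Example");
        List<PageRecord> records = new ArrayList<>();
        records.add(new PageRecord("http://example.org/a", fields, false));
        records.add(new PageRecord("http://example.org/b", fields, false));
        records.add(new PageRecord("http://example.org/b", fields, true)); // cleaned later
        InMemoryBackend backend = new InMemoryBackend();
        List<IndexingBackend> backends = new ArrayList<>();
        backends.add(backend);
        GenericIndexer.run(records, backends, new LinkedHashMap<String, String>());
        System.out.println(backend.documents().keySet());
    }
}
```

      With this shape, the driver never refers to a concrete store, so supporting another backend would only mean supplying another implementation of the interface.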

      Attachments

        1. NUTCH-1047-1.x-v1.patch (80 kB, Julien Nioche)
        2. NUTCH-1047-1.x-v2.patch (68 kB, Julien Nioche)
        3. NUTCH-1047-1.x-v3.patch (121 kB, Julien Nioche)
        4. NUTCH-1047-1.x-v4.patch (85 kB, Julien Nioche)
        5. NUTCH-1047-1.x-v5.patch (86 kB, Julien Nioche)
        6. NUTCH-1047-1.x-final.patch (90 kB, Julien Nioche)

            People

              Assignee: Julien Nioche (jnioche)
              Reporter: Julien Nioche (jnioche)
              Votes: 3
              Watchers: 10