NUTCH-1465: Support sitemaps in Nutch

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.14
    • Component/s: parser
    • Labels: None

    Description

      I recently came across this rather stagnant codebase[0], which is ASL v2.0 licensed and appears to have been used successfully to parse sitemaps, as per the discussion here[1].

      [0] http://sourceforge.net/projects/sitemap-parser/
      [1] http://lucene.472066.n3.nabble.com/Support-for-Sitemap-Protocol-and-Canonical-URLs-td630060.html

      Attachments

        1. NUTCH-1465-trunk.v1.patch
          27 kB
          Tejas Patil
        2. NUTCH-1465-sitemapinjector-trunk-v1.patch
          17 kB
          Sebastian Nagel
        3. NUTCH-1465-trunk.v2.patch
          16 kB
          Tejas Patil
        4. NUTCH-1465-trunk.v3.patch
          19 kB
          Tejas Patil
        5. NUTCH-1465-trunk.v4.patch
          19 kB
          Tejas Patil
        6. NUTCH-1465-trunk.v5.patch
          21 kB
          Tejas Patil
        7. NUTCH-1465.patch
          27 kB
          Markus Jelsma
        8. NUTCH-1465.patch
          27 kB
          Markus Jelsma
        9. NUTCH-1465.patch
          27 kB
          Markus Jelsma
        10. NUTCH-1465.patch
          27 kB
          Markus Jelsma

            People

              Assignee: Markus Jelsma (markus17)
              Reporter: Lewis John McGibbney (lewismc)
              Votes: 1
              Watchers: 12
