  1. Nutch
  2. NUTCH-882

Design a Host table in GORA


Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: nutchgora
    • Fix Version/s: nutchgora
    • Component/s: None
    • Labels: None
    • Patch Info: Patch Available

    Description

      Having a separate GORA table for storing information about hosts (and domains?) would be very useful for:

      • customising fetching behaviour on a per-host basis, e.g. number of threads, minimum time between threads, etc.
      • storing stats
      • keeping metadata and possibly propagating it to the webpages
      • keeping a copy of the robots.txt and possibly using it later to filter the webtable
      • storing sitemap files and updating the webtable accordingly

      I'll try to come up with a GORA schema for such a host table, but any comments are of course already welcome.
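
      To make the list above concrete, here is a minimal sketch of what one row of such a host table might hold, written as a plain Java class. It is not the schema from the attached patches: the class name HostRecord, every field name, and the default values are assumptions chosen for illustration, and a real GORA table would be generated from an Avro schema rather than hand-written.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical in-memory shape of one host-table row.
 * All names and defaults are illustrative assumptions, not the Nutch schema.
 */
final class HostRecord {
    private final String host;                       // row key, e.g. "example.org"
    private int fetchThreads = 1;                    // per-host fetch setting (placeholder default)
    private long minDelayMs = 1000L;                 // per-host fetch setting (placeholder default)
    private final Map<String, Long> stats = new HashMap<>();       // named counters
    private final Map<String, String> metadata = new HashMap<>();  // free-form host metadata
    private String robotsTxt;                        // cached robots.txt body, null if never fetched
    private final List<String> sitemapUrls = new ArrayList<>();    // sitemap locations for this host

    HostRecord(String host) {
        if (host == null || host.isEmpty()) {
            throw new IllegalArgumentException("host must be non-empty");
        }
        this.host = host;
    }

    String getHost() { return host; }

    int getFetchThreads() { return fetchThreads; }

    void setFetchThreads(int threads) {
        if (threads < 1) {
            throw new IllegalArgumentException("threads must be >= 1");
        }
        this.fetchThreads = threads;
    }

    long getMinDelayMs() { return minDelayMs; }

    void setMinDelayMs(long delayMs) {
        if (delayMs < 0) {
            throw new IllegalArgumentException("delay must be >= 0");
        }
        this.minDelayMs = delayMs;
    }

    /** Adds delta to the named counter, creating the counter if it does not exist yet. */
    void incrementStat(String name, long delta) {
        stats.merge(name, delta, Long::sum);
    }

    /** Returns the named counter, or 0 if it was never incremented. */
    long getStat(String name) {
        return stats.getOrDefault(name, 0L);
    }

    void putMetadata(String key, String value) { metadata.put(key, value); }

    /** Read-only view, e.g. for copying host metadata onto that host's webpages. */
    Map<String, String> getMetadata() { return Collections.unmodifiableMap(metadata); }

    String getRobotsTxt() { return robotsTxt; }

    void setRobotsTxt(String body) { this.robotsTxt = body; }

    void addSitemapUrl(String url) { sitemapUrls.add(url); }

    List<String> getSitemapUrls() { return Collections.unmodifiableList(sitemapUrls); }
}
```

      Keeping stats and metadata as open-ended maps, rather than fixed columns, is one way to let plugins add their own entries without a schema change; whether that trade-off suits the real table is a design question for the schema itself.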

      Attachments

        1. NUTCH-882-v1.patch
          20 kB
          Julien Nioche
        2. hostdb.patch
          25 kB
          Dogacan Guney
        3. NUTCH-882-v3.txt
          45 kB
          Ferdy
        4. NUTCH-882-v3.txt
          45 kB
          Ferdy


            People

              Assignee: Unassigned
              Reporter: Julien Nioche
              Votes: 0
              Watchers: 3
