Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: nutchgora
    • Fix Version/s: nutchgora
    • Component/s: None
    • Labels:
      None
    • Patch Info:
      Patch Available

      Description

      Having a separate GORA table for storing information about hosts (and domains?) would be very useful for:

      • customising fetch behaviour on a per-host basis, e.g. number of threads, min time between threads, etc.
      • storing stats
      • keeping metadata and possibly propagating it to the webpages
      • keeping a copy of the robots.txt and possibly using it later to filter the webtable
      • storing sitemap files and updating the webtable accordingly

      I'll try to come up with a GORA schema for such a host table, but any comments are of course already welcome.
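
To make the proposal concrete, here is a minimal sketch of what one row of such a host table might hold. It is not the schema from any of the attached patches and it is not GORA code: the `HostRecord` class, its field names, and the two helper functions are all hypothetical, written as plain Python only to illustrate the use cases listed above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class HostRecord:
    """Hypothetical per-host row; names are illustrative, not Nutch's schema."""
    host: str
    # Per-host fetch overrides; None means "fall back to the global setting".
    fetch_threads: Optional[int] = None
    min_delay_secs: Optional[float] = None
    # Free-form counters, e.g. {"fetched": 120, "failed": 3}.
    stats: Dict[str, int] = field(default_factory=dict)
    # Host-level metadata that may be propagated to pages of this host.
    metadata: Dict[str, str] = field(default_factory=dict)
    # Cached copy of the host's robots.txt, if one has been fetched.
    robots_txt: Optional[str] = None
    # URLs of sitemap files known for this host.
    sitemaps: List[str] = field(default_factory=list)


def effective_threads(record: Optional[HostRecord], default_threads: int) -> int:
    """Return the per-host thread count, or the global default if unset."""
    if record is None or record.fetch_threads is None:
        return default_threads
    return record.fetch_threads


def propagate_metadata(record: HostRecord, page_metadata: Dict[str, str]) -> Dict[str, str]:
    """Merge host metadata into page metadata; page-level values win on conflict."""
    merged = dict(record.metadata)
    merged.update(page_metadata)
    return merged
```

Letting page-level values override host-level ones is just one possible policy, assumed here for illustration; the issue leaves the propagation rules open.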

      1. NUTCH-882-v3.txt
        45 kB
        Ferdy Galema
      2. NUTCH-882-v3.txt
        45 kB
        Ferdy Galema
      3. hostdb.patch
        25 kB
        Doğacan Güney
      4. NUTCH-882-v1.patch
        20 kB
        Julien Nioche

            People

            • Assignee:
              Unassigned
              Reporter:
              Julien Nioche
            • Votes:
              0
              Watchers:
              3
