  1. Nutch
  2. NUTCH-1718

redefine http.robots.agent as "additional agent names"

Details

    • Type: Bug
    • Status: Closed
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: 1.7, 2.2, 2.2.1
    • Fix Version/s: 2.3, 1.9
    • Component/s: fetcher
    • Labels: None

    Description

      The description of the property http.robots.agents in nutch-default.xml recommends adding '*' to the list of agent names. This causes the same problem as described in NUTCH-1715. The description should be updated, including the statement about "order of precedence": since NUTCH-1031, precedence is dictated only by the ordering of user agents in robots.txt. The current entry reads:

      <property>
        <name>http.robots.agents</name>
        <value>*</value>
        <description>The agent strings we'll look for in robots.txt files,
        comma-separated, in decreasing order of precedence. You should
        put the value of http.agent.name as the first agent name, and keep the
        default * at the end of the list. E.g.: BlurflDev,Blurfl,*
        </description>
      </property>
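
A revised entry along the lines this issue asks for might look like the sketch below. The description wording and the empty default value are assumptions made for illustration; they are not taken from the attached patches.

```xml
<!-- Illustrative sketch only: wording and default value are assumptions. -->
<property>
  <name>http.robots.agents</name>
  <value></value>
  <description>Additional agent names, besides the value of
  http.agent.name, that are looked up in robots.txt files,
  comma-separated. Do not add the wildcard '*' to this list.
  The order of the names here does not define precedence: that is
  determined only by the ordering of user agents in robots.txt.
  E.g.: BlurflDev,Blurfl
  </description>
</property>
```

The point of the sketch is that the list no longer ends in '*' and no longer claims an order of precedence, which are the two issues raised above.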
      

      Attachments

        1. NUTCH-1718-trunk.v2.patch
          8 kB
          Sebastian Nagel
        2. NUTCH-1718-trunk.v1.patch
          6 kB
          Tejas Patil
        3. NUTCH-1718-2x.v2.patch
          7 kB
          Sebastian Nagel

          People

            Assignee: Unassigned
            Reporter: Sebastian Nagel
            Votes: 0
            Watchers: 6