Nutch / NUTCH-1718

redefine http.robots.agent as "additional agent names"


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: 1.7, 2.2, 2.2.1
    • Fix Version/s: 2.3, 1.9
    • Component/s: fetcher
    • Labels:
      None

      Description

      The description of the property http.robots.agents in nutch-default.xml recommends adding a '*' to the list of agent names. Doing so causes the same problem as described in NUTCH-1715, so the description should be updated. The statement about "order of precedence" also needs correcting: since NUTCH-1031, precedence is dictated only by the ordering of user agents in robots.txt, not by the order of names in this property. The current entry reads:

      <property>
        <name>http.robots.agents</name>
        <value>*</value>
        <description>The agent strings we'll look for in robots.txt files,
        comma-separated, in decreasing order of precedence. You should
        put the value of http.agent.name as the first agent name, and keep the
        default * at the end of the list. E.g.: BlurflDev,Blurfl,*
        </description>
      </property>
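
      A revised entry could look roughly like the sketch below. The description wording is an assumption derived from this issue's title and text, not the committed patch, and the agent names mybot and foo-spider are hypothetical placeholders.

      ```xml
      <property>
        <name>http.robots.agents</name>
        <!-- Assumption: left empty by default, so only http.agent.name is matched. -->
        <value></value>
        <description>Additional agent names, besides the value of
        http.agent.name, to look for in robots.txt files, comma-separated.
        E.g.: mybot,foo-spider (hypothetical names). Do not add the
        wildcard * to this list (see NUTCH-1715). The order of names here
        does not set precedence: since NUTCH-1031 precedence depends only on
        the ordering of user agents in robots.txt.
        </description>
      </property>
      ```

      Leaving the value empty in this sketch reflects the "additional agent names" reading of the property: the list only supplements http.agent.name rather than replacing it.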
      

        Attachments

        1. NUTCH-1718-trunk.v2.patch
          8 kB
          Sebastian Nagel
        2. NUTCH-1718-trunk.v1.patch
          6 kB
          Tejas Patil
        3. NUTCH-1718-2x.v2.patch
          7 kB
          Sebastian Nagel


            People

            • Assignee: Unassigned
            • Reporter: Sebastian Nagel (wastl-nagel)
            • Votes: 0
            • Watchers: 5
