  1. Nutch
  2. NUTCH-2440

DbResource does not accept crawlid

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Auto Closed
    • Affects Version/s: 2.3, 2.4
    • Fix Version/s: 2.5
    • Component/s: REST_api
    • Labels: None

    Description

      DbResource initializes its DbReaders with a null crawlid. This prevents queries from reaching the correct table/collection when a crawlid was set during fetch.

      For example, in MongoDB all data is stored in the "webpage" collection by default. Let's say you set the crawlid to "tech" for the fetch; all data then gets stored in the "tech_webpage" collection. But during a REST call to the /db endpoint, since you cannot specify a crawlid, it will query the "webpage" collection.
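
      The naming behavior described above can be illustrated with a minimal sketch. The class and method names here (CrawlCollectionName, resolve) are hypothetical and are not Nutch or Gora APIs; the sketch only assumes the "<crawlid>_<base>" naming pattern described in this report.

```java
public final class CrawlCollectionName {
    private CrawlCollectionName() {}

    /**
     * Hypothetical helper (not part of Nutch): returns the collection name
     * that a store would use for the given crawl ID, assuming the
     * "<crawlid>_<base>" pattern described in the report.
     */
    public static String resolve(String crawlId, String baseName) {
        // A null or empty crawl ID falls back to the default collection.
        if (crawlId == null || crawlId.isEmpty()) {
            return baseName;
        }
        return crawlId + "_" + baseName;
    }

    public static void main(String[] args) {
        // Fetch ran with crawlid "tech": data lands in the prefixed collection.
        System.out.println(resolve("tech", "webpage"));
        // A reader created with a null crawlid looks at the default collection.
        System.out.println(resolve(null, "webpage"));
    }
}
```

      With a null crawlid the reader ends up at "webpage" while the fetched data sits in "tech_webpage", which is the mismatch reported here.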

      I am thinking that either DBFilter can be changed to read in a crawlid, or the resource path can include the crawlid. I am open to suggestions and can then make a PR.
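
      As a rough sketch of how the two options could coexist, the code below picks a crawlid from either a path segment or the request filter. Everything in it is an assumption made for illustration: FilterSketch is a hypothetical stand-in for the filter object, effectiveCrawlId is not an existing Nutch method, and letting the path value take precedence over the filter value is an arbitrary choice that the actual fix may not share.

```java
import java.util.Optional;

public final class CrawlIdSelection {
    private CrawlIdSelection() {}

    /** Hypothetical stand-in for the request filter; only the crawl ID is modeled. */
    public static final class FilterSketch {
        private final String crawlId;

        public FilterSketch(String crawlId) {
            this.crawlId = crawlId;
        }

        public String getCrawlId() {
            return crawlId;
        }
    }

    /**
     * Chooses the crawl ID to hand to the reader.
     * Assumption: a crawl ID in the resource path wins over one in the filter.
     * An empty result means "use the default table/collection".
     */
    public static Optional<String> effectiveCrawlId(String pathCrawlId, FilterSketch filter) {
        if (pathCrawlId != null && !pathCrawlId.isEmpty()) {
            return Optional.of(pathCrawlId);
        }
        if (filter != null && filter.getCrawlId() != null && !filter.getCrawlId().isEmpty()) {
            return Optional.of(filter.getCrawlId());
        }
        return Optional.empty();
    }
}
```

      Returning an empty Optional when neither source supplies a crawlid would keep the current default behavior for existing clients.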

          People

            Assignee: Unassigned
            Reporter: Tulay Muezzinoglu (tmzzngl)
            Votes: 0
            Watchers: 2
