Nutch / NUTCH-117

Crawl crashes with java.io.IOException: already exists: C:\nutch\crawl.intranet\oct18\db\webdb.new\pagesByURL

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.6, 0.7, 0.7.1
    • Fix Version/s: 0.7.2
    • Component/s: None
    • Labels: None
    • Environment: Windows 2000, P4 1.70GHz, 512MB RAM,
      Java 1.5.0_05

    Description

      I started a crawl from the command line using Nutch 0.7.1:

      nutch-daemon.sh start crawl urls.txt -dir oct18 -threads 4 -depth 20

      After crawling for over 15 hours, the crawl crashed with the following exception:

      051019 050543 status: segment 20051019050438, 30 pages, 0 errors, 1589818 bytes, 48020 ms
      051019 050543 status: 0.6247397 pages/s, 258.65167 kb/s, 52993.934 bytes/page
      051019 050544 Updating C:\nutch\crawl.intranet\oct18\db
      051019 050544 Updating for C:\nutch\crawl.intranet\oct18\segments\20051019050438
      051019 050544 Processing document 0
      051019 050544 Finishing update
      051019 050544 Processing pagesByURL: Sorted 47 instructions in 0.02 seconds.
      051019 050544 Processing pagesByURL: Sorted 2350.0 instructions/second
      Exception in thread "main" java.io.IOException: already exists: C:\nutch\crawl.intranet\oct18\db\webdb.new\pagesByURL
      at org.apache.nutch.io.MapFile$Writer.<init>(MapFile.java:86)
      at org.apache.nutch.db.WebDBWriter$CloseProcessor.closeDown(WebDBWriter.java:549)
      at org.apache.nutch.db.WebDBWriter.close(WebDBWriter.java:1544)
      at org.apache.nutch.tools.UpdateDatabaseTool.close(UpdateDatabaseTool.java:321)
      at org.apache.nutch.tools.UpdateDatabaseTool.main(UpdateDatabaseTool.java:371)
      at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:141)

      This happened on the 14th segment of the requested depth of 20. A quick Google search for the exception turns up a few earlier posts with the same error but no definitive answer; it seems to have been occurring since Nutch 0.6.
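
For readers unfamiliar with this kind of failure, here is a minimal Java sketch of the general pattern the message suggests: a writer that refuses to overwrite existing output and throws "already exists" when a directory from an earlier step is still on disk. It is an illustration only. The class `StaleDirSketch` and its methods `openWriter` and `deleteRecursively` are hypothetical and are not Nutch's `MapFile` or `WebDBWriter` code, and the report does not establish that a leftover directory was the actual cause here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StaleDirSketch {

    // Hypothetical stand-in for a writer that refuses to overwrite existing output.
    static void openWriter(Path dir) throws IOException {
        if (Files.exists(dir)) {
            throw new IOException("already exists: " + dir);
        }
        Files.createDirectories(dir);
    }

    // Deletes a directory tree, children before parents. Does nothing if root is absent.
    static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed layout for the demo only: <temp dir>/webdb.new/pagesByURL
        Path db = Files.createTempDirectory("db");
        Path staging = db.resolve("webdb.new");
        Path target = staging.resolve("pagesByURL");

        openWriter(target); // first attempt creates the directory

        try {
            openWriter(target); // second attempt finds the leftover directory
        } catch (IOException e) {
            System.out.println("failed: " + e.getMessage());
        }

        deleteRecursively(staging); // clear the leftover staging directory
        openWriter(target);         // now succeeds again
        System.out.println("succeeded after cleanup");

        deleteRecursively(db); // remove the demo directory
    }
}
```

The sketch only shows why a create-if-absent check fails once a previous attempt has left its output behind; it says nothing about how the directory came to exist in the reported crawl.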

          People

            Assignee: pkosiorowski Piotr Kosiorowski
            Reporter: scross Stephen Cross