Nutch / NUTCH-148

org.apache.nutch.tools.CrawlTool throws error while doing deleteduplicates


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 0.8
    • Fix Version/s: None
    • Component/s: indexer
    • Labels: None
    • Environment: Windows XP Home

    Description

      I get the following error while running org.apache.nutch.tools.CrawlTool.

      The error actually occurs in the deleteduplicates step:

      051223 001121 Reading url hashes...
      051223 001121 Sorting url hashes...
      051223 001121 Deleting url duplicates...
      051223 001121 Error moving bad file G:\apache-tomcat-5.5.12\webapps\crux\WEB-INF\classes\ddup-workingdir\ddup-20051223001121: java.io.IOException: CreateProcess: df -k G:\apache-tomcat-5.5.12\webapps\crux\WEB-INF\classes\ddup-workingdir\ddup-20051223001121 error=2

      The error is thrown in NFSDataInputStream.java. The exception is:

      org.apache.nutch.fs.ChecksumException: Checksum error: G:\apache-tomcat-5.5.12\webapps\crux\WEB-INF\classes\ddup-workingdir\ddup-20051223001121 at 0


          People

            Assignee: Unassigned
            Reporter: raghavendra prabhu (rrprabhu)
