NUTCH-159

Specify temp/working directory for crawl

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.8
    • Fix Version/s: 0.8
    • Component/s: fetcher, indexer
    • Labels:
      None
    • Environment:

      Linux/Debian

      Description

      I ran a crawl of 100k web pages and got:

      org.apache.nutch.fs.FSError: java.io.IOException: No space left on device
      at org.apache.nutch.fs.LocalFileSystem$LocalNFSFileOutputStream.write(LocalFileSystem.java:149)
      at org.apache.nutch.fs.FileUtil.copyContents(FileUtil.java:65)
      at org.apache.nutch.fs.LocalFileSystem.renameRaw(LocalFileSystem.java:178)
      at org.apache.nutch.fs.NutchFileSystem.rename(NutchFileSystem.java:224)
      at org.apache.nutch.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:80)
      Caused by: java.io.IOException: No space left on device
      at java.io.FileOutputStream.writeBytes(Native Method)
      at java.io.FileOutputStream.write(FileOutputStream.java:260)
      at org.apache.nutch.fs.LocalFileSystem$LocalNFSFileOutputStream.write(LocalFileSystem.java:147)
      ... 4 more
      Exception in thread "main" java.io.IOException: Job failed!
      at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:308)
      at org.apache.nutch.crawl.Fetcher.fetch(Fetcher.java:335)
      at org.apache.nutch.crawl.Crawl.main(Crawl.java:107)
      byron@db02:/data/nutch$ df -k

      It appears the crawl created a /tmp/nutch directory that filled up, even though I specified a db directory.

      Nutch needs either a command-line parameter or a globally configurable work area (in place of /tmp) for the instance, so that crawls won't fail.
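
      A minimal sketch of the requested behaviour, under stated assumptions: the property name `crawl.work.dir` and the class `WorkDirSketch` are hypothetical and are not part of Nutch. The idea is to resolve the work area from configuration, fall back to `<java.io.tmpdir>/nutch` when nothing is set, and report how much space is usable there before a crawl starts.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

class WorkDirSketch {
    // Hypothetical property name, used only for illustration; not a real Nutch setting.
    static final String WORK_DIR_KEY = "crawl.work.dir";

    // Return the configured work directory, or <java.io.tmpdir>/nutch if none is set.
    static Path resolveWorkDir(Properties props) {
        String configured = props.getProperty(WORK_DIR_KEY);
        if (configured != null && !configured.trim().isEmpty()) {
            return Paths.get(configured.trim());
        }
        return Paths.get(System.getProperty("java.io.tmpdir"), "nutch");
    }

    // Usable bytes on the file system that holds dir. If dir does not exist yet,
    // the nearest existing ancestor is used to locate the file system.
    static long usableBytes(Path dir) throws IOException {
        Path p = dir.toAbsolutePath();
        while (p != null && !Files.exists(p)) {
            p = p.getParent();
        }
        if (p == null) {
            throw new IOException("No existing ancestor for " + dir);
        }
        return Files.getFileStore(p).getUsableSpace();
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        if (args.length > 0) {
            // The first argument stands in for a command-line work-directory option.
            props.setProperty(WORK_DIR_KEY, args[0]);
        }
        Path workDir = resolveWorkDir(props);
        System.out.println(workDir + " has " + usableBytes(workDir) + " usable bytes");
    }
}
```

      Running it with a path on a larger volume (for example the /data/nutch directory from the prompt above) would show whether that location has more room than /tmp. A real fix would also have to pass the resolved directory to the job runner, which this sketch does not attempt.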

        Activity

        Andrzej Bialecki made changes -
        Field           Original Value   New Value
        Fix Version/s                    0.8 [ 12310224 ]
        Resolution                       Won't Fix [ 2 ]
        Status          Open [ 1 ]       Closed [ 6 ]
        byron miller created issue -

          People

          • Assignee:
            Unassigned
            Reporter:
            byron miller
          • Votes:
            1
            Watchers:
            0
