Description
I ran a crawl of 100k web pages and got:
org.apache.nutch.fs.FSError: java.io.IOException: No space left on device
        at org.apache.nutch.fs.LocalFileSystem$LocalNFSFileOutputStream.write(LocalFileSystem.java:149)
        at org.apache.nutch.fs.FileUtil.copyContents(FileUtil.java:65)
        at org.apache.nutch.fs.LocalFileSystem.renameRaw(LocalFileSystem.java:178)
        at org.apache.nutch.fs.NutchFileSystem.rename(NutchFileSystem.java:224)
        at org.apache.nutch.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:80)
Caused by: java.io.IOException: No space left on device
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:260)
        at org.apache.nutch.fs.LocalFileSystem$LocalNFSFileOutputStream.write(LocalFileSystem.java:147)
        ... 4 more
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:308)
        at org.apache.nutch.crawl.Fetcher.fetch(Fetcher.java:335)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:107)
byron@db02:/data/nutch$ df -k
It appears the crawl created a /tmp/nutch work directory that filled up, even though I specified a db directory on the command line. We need either a command-line parameter for the work area or a globally configurable /tmp (work area) for the Nutch instance, so that crawls don't fail when /tmp runs out of space.
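As a possible interim workaround (a sketch, not verified against this Nutch version): the local job runner appears to write its intermediate data under a local work directory that defaults to somewhere under /tmp. If that location is controlled by a property such as mapred.local.dir (check conf/nutch-default.xml for the actual property name in your build), it could be overridden in conf/nutch-site.xml to point at the larger /data partition:

```xml
<!-- conf/nutch-site.xml: redirect the local job runner's work area
     off /tmp and onto the partition with free space.
     NOTE: property name "mapred.local.dir" and the default location
     are assumptions; verify against conf/nutch-default.xml. -->
<property>
  <name>mapred.local.dir</name>
  <value>/data/nutch/tmp</value>
</property>
```

Even if this works, a proper fix (command-line flag or documented config option) would still be preferable so that crawls don't silently depend on /tmp capacity.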