Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Cannot Reproduce
Environment: Amazon EC2 Extra Large instances (4 cores, 15 GB RAM), Sun Java 6 (1.6.0_10); 1 master and 4 slaves, all identically configured; 2 Java processes per instance, each started with the argument "-Xmx700m"
Description
The Hadoop job processes 4 directories in HDFS, each containing 15 files. This is a test run on sample data before I move on to the full set of 5 directories with about 800 documents each. The mapper reads nearly 200 pages (not files) and then throws an OutOfMemoryError. The largest file is 17 MB.
If this problem is something on my end and not truly a bug, I apologize. However, after Googling a bit, I found many threads from people running out of memory with small data sets.
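One thing that may be worth checking is whether the "-Xmx700m" limit actually reaches the JVM that runs the mapper. In Hadoop releases of this era, the options for task JVMs are normally taken from the mapred.child.java.opts property, so a flag passed to the daemons does not necessarily apply to the map tasks. The following is a minimal sketch, not part of Hadoop: the class name HeapCheck and its methods are hypothetical, and it only uses the standard java.lang.Runtime API to print the heap limit the running JVM really received.

```java
// Hypothetical helper (not part of Hadoop): reports the heap limit
// that the current JVM actually received.
public class HeapCheck {
    // Convert a byte count to whole megabytes (1 MB = 1024 * 1024 bytes).
    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    // Maximum heap this JVM will attempt to use, in MB.
    static long maxHeapMb() {
        return toMegabytes(Runtime.getRuntime().maxMemory());
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Heap currently in use = heap allocated so far minus the free part of it.
        long usedMb = toMegabytes(rt.totalMemory() - rt.freeMemory());
        System.out.println("max heap (MB): " + maxHeapMb());
        System.out.println("used heap (MB): " + usedMb);
    }
}
```

Running it as "java -Xmx700m HeapCheck" should report a maximum of roughly 700 MB (the JVM may report slightly less than the requested value). Logging the same two numbers from inside the mapper would show which limit the task is really running under and how close it gets to that limit before it fails.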