Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Won't Fix
Affects Version: 0.9
Environment: Solaris 11.1, Hadoop 2.3.0, Maven 3.2.1, JDK 1.7.0_07-b10
Description
1- The XML file is http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
2- When I run "mahout wikipediaXMLSplitter -d enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64", it gets stuck at chunk #571 and, after about 30 minutes, fails with a Java heap space error. The earlier chunks are created rapidly (about 10 chunks per second).
3- Increasing the heap size via the "-Xmx4096m" option doesn't help (see the sketch after this list).
4- Regardless of the configuration, there appears to be a memory leak that eventually consumes all available heap space.
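
A minimal sketch of how the heap could be raised for the launcher, assuming the bin/mahout script reads the MAHOUT_HEAPSIZE (in MB) or JAVA_HEAP_MAX environment variables to build its -Xmx setting; the exact variable names may differ in this Mahout version:

    # assumption: bin/mahout uses MAHOUT_HEAPSIZE (MB) to set the driver JVM's -Xmx
    export MAHOUT_HEAPSIZE=4096
    # alternatively, if the script honors it, set the raw JVM flag directly
    export JAVA_HEAP_MAX=-Xmx4096m
    mahout wikipediaXMLSplitter -d enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64

Even with the heap raised this way, the splitter still fails around the same chunk, which is why a leak (rather than an undersized heap) seems likely.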