MAHOUT-1456

The wikipediaXMLSplitter example fails with "heap size" error

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.9
    • Fix Version/s: 0.10.0
    • Component/s: classic
    • Environment: Solaris 11.1, Hadoop 2.3.0, Maven 3.2.1, JDK 1.7.0_07-b10

    Description

      1- The source XML file is http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
      2- When I run "mahout wikipediaXMLSplitter -d enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64", it gets stuck at chunk #571 and, after about 30 minutes, fails with a Java heap space error. The earlier chunks are created rapidly (about 10 chunks per second).
      3- Increasing the heap size via the "-Xmx4096m" option doesn't help (see the sketch after this list).
      4- No matter what the configuration is, it seems that there is a memory leak that eats all available heap space.
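
      For reference, a minimal sketch of how a larger heap can be passed to the splitter. This is not part of the original report; it assumes the stock bin/mahout launcher, which reads the MAHOUT_HEAPSIZE (megabytes) and MAHOUT_OPTS environment variables when building the JVM command line:

        # Hypothetical retry: raise the driver JVM heap before rerunning the splitter.
        export MAHOUT_HEAPSIZE=4096          # expanded by bin/mahout to -Xmx4096m
        # or, equivalently:
        export MAHOUT_OPTS="-Xmx4096m"
        # Note: if the command is dispatched through the hadoop script rather than
        # run locally, HADOOP_CLIENT_OPTS may govern the client JVM heap instead.

        mahout wikipediaXMLSplitter \
          -d enwiki-latest-pages-articles.xml \
          -o wikipedia/chunks \
          -c 64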

People

    Assignee: Unassigned
    Reporter: mahmood
    Votes: 0
    Watchers: 4
