Nutch / NUTCH-1746

OutOfMemoryError in Mappers

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.7
    • Fix Version/s: None
    • Component/s: generator, injector
    • Labels: None
    • Environment: Nutch running in local mode with 4M+ domains in domain-urlfilter.txt
    • Patch Info: Patch Available

    Description

      Initially I found that Generator threw an OutOfMemoryError no matter how much RAM I allocated to the JVM. I fixed the problem by moving the URLFilters, URLNormalizers and ScoringFilters instances to the top-level class as singletons and reusing them across all Generator mapper instances.

      Then I found the same problem in Injector and applied an analogous fix.

      It now seems that this issue may be common to all Nutch Mapper implementations.

      Would it be possible to integrate this kind of change into the upstream
      code base and potentially update all affected Mapper classes?
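
      For illustration only, the singleton approach described above might look roughly like the sketch below. It does not reproduce the attached patches and uses no Nutch or Hadoop classes: SharedPlugins, ExpensiveFilter, Mapper and getOrCreate are hypothetical stand-ins. The assumption is that each mapper instance would otherwise build its own copy of a large filter (for example one backed by a multi-million-line domain list), and that sharing a single instance per JVM avoids the duplication.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch: a JVM-wide cache that lets every mapper instance
// reuse one expensive object instead of building its own copy.
final class SharedPlugins {

    // Stand-in for a costly object such as a URL filter chain (assumption,
    // not a Nutch class). LOADS counts how many times it was constructed.
    static final class ExpensiveFilter {
        static final AtomicInteger LOADS = new AtomicInteger();

        ExpensiveFilter() {
            LOADS.incrementAndGet();
        }

        boolean accept(String url) {
            return url.startsWith("http");
        }
    }

    private static final Map<String, Object> CACHE = new ConcurrentHashMap<>();

    // Returns the cached instance for the key, creating it at most once.
    @SuppressWarnings("unchecked")
    static <T> T getOrCreate(String key, Supplier<T> factory) {
        return (T) CACHE.computeIfAbsent(key, k -> factory.get());
    }

    // Stand-in for a mapper: configure() fetches the shared filter rather
    // than constructing a new one per mapper instance.
    static final class Mapper {
        ExpensiveFilter filter;

        void configure() {
            filter = getOrCreate("url-filters", ExpensiveFilter::new);
        }

        boolean map(String url) {
            return filter.accept(url);
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            Mapper m = new Mapper();
            m.configure();
            m.map("http://example.org/");
        }
        // All three mappers share one filter instance.
        System.out.println("filter loads: " + ExpensiveFilter.LOADS.get());
    }
}
```

      In this sketch the cache is keyed by a plain string. A real change would presumably need to key on the job configuration as well, so that jobs with different filter settings do not share instances.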

      Attachments

        1. ObjectCache.patch
          1 kB
          Greg Padiasek
        2. Injector.patch
          2 kB
          Greg Padiasek
        3. Generator.patch
          2 kB
          Greg Padiasek
        4. domain-urlfilter-ac
          5.16 MB
          Greg Padiasek
        5. domain-urlfilter-ab
          5.16 MB
          Greg Padiasek
        6. domain-urlfilter-aa
          5.16 MB
          Greg Padiasek

            People

              Assignee: Unassigned
              Reporter: Greg Padiasek (gregp)
              Votes: 0
              Watchers: 3