  1. Nutch
  2. NUTCH-1746

OutOfMemoryError in Mappers

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.7
    • Fix Version/s: None
    • Component/s: generator, injector
    • Labels:
      None
    • Environment:

      Nutch running in local mode with 4M+ domains in domain-urlfilter.txt

    • Patch Info:
      Patch Available

      Description

      Initially I found that Generator threw an OutOfMemoryError no matter how much RAM I allocated to the JVM. I fixed the problem by moving the URLFilters, URLNormalizers and ScoringFilters instances to the top-level class as singletons and reusing them across all Generator mapper instances.

      Then I found the same problem in Injector and applied an analogous fix.

      It now seems that this issue may be common to all Nutch Mapper implementations.

      Would it be possible to integrate this kind of change into the upstream code base
      and potentially update all affected Mapper classes?
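
      The fix described above can be sketched roughly as follows. This is a minimal illustration, not the content of the attached patches: DomainFilter, SharedFilters and SketchMapper are hypothetical stand-ins (not Nutch classes), and the sketch assumes that every mapper in the JVM uses the same filter configuration, as is the case in local mode.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an expensive filter, e.g. one that holds
// millions of entries loaded from domain-urlfilter.txt.
final class DomainFilter {
    // Counts how many times the large structure was built (for illustration).
    static final AtomicInteger LOADS = new AtomicInteger();
    private final Set<String> domains;

    DomainFilter(List<String> lines) {
        LOADS.incrementAndGet();
        this.domains = new HashSet<>(lines);
    }

    boolean accepts(String host) {
        return domains.contains(host);
    }
}

// Holds a single filter instance shared by every mapper in the JVM.
final class SharedFilters {
    private static DomainFilter instance;

    private SharedFilters() {}

    // Assumption: all callers pass equivalent configuration, so the
    // instance built by the first caller is valid for the others.
    static synchronized DomainFilter get(List<String> lines) {
        if (instance == null) {
            instance = new DomainFilter(lines);
        }
        return instance;
    }
}

// Hypothetical mapper: it fetches the shared filter in configure()
// instead of constructing its own copy.
final class SketchMapper {
    private DomainFilter filter;

    void configure(List<String> lines) {
        filter = SharedFilters.get(lines);
    }

    boolean keep(String host) {
        return filter.accepts(host);
    }
}
```

      With this arrangement, creating many mapper instances builds the large domain set once rather than once per mapper. The trade-off is that a static holder ignores later configuration changes within the same JVM, so it only suits setups where the filter configuration is fixed for the whole run.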

        Attachments

        1. Generator.patch
          2 kB
          Greg Padiasek
        2. Injector.patch
          2 kB
          Greg Padiasek
        3. domain-urlfilter-aa
          5.16 MB
          Greg Padiasek
        4. domain-urlfilter-ab
          5.16 MB
          Greg Padiasek
        5. domain-urlfilter-ac
          5.16 MB
          Greg Padiasek
        6. ObjectCache.patch
          1 kB
          Greg Padiasek

              People

              • Assignee: Unassigned
              • Reporter: Greg Padiasek (gregp)
              • Votes: 0
              • Watchers: 3
