  1. Nutch
  2. NUTCH-676

MapWritable is written inefficiently and confusingly


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.9.0
    • Fix Version/s: 1.0.0
    • Component/s: None
    • Labels: None

    Description

      The MapWritable implementation in o.a.n.crawl is written confusingly: it maintains its own internal linked list, which I think may have a bug somewhere (I'm getting an NPE in certain cases in the code, though it's hard to track down).

      Can anyone comment as to why MapWritable is written the way it is, rather than just using a HashMap or a LinkedHashMap if consistent ordering is important? I imagine that would improve performance.

      What about just using the Hadoop MapWritable? Obviously that would break some backwards compatibility but it may be a good idea at some point to reduce confusion (I didn't realize that Nutch had its own impl until a few minutes ago)
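
      To make the LinkedHashMap suggestion concrete, here is a minimal sketch of a map that serializes itself in insertion order. It is not the Nutch or the Hadoop MapWritable: the class name SimpleMapWritable, the String-only keys and values, and the roundTrip helper are all assumptions made for illustration, and it uses only java.io so that it runs without Hadoop on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration only; not the Nutch or Hadoop MapWritable. */
class SimpleMapWritable {
    // LinkedHashMap iterates in insertion order, so the serialized form is
    // deterministic without a hand-rolled linked list.
    private final Map<String, String> entries = new LinkedHashMap<>();

    public String put(String key, String value) { return entries.put(key, value); }
    public String get(String key) { return entries.get(key); }
    public int size() { return entries.size(); }
    public Iterable<String> keys() { return entries.keySet(); }

    /** Writes the entry count, then each key/value pair (both must be non-null). */
    public void write(DataOutput out) throws IOException {
        out.writeInt(entries.size());
        for (Map.Entry<String, String> e : entries.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeUTF(e.getValue());
        }
    }

    /** Replaces the current contents with the entries read from the stream. */
    public void readFields(DataInput in) throws IOException {
        entries.clear();
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
            String key = in.readUTF();
            String value = in.readUTF();
            entries.put(key, value);
        }
    }

    /** Assumed helper: serializes to a byte buffer and reads it back into a new map. */
    static SimpleMapWritable roundTrip(SimpleMapWritable src) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        src.write(new DataOutputStream(buf));
        SimpleMapWritable copy = new SimpleMapWritable();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        return copy;
    }
}
```

      The write and readFields method names mirror the shape of Hadoop's Writable interface, but the sketch does not implement it. A real replacement would also need to handle arbitrary Writable key and value types, which this example leaves out.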

      Attachments

        1. NUTCH-676_v3.patch
          17 kB
          Dogacan Guney
        2. NUTCH-676_v2.patch
          14 kB
          Dogacan Guney
        3. 0001-NUTCH-676-Replace-MapWritable-implementation-with-t.patch
          26 kB
          Todd Lipcon


          People

            Assignee: Dogacan Guney (dogacan)
            Reporter: Todd Lipcon (tlipcon)