  1. Hadoop Map/Reduce
  2. MAPREDUCE-4303

Look at using String.intern to dedupe some Strings

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.23.3, 2.0.0-alpha
    • Fix Version/s: None
    • Component/s: applicationmaster
    • Labels: None

    Description

      MAPREDUCE-4301 fixes one issue with too many duplicate strings, but there are other places where removing the duplicates is not as simple. In these cases the strings come from an incoming RPC call or from parsing a file that is read in. The only real ways to dedupe these are to use String.intern(), which could fill up the permgen space if not used properly, or to maintain our own cache that does the same sort of thing as String.intern() but in the heap.
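
      To make the two options concrete, here is a minimal sketch of a heap-based cache next to String.intern(). The class name HeapStringCache and its dedupe method are hypothetical and are not part of Hadoop. The sketch holds strong references, so it assumes the set of distinct values is small and bounded; a real version would likely need weak references or a size limit.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical heap-based string cache; an illustration only, not Hadoop code. */
final class HeapStringCache {
    // Maps each distinct string value to the single instance we hand out for it.
    // Entries are never evicted in this sketch (assumption: few distinct values).
    private static final ConcurrentMap<String, String> CACHE =
        new ConcurrentHashMap<>();

    private HeapStringCache() {}

    /** Returns the canonical instance equal to s, or null if s is null. */
    static String dedupe(String s) {
        if (s == null) {
            return null;
        }
        // putIfAbsent returns the previously cached instance, or null if s
        // is the first string with this value.
        String existing = CACHE.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    public static void main(String[] args) {
        // new String(...) forces two distinct objects with equal contents,
        // as happens when the same value arrives in two separate RPC calls.
        String a = new String("RUNNING");
        String b = new String("RUNNING");

        System.out.println(a == b);                   // false: two copies
        System.out.println(dedupe(a) == dedupe(b));   // true: one shared copy
        System.out.println(a.intern() == b.intern()); // true: JVM string pool
    }
}
```

      Both approaches end up with one shared instance per value. The difference is where the canonical copy lives and who controls its lifetime: the JVM's string pool for String.intern(), or a map that we own in the heap.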

      The following are some of the places where I saw lots of duplicate strings, and which we should look at doing something about:

      TaskAttemptStatusUpdateEvent$TaskAttemptState.stateString
      MapTaskAttemptImpl.diagnostics
      The keys to Counters.groups
      GenericGroup.displayName
      The keys to GenericGroup.counters
      and GenericCounter.displayName
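
      As a rough illustration of where such deduplication would be applied, the sketch below interns a display name and the map keys at the point where they enter an object. CounterGroupSketch is a hypothetical stand-in and does not reproduce the real GenericGroup or GenericCounter classes. It uses String.intern() for brevity, so the permgen caveat from the description applies; a heap-based cache could be swapped in at the same call sites.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a counter group; not the Hadoop GenericGroup class. */
final class CounterGroupSketch {
    private final String displayName;
    private final Map<String, Long> counters = new HashMap<>();

    CounterGroupSketch(String displayName) {
        // Dedupe once, where the value enters the object.
        this.displayName = internOrNull(displayName);
    }

    void setCounter(String name, long value) {
        // Map keys are deduped too, since every group repeats the same names.
        counters.put(internOrNull(name), value);
    }

    String getDisplayName() {
        return displayName;
    }

    Iterable<String> counterNames() {
        return counters.keySet();
    }

    /** Null-safe wrapper: fields such as diagnostics may legitimately be null. */
    static String internOrNull(String s) {
        return s == null ? null : s.intern();
    }

    public static void main(String[] args) {
        // Simulate the same values arriving twice as separately allocated strings.
        CounterGroupSketch g1 = new CounterGroupSketch(new String("File System Counters"));
        CounterGroupSketch g2 = new CounterGroupSketch(new String("File System Counters"));
        g1.setCounter(new String("BYTES_READ"), 10L);
        g2.setCounter(new String("BYTES_READ"), 20L);

        // Both groups now share one instance for the name and for the key.
        System.out.println(g1.getDisplayName() == g2.getDisplayName()); // true
        System.out.println(g1.counterNames().iterator().next()
            == g2.counterNames().iterator().next());                    // true
    }
}
```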

            People

              Assignee: Unassigned
              Reporter: Robert Joseph Evans (revans2)
              Votes: 0
              Watchers: 10
