Hadoop Common / HADOOP-2244

MapWritable.readFields needs to clear internal hash, else instance accumulates entries forever


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.16.0
    • Component/s: io
    • Labels: None

    Description

      A common framework pattern is to obtain a single instance of a Writable, usually by reflection, and then repeatedly call readFields on it to produce new 'instances' of that Writable.

      For example, the spill-to-disk step run at the end of a map task obtains instances of the map output key and value classes, then loops over the (sorted) map output, calling readFields on them to produce the instances it writes out to the filesystem (see around line #470 in the spill method).
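      The reuse pattern described above can be sketched in plain Java without Hadoop on the classpath. Everything in this sketch is hypothetical and for illustration only: `SimpleWritable` stands in for `org.apache.hadoop.io.Writable` (same `write`/`readFields` pair), `IntRecord` is an invented record type, and `Class.getDeclaredConstructor().newInstance()` stands in for whatever reflection helper the framework actually uses. One instance is created, and `readFields` is called on it once per serialized record; this is safe only because `IntRecord.readFields` overwrites all of the object's state.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ReuseLoopSketch {

  // Hypothetical stand-in for org.apache.hadoop.io.Writable (same two methods).
  public interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
  }

  // Hypothetical scalar record: readFields overwrites its only field,
  // so reusing a single instance across records is safe.
  public static class IntRecord implements SimpleWritable {
    int value;

    public IntRecord() {}

    public IntRecord(int value) { this.value = value; }

    @Override
    public void write(DataOutput out) throws IOException { out.writeInt(value); }

    @Override
    public void readFields(DataInput in) throws IOException { value = in.readInt(); }
  }

  // Serialize records back-to-back into one byte array.
  static byte[] serialize(List<? extends SimpleWritable> records) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    for (SimpleWritable r : records) {
      r.write(out);
    }
    out.flush();
    return bytes.toByteArray();
  }

  // The reuse pattern: create ONE instance reflectively, then call readFields
  // on it once per record, copying out what the consumer needs each time.
  static List<Integer> readAll(Class<IntRecord> cls, byte[] data, int count) throws Exception {
    IntRecord reused = cls.getDeclaredConstructor().newInstance();
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
    List<Integer> seen = new ArrayList<>();
    for (int i = 0; i < count; i++) {
      reused.readFields(in);   // overwrite the same object with the next record
      seen.add(reused.value);  // a real spill would write `reused` out here
    }
    return seen;
  }

  public static void main(String[] args) throws Exception {
    byte[] data = serialize(List.of(new IntRecord(1), new IntRecord(2), new IntRecord(3)));
    System.out.println(readAll(IntRecord.class, data, 3)); // [1, 2, 3]
  }
}
```

      The loop allocates one object rather than one per record, which is only correct if readFields fully replaces the object's previous contents.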

      If the particular Writable is a MapWritable, we currently get incorrect results. MapWritable has an internal hash map that is created on instantiation, and each call to readFields adds the newly deserialized entries to that map without removing the old ones, so every 'instance' also carries the entries of all previously read records. The map needs to be reset whenever readFields is called so that it does not keep growing ad infinitum.
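      A minimal sketch of the problem and of the reset the description asks for, under stated assumptions: `StringMapRecord` is a hypothetical, simplified map record (String keys and values written with `writeUTF`), not Hadoop's actual `MapWritable`, which holds `Writable` keys and values. The `clearOnRead` flag is also hypothetical and exists only so one class can show both the accumulating behaviour and the fixed one.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class MapReadFieldsSketch {

  // Hypothetical simplified map record (String -> String); it models only the
  // part of MapWritable that matters here: an internal map created once.
  public static class StringMapRecord {
    private final Map<String, String> instance = new HashMap<>();
    private final boolean clearOnRead; // illustration-only switch: fixed vs. buggy

    public StringMapRecord(boolean clearOnRead) { this.clearOnRead = clearOnRead; }

    public void put(String key, String value) { instance.put(key, value); }

    public int size() { return instance.size(); }

    public void write(DataOutput out) throws IOException {
      out.writeInt(instance.size());
      for (Map.Entry<String, String> e : instance.entrySet()) {
        out.writeUTF(e.getKey());
        out.writeUTF(e.getValue());
      }
    }

    public void readFields(DataInput in) throws IOException {
      if (clearOnRead) {
        // The fix: drop entries left over from the previous call; without this,
        // entries from every earlier record accumulate in this instance.
        instance.clear();
      }
      int entries = in.readInt();
      for (int i = 0; i < entries; i++) {
        instance.put(in.readUTF(), in.readUTF());
      }
    }
  }

  // Two serialized one-entry maps: {k1=v1} followed by {k2=v2}.
  static byte[] twoRecords() throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    StringMapRecord a = new StringMapRecord(true);
    a.put("k1", "v1");
    a.write(out);
    StringMapRecord b = new StringMapRecord(true);
    b.put("k2", "v2");
    b.write(out);
    out.flush();
    return bytes.toByteArray();
  }

  // Read `count` maps into ONE reused instance; record its size after each call.
  static int[] sizesAfterEachRead(boolean clearOnRead, byte[] data, int count) throws IOException {
    StringMapRecord reused = new StringMapRecord(clearOnRead);
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
    int[] sizes = new int[count];
    for (int i = 0; i < count; i++) {
      reused.readFields(in);
      sizes[i] = reused.size();
    }
    return sizes;
  }

  public static void main(String[] args) throws IOException {
    byte[] data = twoRecords();
    System.out.println(Arrays.toString(sizesAfterEachRead(false, data, 2))); // [1, 2]
    System.out.println(Arrays.toString(sizesAfterEachRead(true, data, 2)));  // [1, 1]
  }
}
```

      Without the clear, the reused instance reports sizes [1, 2] after reading two one-entry maps, i.e. the second 'instance' still carries the first record's entry; with the clear it reports [1, 1].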

      Attachments

        1. hadoop-2244.patch (2 kB, Michael Stack)
        2. 2244-v2.patch (2 kB, Michael Stack)


          People

            Assignee: Michael Stack (stack)
            Reporter: Michael Stack (stack)
            Votes: 0
            Watchers: 0
