Hadoop Common
HADOOP-499

Avoid the use of Strings to improve the performance of hadoop streaming

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.5.0
    • Fix Version/s: 0.6.0
    • Component/s: None
    • Labels:
      None

      Description

      In Hadoop streaming, a record is represented as a String for I/O and encoded as UTF8 for map/reduce. Each record is therefore converted back and forth between String and UTF8 multiple times, which wastes CPU time.
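
To illustrate the kind of byte-level handling that avoids this round trip, here is a minimal sketch that splits a record into key and value directly on a byte array, decoding to String only for display. It is not taken from the attached patch: the class `ByteRecordSplit` and its methods are hypothetical names, and the tab separator is assumed for illustration. Scanning raw bytes for a tab is safe in UTF-8 because the byte 0x09 never occurs inside a multi-byte sequence.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration only; not code from text_streaming.patch.
final class ByteRecordSplit {

    /** Returns the index of the first occurrence of b in buf[start, end), or -1 if absent. */
    static int findByte(byte[] buf, int start, int end, byte b) {
        for (int i = start; i < end; i++) {
            if (buf[i] == b) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Splits the first `length` bytes of `line` at the first tab into {key, value}.
     * If there is no tab, the whole line is the key and the value is empty.
     * No String is created at any point.
     */
    static byte[][] splitKeyValue(byte[] line, int length) {
        int pos = findByte(line, 0, length, (byte) '\t');
        if (pos < 0) {
            return new byte[][] { Arrays.copyOfRange(line, 0, length), new byte[0] };
        }
        return new byte[][] {
            Arrays.copyOfRange(line, 0, pos),
            Arrays.copyOfRange(line, pos + 1, length)
        };
    }

    public static void main(String[] args) {
        byte[] line = "key\tvalue".getBytes(StandardCharsets.UTF_8);
        byte[][] kv = splitKeyValue(line, line.length);
        // Decoding happens here only so the result can be printed.
        System.out.println(new String(kv[0], StandardCharsets.UTF_8));
        System.out.println(new String(kv[1], StandardCharsets.UTF_8));
    }
}
```

Because the split works on offsets into the original buffer, a real implementation could also hand those offsets to a byte-oriented container instead of copying; the copies here just keep the sketch short.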

        Issue Links

          Activity

          Transition            Time In Source Status   Execution Times   Last Executer   Last Execution Date
          Open -> Resolved      5d 43m                  1                 Doug Cutting    05/Sep/06 22:52
          Resolved -> Closed    2d 23h 27m              1                 Doug Cutting    08/Sep/06 22:20
          Owen O'Malley made changes -
          Component/s contrib/streaming [ 12310972 ]
          Doug Cutting made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Doug Cutting made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Doug Cutting added a comment -

          I just committed this. Thanks, Hairong!

          Hairong Kuang made changes -
          Attachment text_streaming.patch [ 12340081 ]
          Hairong Kuang added a comment -

          This patch includes the following fixes:
          1. Replaces the use of UTF8 with Text in hadoop-streaming, and therefore fixes HADOOP-413.
          2. Removes the use of Strings by adding simple manipulation of byte arrays.
          3. Fixes the stream close order when map/reduce finishes, which avoids truncated records.

          Hairong Kuang made changes -
          Field Original Value New Value
          Link This issue incorporates HADOOP-413 [ HADOOP-413 ]
          Hairong Kuang created issue -

            People

            • Assignee:
              Hairong Kuang
              Reporter:
              Hairong Kuang
            • Votes:
              0
              Watchers:
              0

              Dates

              • Created:
                Updated:
                Resolved:
