Hadoop Common / HADOOP-439

Streaming does not work for text data if the records don't fit in a short UTF8 [2^16/3 characters]

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: 0.5.0
    • Fix Version/s: 0.6.0
    • Component/s: None
    • Labels: None

    Description

      The streaming code internally reads the input data into a UTF8, which holds at most 2^16/3 characters. When an input record exceeds that limit (about 21000 characters), truncated data is shipped to the mapper. The user gets no notice, except possibly in the logs on the individual tasks' machines, which people would not normally read for an apparently successful job.
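The arithmetic behind the limit in the title can be sketched as follows. This is a minimal illustration, not the Hadoop code: it assumes the bound comes from a 2-byte length prefix (at most 65535 encoded bytes) combined with a worst case of 3 UTF-8 bytes per Java char. The class `ShortUtf8LimitSketch` and its methods are hypothetical names introduced only for this example.

```java
// Hypothetical sketch: why a "short UTF8" tops out near 21000 characters.
// Assumption: the encoded length is stored in 2 bytes, and each Java char
// may need up to 3 bytes in UTF-8, so only 65535 / 3 chars are always safe.
class ShortUtf8LimitSketch {
    // Largest byte count an unsigned 16-bit length prefix can describe.
    static final int MAX_ENCODED_BYTES = 0xFFFF;
    // Worst-case UTF-8 bytes per Java char (a surrogate pair is 4 bytes for 2 chars).
    static final int MAX_BYTES_PER_CHAR = 3;
    // Longest record that is guaranteed to fit: 65535 / 3 = 21845 chars.
    static final int MAX_SAFE_CHARS = MAX_ENCODED_BYTES / MAX_BYTES_PER_CHAR;

    // True if a record of this length would lose data under the assumed limit.
    static boolean wouldBeTruncated(String record) {
        return record.length() > MAX_SAFE_CHARS;
    }

    // Mimics the silent truncation described in the issue: keep only the safe prefix.
    static String truncateToLimit(String record) {
        return wouldBeTruncated(record) ? record.substring(0, MAX_SAFE_CHARS) : record;
    }

    public static void main(String[] args) {
        // Build a 30000-character record, longer than the assumed limit.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 30000; i++) {
            sb.append('a');
        }
        String record = sb.toString();

        String shipped = truncateToLimit(record);
        System.out.println("limit (chars):   " + MAX_SAFE_CHARS);
        System.out.println("input length:    " + record.length());
        System.out.println("shipped length:  " + shipped.length());
        System.out.println("chars lost:      " + (record.length() - shipped.length()));
    }
}
```

A check like `wouldBeTruncated` run before a record is handed to the mapper is one way a job could fail loudly, or at least report a counter, instead of shipping a shortened record.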

            People

              Assignee: Hairong Kuang (hairong)
              Reporter: Dick King (dking)
