Hadoop Common / HADOOP-8654

TextInputFormat delimiter bug: input text portion ends with, and the delimiter starts with, the same character or character sequence

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.204.0, 1.0.3, 0.21.0, 2.0.0-alpha
    • Fix Version/s: 2.0.2-alpha
    • Component/s: util
    • Labels:
    • Environment: Linux

    • Target Version/s:
    • Tags:
      TextInputFormat record delimiter

    Description

    TextInputFormat delimiter bug scenario: the input text contains a character sequence whose first character (or leading characters) matches the start of the delimiter and is immediately followed by the complete delimiter. In other words, a partial match of the delimiter runs directly into a full occurrence of the delimiter.

    For example, with delimiter = "record"
    and text = " record 1:- name = Gelesh e mail = gelesh.hadoop@gmail.com Location Bangalore record 2: name = sdf .. location =Bangalorrecord 3: name .... "

    Here the string "=Bangalorrecord 3: " satisfies two conditions:
    1) it contains the delimiter "record";
    2) the character (or character sequence) immediately before the delimiter, here 'r', matches the first character (or leading character sequence) of the delimiter; that is, "=Bangalor" ends with the same character, 'r', that the delimiter starts with.
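    The two conditions above can be reproduced with a simplified Python model of a character-at-a-time delimiter matcher. This is an assumed illustration, not Hadoop's Java LineReader code: the function name `split_buggy` is hypothetical, and it models a matcher that, on a mismatch, resets its partial-match counter without re-testing the current character against the first delimiter character.

```python
def split_buggy(text: str, delim: str) -> list[str]:
    """Hypothetical model of the faulty matcher (not Hadoop's actual code).

    On a mismatch the partial-match counter is reset to 0, but the current
    character is never compared against delim[0] again.
    """
    records = []
    start = 0    # index where the current record begins
    matched = 0  # number of delimiter characters matched so far
    for i, ch in enumerate(text):
        if ch == delim[matched]:
            matched += 1
            if matched == len(delim):
                # Full delimiter found: emit the text before it.
                records.append(text[start:i + 1 - len(delim)])
                start = i + 1
                matched = 0
        else:
            # Bug being modelled: ch may itself be the first character of
            # the delimiter, but it is discarded here.
            matched = 0
    records.append(text[start:])
    return records


if __name__ == "__main__":
    print(split_buggy("location =Bangalorrecord 3: name", "record"))
    # ['location =Bangalorrecord 3: name']
```

    The 'r' at the end of "Bangalor" starts a partial match; the following 'r' fails to match 'e' and is thrown away, so "record" is never recognized and the whole string comes back as a single record. With "Bangalore record" (a space before the delimiter) the same model splits correctly.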

    In this case the program does not detect the delimiter, so the value passed to the map is an incorrect record that still contains the delimiter.
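    One way to avoid skipping the delimiter is to fall back to the longest delimiter prefix that is still matched, instead of resetting to zero; this is the Knuth-Morris-Pratt (KMP) failure-function technique. The Python sketch below is an assumed illustration of that technique, not the patch attached to this issue. `split_records` and `build_failure` are hypothetical names, the delimiter is assumed to be non-empty, and records follow Python `str.split` semantics (a trailing empty string if the text ends with the delimiter).

```python
def build_failure(delim: str) -> list[int]:
    """fail[i] = length of the longest proper prefix of delim[:i + 1]
    that is also a suffix of it (the KMP failure function)."""
    fail = [0] * len(delim)
    k = 0
    for i in range(1, len(delim)):
        while k > 0 and delim[i] != delim[k]:
            k = fail[k - 1]
        if delim[i] == delim[k]:
            k += 1
        fail[i] = k
    return fail


def split_records(text: str, delim: str) -> list[str]:
    """Split text on a non-empty delimiter without missing a delimiter that
    directly follows a partial match (e.g. "Bangalor" + "record")."""
    if not delim:
        raise ValueError("delimiter must be non-empty")
    fail = build_failure(delim)
    records = []
    start = 0    # index where the current record begins
    matched = 0  # number of delimiter characters matched so far
    for i, ch in enumerate(text):
        # On a mismatch, fall back to the longest still-valid prefix and
        # re-test the same character instead of discarding it.
        while matched > 0 and ch != delim[matched]:
            matched = fail[matched - 1]
        if ch == delim[matched]:
            matched += 1
        if matched == len(delim):
            records.append(text[start:i + 1 - len(delim)])
            start = i + 1
            matched = 0  # records do not overlap; start fresh after a delimiter
    records.append(text[start:])
    return records


if __name__ == "__main__":
    print(split_records("location =Bangalorrecord 3: name", "record"))
    # ['location =Bangalor', ' 3: name']
```

    Falling back through `fail` rather than only re-testing the current character against the first delimiter character also handles delimiters whose prefix repeats: with delimiter "aab" and text "xaaaby", a single re-test against 'a' would still miss the match, while this sketch returns ['xa', 'y'].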

    Attachments

    1. MAPREDUCE-4512.txt (0.7 kB, Gelesh)
    2. HADOOP-8654.patch (3 kB, Jason Lowe)

    People

    • Assignee: Unassigned
    • Reporter: Gelesh
    • Votes: 0
    • Watchers: 8

    Time Tracking

    • Estimated: 1m
    • Remaining: 1m
    • Logged: Not Specified