Hadoop Common
  HADOOP-8655

In TextInputFormat, when textinputformat.record.delimiter is set, characters or character sequences in the data file that match the start of the delimiter are, in certain cases, missing from the map output

    Details

    • Tags:
      hadoop, mapreduce

      Description

      Set textinputformat.record.delimiter to "</entity>".

      Suppose the input is a text file with the following content:
      <entity><id>1</id><name>User1</name></entity><entity><id>2</id><name>User2</name></entity><entity><id>3</id><name>User3</name></entity><entity><id>4</id><name>User4</name></entity><entity><id>5</id><name>User5</name></entity>

      The mapper is expected to receive the following values:

      Value 1 - <entity><id>1</id><name>User1</name>
      Value 2 - <entity><id>2</id><name>User2</name>
      Value 3 - <entity><id>3</id><name>User3</name>
      Value 4 - <entity><id>4</id><name>User4</name>
      Value 5 - <entity><id>5</id><name>User5</name>
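The expected behaviour above can be written down as a small reference splitter. This is only a minimal sketch of the intended semantics, assuming the whole input fits in memory as a String; it is not Hadoop's LineReader, and the class and method names (DelimiterSplitSketch, split) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reference splitter: not part of Hadoop.
class DelimiterSplitSketch {

    // Splits text on every occurrence of delimiter and drops the delimiter
    // itself. Characters that merely resemble the start of the delimiter
    // (for example "<" or "</" when the delimiter is "</entity>") stay in
    // the record.
    static List<String> split(String text, String delimiter) {
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> records = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int end = text.indexOf(delimiter, start);
            if (end < 0) {
                // No further delimiter: the remainder is the last record.
                records.add(text.substring(start));
                break;
            }
            records.add(text.substring(start, end));
            start = end + delimiter.length();
        }
        return records;
    }

    public static void main(String[] args) {
        String input = "<entity><id>1</id><name>User1</name></entity>"
                     + "<entity><id>2</id><name>User2</name></entity>";
        for (String record : split(input, "</entity>")) {
            System.out.println(record);
        }
    }
}
```

A splitter like this can serve as an oracle when comparing against the values a mapper actually receives.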

      Because of this bug, the mapper instead receives:

      Value 1 - entity><id>1</id><name>User1</name>
      Value 2 - <entity>id>2</id><name>User2</name>
      Value 3 - <entity><id>3id><name>User3</name>
      Value 4 - <entity><id>4</id><name>User4name>
      Value 5 - <entity><id>5</id><name>User5</name>

      The loss need not follow the exact pattern shown above for values 1, 2 and 3. Characters that match the start of the delimiter (here "<" or "</") are dropped at seemingly random positions in the map input.
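The description does not name a root cause. Losses that depend on position are, however, the kind of symptom a streaming reader can show when it discards a partially matched delimiter, for instance when the match is interrupted at a read-buffer boundary. The sketch below assumes that scenario and is not Hadoop's LineReader: it keeps every character in the current record until a complete delimiter is confirmed, then checks that the result is the same for every buffer size. The names ChunkedDelimiterReaderSketch and readRecords are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical chunked reader: illustrates the invariant, not Hadoop's code.
class ChunkedDelimiterReaderSketch {

    // Reads the input in chunks of bufferSize characters and splits it on
    // delimiter. A character is only removed from a record once the full
    // delimiter has been seen, so a partial match is never lost.
    static List<String> readRecords(Reader in, String delimiter, int bufferSize)
            throws IOException {
        if (delimiter.isEmpty() || bufferSize <= 0) {
            throw new IllegalArgumentException("need a non-empty delimiter and a positive buffer size");
        }
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        char[] buffer = new char[bufferSize];
        int n;
        while ((n = in.read(buffer, 0, bufferSize)) != -1) {
            for (int i = 0; i < n; i++) {
                current.append(buffer[i]);
                if (endsWith(current, delimiter)) {
                    // Full delimiter confirmed: strip it and emit the record.
                    current.setLength(current.length() - delimiter.length());
                    records.add(current.toString());
                    current.setLength(0);
                }
            }
        }
        if (current.length() > 0) {
            records.add(current.toString());
        }
        return records;
    }

    // True if sb ends with suffix.
    static boolean endsWith(StringBuilder sb, String suffix) {
        int offset = sb.length() - suffix.length();
        if (offset < 0) {
            return false;
        }
        for (int i = 0; i < suffix.length(); i++) {
            if (sb.charAt(offset + i) != suffix.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        String input = "<entity><id>1</id><name>User1</name></entity>"
                     + "<entity><id>2</id><name>User2</name></entity>";
        String delimiter = "</entity>";
        List<String> expected = readRecords(new StringReader(input), delimiter, input.length());
        // The records must not depend on where the buffer boundaries fall.
        for (int size = 1; size < input.length(); size++) {
            List<String> actual = readRecords(new StringReader(input), delimiter, size);
            if (!actual.equals(expected)) {
                throw new AssertionError("records differ for buffer size " + size);
            }
        }
        System.out.println(expected);
    }
}
```

Checking the suffix after every character is quadratic in the worst case; the sketch trades speed for an easy-to-verify invariant. Sweeping the buffer size is a cheap way to expose boundary-dependent losses in a regression test.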

      1. HADOOP-8655.patch (11 kB, Gelesh)
      2. HADOOP-8655.patch (10 kB, Gelesh)
      3. HADOOP-8655.patch (10 kB, Gelesh)
      4. HADOOP-8655 (2).patch (11 kB, Gelesh)
      5. MAPREDUCE-4519.patch (4 kB, Meria Joseph)

        Activity

        Arun A K created issue -
        Meria Joseph made changes -
        Field Original Value New Value
        Status Open [ 1 ] Patch Available [ 10002 ]
        Hadoop Flags Reviewed [ 10343 ]
        Release Note A few lines of change in LineReader, also incorporated the MAPREDUCE-4512 patch
        Meria Joseph made changes -
        Attachment MAPREDUCE-4519.patch [ 12539291 ]
        Jason Lowe made changes -
        Project Hadoop Map/Reduce [ 12310941 ] Hadoop Common [ 12310240 ]
        Key MAPREDUCE-4519 HADOOP-8655
        Affects Version/s 0.20.2 [ 12314203 ]
        Affects Version/s 0.20.2 [ 12314205 ]
        Target Version/s 0.20.2 [ 12314205 ] 2.2.0-alpha [ 12322473 ]
        Fix Version/s 0.20.2 [ 12314205 ]
        Jason Lowe made changes -
        Hadoop Flags Reviewed [ 10343 ]
        Release Note A few lines of change in LineReader, also incorporated the MAPREDUCE-4512 patch
        Component/s util [ 12310740 ]
        Gelesh made changes -
        Attachment HADOOP-8654.patch [ 12541222 ]
        Gelesh made changes -
        Attachment HADOOP-8655.patch [ 12541232 ]
        Gelesh made changes -
        Attachment HADOOP-8655.patch [ 12541726 ]
        Gelesh made changes -
        Attachment HADOOP-8655.patch [ 12541994 ]
        Gelesh made changes -
        Attachment HADOOP-8654.patch [ 12541222 ]
        Gelesh made changes -
        Attachment HADOOP-8655 (2).patch [ 12542076 ]
        Robert Joseph Evans made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Fix Version/s 3.0.0 [ 12320357 ]
        Fix Version/s 2.2.0-alpha [ 12322473 ]
        Resolution Fixed [ 1 ]
        Arun C Murthy made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Allen Wittenauer made changes -
        Fix Version/s 3.0.0 [ 12320357 ]

          People

          • Assignee:
            Unassigned
            Reporter:
            Arun A K
          • Votes:
            1
            Watchers:
            9

              Time Tracking

              Estimated:
              Original Estimate - 168h
              Remaining:
              Remaining Estimate - 168h
              Logged:
              Time Spent - Not Specified
