Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Version: 0.20.2
Environment: Linux, Ubuntu 10.04
Labels: hadoop, mapreduce
Description
Set textinputformat.record.delimiter to "</entity>".
Suppose the input is a text file with the following content:
<entity><id>1</id><name>User1</name></entity><entity><id>2</id><name>User2</name></entity><entity><id>3</id><name>User3</name></entity><entity><id>4</id><name>User4</name></entity><entity><id>5</id><name>User5</name></entity>
The Mapper is expected to receive the following values:
Value 1 - <entity><id>1</id><name>User1</name>
Value 2 - <entity><id>2</id><name>User2</name>
Value 3 - <entity><id>3</id><name>User3</name>
Value 4 - <entity><id>4</id><name>User4</name>
Value 5 - <entity><id>5</id><name>User5</name>
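The expected values above can be reproduced with a small in-memory reference splitter, which is handy as a baseline when checking what a Mapper actually receives. This is only a sketch: the class `DelimiterSplitSketch` and the method `expectedRecords` are hypothetical names, not part of Hadoop, and the code assumes the whole input fits in a single string.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Hadoop): computes, entirely in memory,
// the records a reader should produce for a given record delimiter.
class DelimiterSplitSketch {

    // Splits input on every occurrence of delimiter. The delimiter itself is
    // dropped, and no empty record is emitted after a trailing delimiter.
    static List<String> expectedRecords(String input, String delimiter) {
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> records = new ArrayList<>();
        int start = 0;
        while (start < input.length()) {
            int end = input.indexOf(delimiter, start);
            if (end < 0) {
                // Last record is not terminated by a delimiter.
                records.add(input.substring(start));
                break;
            }
            records.add(input.substring(start, end));
            start = end + delimiter.length();
        }
        return records;
    }

    public static void main(String[] args) {
        // Rebuild the five-entity sample input from the description.
        StringBuilder input = new StringBuilder();
        for (int i = 1; i <= 5; i++) {
            input.append("<entity><id>").append(i).append("</id><name>User")
                 .append(i).append("</name></entity>");
        }
        List<String> records = expectedRecords(input.toString(), "</entity>");
        for (int i = 0; i < records.size(); i++) {
            System.out.println("Value " + (i + 1) + " - " + records.get(i));
        }
    }
}
```

Running `main` prints the five expected lines, "Value 1" through "Value 5", each without the trailing `</entity>`.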
With this bug, the Mapper instead receives values such as:
Value 1 - entity><id>1</id><name>User1</name>
Value 2 - <entity>id>2</id><name>User2</name>
Value 3 - <entity><id>3id><name>User3</name>
Value 4 - <entity><id>4</id><name>User4name>
Value 5 - <entity><id>5</id><name>User5</name>
The corruption shown above does not necessarily affect these particular values. It occurs at seemingly random positions in the map input.
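Because the damaged positions vary from run to run, one way to locate them is to compare the values a Mapper logged against the expected ones. The following is a minimal sketch of such a comparison; `RecordDiffSketch` and `damagedPositions` are hypothetical names, and the observed values are copied verbatim from the listing above rather than captured from a real job.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Hadoop): reports which records differ
// between an expected list and the list a Mapper actually received.
class RecordDiffSketch {

    // Returns the 1-based positions at which observed differs from expected,
    // including positions present in only one of the two lists.
    static List<Integer> damagedPositions(List<String> expected, List<String> observed) {
        List<Integer> damaged = new ArrayList<>();
        int n = Math.max(expected.size(), observed.size());
        for (int i = 0; i < n; i++) {
            String e = i < expected.size() ? expected.get(i) : null;
            String o = i < observed.size() ? observed.get(i) : null;
            if (e == null || !e.equals(o)) {
                damaged.add(i + 1);
            }
        }
        return damaged;
    }

    public static void main(String[] args) {
        List<String> expected = Arrays.asList(
            "<entity><id>1</id><name>User1</name>",
            "<entity><id>2</id><name>User2</name>",
            "<entity><id>3</id><name>User3</name>",
            "<entity><id>4</id><name>User4</name>",
            "<entity><id>5</id><name>User5</name>");
        // Values as reported in this issue.
        List<String> observed = Arrays.asList(
            "entity><id>1</id><name>User1</name>",
            "<entity>id>2</id><name>User2</name>",
            "<entity><id>3id><name>User3</name>",
            "<entity><id>4</id><name>User4name>",
            "<entity><id>5</id><name>User5</name>");
        System.out.println(damagedPositions(expected, observed)); // prints [1, 2, 3, 4]
    }
}
```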