[MAPREDUCE-5862] Line records longer than 2x split size aren't handled correctly - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: 2.3.0
Fix Version/s: 2.5.0
Component/s: None
Labels: None

Hadoop Flags: Reviewed

Description

Suppose the current split (100-200) lies entirely inside a single record (90-240):

   0              100            200             300
   |---- split ----|---- curr ----|---- split ----|
                 <------- record ------->
                 90                     240

Currently, the first split reads the entire record, up to offset 240, which is correct. The 2nd split, however, has a bug: it produces a phantom record covering (200, 240), which is only the tail of the record that the first split already read.
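To make the expected behaviour concrete, here is a minimal sketch of the ownership rule implied above. It is not Hadoop's reader code. It assumes, as a simplification, that a record is emitted only by the split containing its first byte. The helper names (`expected_records`, `phantom_records`) and the two neighbouring records are hypothetical; only the (90, 240) record and the 100-byte splits come from the diagram.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open byte range [start, end)


def expected_records(records: List[Span], split: Span) -> List[Span]:
    """Records a split should emit under the simplified rule:
    those whose first byte lies inside the split."""
    s_start, s_end = split
    return [r for r in records if s_start <= r[0] < s_end]


def phantom_records(records: List[Span], emitted: List[Span]) -> List[Span]:
    """Emitted ranges that do not correspond to any real record."""
    real = set(records)
    return [e for e in emitted if e not in real]


# Layout from the diagram: three 100-byte splits and one long record at
# (90, 240). The records before and after it are made up for illustration.
records = [(0, 90), (90, 240), (240, 300)]
splits = [(0, 100), (100, 200), (200, 300)]

for split in splits:
    print(split, "->", expected_records(records, split))

# The range reported in this issue is flagged, because (200, 240) is only
# the tail of the real record (90, 240).
print(phantom_records(records, [(200, 240)]))
```

Under this rule the split at (100, 200) emits nothing, since no record starts inside it, and the split at (200, 300) starts with the record at offset 240.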

Attachments

0001-Handle-records-larger-than-2x-split-size.patch (27/Apr/14 01:05, 8 kB, bc Wong)
0001-Handle-records-larger-than-2x-split-size.patch (27/Apr/14 01:26, 10 kB, bc Wong)
recordSpanningMultipleSplits.txt.bz2 (27/Apr/14 04:46, 0.1 kB, bc Wong)
0001-Handle-records-larger-than-2x-split-size.1.patch (28/Apr/14 17:59, 14 kB, bc Wong)
0001-MAPREDUCE-5862.-Line-records-longer-than-2x-split-si.patch (01/May/14 04:54, 15 kB, bc Wong)

People

Assignee: bc Wong

Reporter: bc Wong

Votes: 0

Watchers: 9

Dates

Created: 27/Apr/14 01:02

Updated: 03/Sep/14 20:33

Resolved: 28/May/14 19:54