[MAPREDUCE-5862] Line records longer than 2x split size aren't handled correctly - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: 2.3.0
Fix Version/s: 2.5.0
Component/s: None
Labels: None
Hadoop Flags: Reviewed

Description

Suppose the current split (100-200, labeled "curr" below) lies in the middle of a record (90-240):

   0              100            200             300
   |---- split ----|---- curr ----|---- split ----|
                 <------- record ------->
                 90                     240

Currently, the first split reads the entire record, up to offset 240, which is correct. However, the second split has a bug: it produces a phantom record (200, 240).
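
To illustrate the intended behavior, here is a minimal, hypothetical Java simulation of the usual line-splitting convention. It is not Hadoop's LineRecordReader, and the names SplitLineReaderSketch, readSplit, nextLineStart and sampleData are invented for this sketch. It assumes that a split not starting at offset 0 discards everything up to and including the first newline, however far past the split end that newline lies, and then emits lines while the current position is at or before the split end. The sample file models the record above as bytes [90, 240), ending in a newline at byte 239.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical, simplified model of line-oriented split reading
 * (not Hadoop's LineRecordReader). Records are [start, end) byte ranges.
 */
public class SplitLineReaderSketch {

    /** Returns the records, as {start, end} pairs, emitted for split [start, end). */
    public static List<int[]> readSplit(byte[] data, int start, int end) {
        List<int[]> records = new ArrayList<>();
        int pos = start;
        if (start != 0) {
            // Not the first split: discard the (possibly partial) first line,
            // no matter how far past this split's end its newline lies.
            pos = nextLineStart(data, start);
        }
        // Emit every line that starts at or before the split end. The line
        // starting exactly at `end` belongs to this split, which is why the
        // next split always discards its first line.
        while (pos <= end && pos < data.length) {
            int lineEnd = nextLineStart(data, pos);
            records.add(new int[] {pos, lineEnd});
            pos = lineEnd;
        }
        return records;
    }

    /** Offset just past the first '\n' at or after pos, or data.length if none. */
    public static int nextLineStart(byte[] data, int pos) {
        while (pos < data.length && data[pos] != '\n') {
            pos++;
        }
        return Math.min(pos + 1, data.length);
    }

    /** 300-byte file with lines [0, 90), [90, 240) and [240, 300). */
    public static byte[] sampleData() {
        byte[] data = new byte[300];
        Arrays.fill(data, (byte) 'x');
        data[89] = '\n';
        data[239] = '\n';
        data[299] = '\n';
        return data;
    }

    public static void main(String[] args) {
        byte[] data = sampleData();
        int splitSize = 100;
        for (int s = 0; s < data.length; s += splitSize) {
            int e = Math.min(s + splitSize, data.length);
            StringBuilder sb = new StringBuilder();
            for (int[] r : readSplit(data, s, e)) {
                sb.append(" (").append(r[0]).append(", ").append(r[1]).append(")");
            }
            System.out.println("split (" + s + ", " + e + "):" + sb);
        }
    }
}
```

Under these assumptions, main prints each line exactly once: split (0, 100) emits (0, 90) and (90, 240), split (100, 200) emits nothing, and split (200, 300) emits only (240, 300), so no phantom (200, 240) record appears.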

Attachments

0001-Handle-records-larger-than-2x-split-size.1.patch (14 kB), bc Wong, 28/Apr/14 17:59
0001-Handle-records-larger-than-2x-split-size.patch (10 kB), bc Wong, 27/Apr/14 01:26
0001-Handle-records-larger-than-2x-split-size.patch (8 kB), bc Wong, 27/Apr/14 01:05
0001-MAPREDUCE-5862.-Line-records-longer-than-2x-split-si.patch (15 kB), bc Wong, 01/May/14 04:54
recordSpanningMultipleSplits.txt.bz2 (0.1 kB), bc Wong, 27/Apr/14 04:46

People

Assignee: bc Wong
Reporter: bc Wong
Votes: 0
Watchers: 9

Dates

Created: 27/Apr/14 01:02
Updated: 03/Sep/14 20:33
Resolved: 28/May/14 19:54
