Hadoop Map/Reduce: MAPREDUCE-7194

New Method For CombineFile


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.2.0
    • Fix Version/s: None
    • Component/s: mrv2
    • Labels: None
    • Flags: Patch

    Description

      The CombineFileInputFormat class is responsible for grouping blocks together to form larger splits. The current implementation is naive: it iterates over the list of available blocks and, as long as the current group of blocks is smaller than the maximum split size, it keeps adding blocks. The check for whether a split has reached its maximum size happens only after each block is added. For example, given a maximum split size M and two blocks that are each 7/8 M, the two blocks are grouped together into a single split of 14/8 M. If M is large, this split will be very large and not what the operator would expect.
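      As a rough illustration only (this is not the actual CombineFileInputFormat code; the class and method names below are made up), the "add first, check afterwards" loop described above behaves roughly like this:

      import java.util.ArrayList;
      import java.util.List;

      class NaiveCombineSketch {
        // Groups block sizes into splits; a split is only closed after it
        // has already grown past maxSize, so it can substantially exceed it.
        static List<List<Long>> combine(List<Long> blockSizes, long maxSize) {
          List<List<Long>> splits = new ArrayList<>();
          List<Long> current = new ArrayList<>();
          long currentSize = 0L;
          for (long block : blockSizes) {
            current.add(block);              // the block is added unconditionally
            currentSize += block;
            if (currentSize >= maxSize) {    // the size check happens only afterwards
              splits.add(current);
              current = new ArrayList<>();
              currentSize = 0L;
            }
          }
          if (!current.isEmpty()) {
            splits.add(current);
          }
          return splits;
        }

        public static void main(String[] args) {
          long m = 8L;                       // maximum split size M
          // Two blocks of 7/8 M each end up in one split of 14/8 M.
          System.out.println(combine(List.of(7L, 7L), m)); // prints [[7, 7]]
        }
      }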

      I propose a general cleanup, and also enforcing that, unless a file cannot be split, its splits will not be larger than the configured maximum size. This would give operators a much more straightforward way of calculating the expected number of splits.
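      A minimal sketch of the proposed behavior, under the same assumptions as the sketch above (illustrative names, not the real Hadoop code): close the current split before adding any block that would push it past the maximum, so a split only exceeds M when a single unsplittable block is itself larger than M.

      import java.util.ArrayList;
      import java.util.List;

      class BoundedCombineSketch {
        static List<List<Long>> combine(List<Long> blockSizes, long maxSize) {
          List<List<Long>> splits = new ArrayList<>();
          List<Long> current = new ArrayList<>();
          long currentSize = 0L;
          for (long block : blockSizes) {
            // If adding this block would exceed the maximum, close the current split first.
            if (!current.isEmpty() && currentSize + block > maxSize) {
              splits.add(current);
              current = new ArrayList<>();
              currentSize = 0L;
            }
            current.add(block);
            currentSize += block;
          }
          if (!current.isEmpty()) {
            splits.add(current);
          }
          return splits;
        }

        public static void main(String[] args) {
          long m = 8L;
          // The same two 7/8 M blocks now become two separate splits,
          // so the number of splits is easy to predict from M.
          System.out.println(combine(List.of(7L, 7L), m)); // prints [[7], [7]]
        }
      }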

People

    Assignee: belugabehr David Mollitor
    Reporter: belugabehr David Mollitor
    Votes: 0
    Watchers: 1
