[HDFS-12222] Document and test BlockLocation for erasure-coded files - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.0.0-alpha1
Fix Version/s: 3.0.0-beta1
Component/s: None
Labels:
- hdfs-ec-3.0-nice-to-have

Target Version/s: 3.0.0-beta1

Description

HDFS applications query block location information to compute splits. One example of this is FileInputFormat:

https://github.com/apache/hadoop/blob/d4015f8628dd973c7433639451a9acc3e741d2a2/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileInputFormat.java#L346

Code like the following calculates offsets from the block locations:

    long bytesInThisBlock = blkLocations[startIndex].getOffset() + 
                          blkLocations[startIndex].getLength() - offset;

Erasure coding (EC) breaks this calculation because the block locations also include parity block locations, which are not part of the logical file length. The offset calculation comes out wrong, and so does the topology and caching information derived from it.
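For illustration, here is a minimal self-contained sketch of the offset arithmetic that split computation relies on. `SplitOffsetSketch` and its `Range` class are hypothetical stand-ins for this example, not Hadoop classes; `Range` models only the offset and length that a block location reports.

```java
import java.util.List;

final class SplitOffsetSketch {
    /** Hypothetical stand-in for a block location: a byte range of the logical file. */
    static final class Range {
        final long offset;
        final long length;

        Range(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /** Index of the range that contains the given file offset, or -1 if none does. */
    static int blockIndex(List<Range> ranges, long offset) {
        for (int i = 0; i < ranges.size(); i++) {
            Range r = ranges.get(i);
            if (r.offset <= offset && offset < r.offset + r.length) {
                return i;
            }
        }
        return -1;
    }

    /** Bytes remaining in the containing range, counted from the given file offset. */
    static long bytesInThisBlock(List<Range> ranges, long offset) {
        int i = blockIndex(ranges, offset);
        if (i < 0) {
            throw new IllegalArgumentException("offset " + offset + " is outside the file");
        }
        Range r = ranges.get(i);
        return r.offset + r.length - offset;
    }

    public static void main(String[] args) {
        // A 300-byte file stored as three contiguous ranges.
        List<Range> ranges = List.of(
            new Range(0, 128), new Range(128, 128), new Range(256, 44));
        System.out.println(bytesInThisBlock(ranges, 200)); // prints 56
    }
}
```

The arithmetic is only valid when every entry covers a distinct byte range of the logical file. An entry that describes parity data has no such range, which is why it cannot be fed into this calculation unchanged.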

Applications can figure out what's a parity block by reading the EC policy and then parsing the schema, but it'd be a lot better if we exposed this more generically in BlockLocation instead.
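As a rough sketch of the workaround described above, an application could derive the data and parity unit counts from the policy and classify each block in a group by its index. Everything here is an assumption made for illustration: `EcPolicyNameSketch` is a hypothetical helper, it parses a policy name of the form `RS-6-3-1024k` in place of reading the real schema object, and it assumes data units occupy the lowest indices in a block group.

```java
final class EcPolicyNameSketch {
    /** Hypothetical holder for the unit counts of an EC schema. */
    static final class Layout {
        final int dataUnits;
        final int parityUnits;

        Layout(int dataUnits, int parityUnits) {
            this.dataUnits = dataUnits;
            this.parityUnits = parityUnits;
        }
    }

    /**
     * Parses names shaped like CODEC-DATA-PARITY-CELLSIZE, e.g. "RS-6-3-1024k".
     * Fields are read from the end so that a codec name containing a dash,
     * such as "RS-LEGACY", still parses.
     */
    static Layout parse(String policyName) {
        String[] parts = policyName.split("-");
        if (parts.length < 4) {
            throw new IllegalArgumentException("unexpected policy name: " + policyName);
        }
        int data = Integer.parseInt(parts[parts.length - 3]);
        int parity = Integer.parseInt(parts[parts.length - 2]);
        return new Layout(data, parity);
    }

    /** Assumption: indices 0..dataUnits-1 are data blocks, the rest are parity. */
    static boolean isParityIndex(Layout layout, int indexInGroup) {
        if (indexInGroup < 0 || indexInGroup >= layout.dataUnits + layout.parityUnits) {
            throw new IllegalArgumentException("index out of range: " + indexInGroup);
        }
        return indexInGroup >= layout.dataUnits;
    }

    public static void main(String[] args) {
        Layout layout = parse("RS-6-3-1024k");
        System.out.println(isParityIndex(layout, 5)); // prints false
        System.out.println(isParityIndex(layout, 6)); // prints true
    }
}
```

Each application having to repeat this kind of policy-specific logic is the cost the issue wants to avoid by exposing the information through BlockLocation.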

Attachments

- HDFS-12222.001.patch, 17/Aug/17 07:12, 9 kB, Huafeng Wang
- HDFS-12222.002.patch, 22/Aug/17 05:59, 4 kB, Huafeng Wang
- HDFS-12222.003.patch, 05/Sep/17 05:32, 13 kB, Huafeng Wang
- HDFS-12222.004.patch, 07/Sep/17 07:32, 31 kB, Huafeng Wang
- HDFS-12222.005.patch, 08/Sep/17 03:04, 21 kB, Huafeng Wang
- HDFS-12222.006.patch, 12/Sep/17 06:50, 23 kB, Huafeng Wang

People

Assignee: Huafeng Wang

Reporter: Andrew Wang

Votes: 0

Watchers: 8

Dates

Created: 28/Jul/17 21:26

Updated: 13/Sep/17 01:27

Resolved: 13/Sep/17 00:35