Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
The Hadoop BlockLocation class defines several methods that Impala relies on:
- getNames
- getHosts
- getCachedHosts
- getOffset
- getLength
- getStorageIds
The result of getStorageIds is used to identify individual disks so that Impala can balance load across multiple disks on a node. Ozone returns NULLs for getStorageIds, which does not allow us to accurately identify individual disks.
Ozone should call setStorageIds on the block locations it returns so we can accurately schedule reads to separate disks.
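To illustrate why the storage IDs matter, here is a minimal sketch of how a reader could turn them into per-host disk indexes for load balancing. The class name `DiskIdMapper`, the `UNKNOWN_DISK` constant, and the sample IDs are hypothetical and are not taken from Impala or Ozone; the only point is that a null storage ID leaves the disk unidentifiable.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not Impala's actual code. */
public class DiskIdMapper {
    /** Returned when the filesystem gave no storage ID for a replica. */
    public static final int UNKNOWN_DISK = -1;

    // host -> (storage ID -> small integer disk index on that host)
    private final Map<String, Map<String, Integer>> perHost = new HashMap<>();

    /** Maps a (host, storageId) pair to a stable disk index, starting at 0 per host. */
    public int diskIndex(String host, String storageId) {
        if (storageId == null) {
            // Without a storage ID the replica cannot be tied to a specific disk.
            return UNKNOWN_DISK;
        }
        Map<String, Integer> disks = perHost.computeIfAbsent(host, h -> new HashMap<>());
        Integer existing = disks.get(storageId);
        if (existing != null) {
            return existing;
        }
        int next = disks.size(); // next unused index on this host
        disks.put(storageId, next);
        return next;
    }
}
```

With real storage IDs, two replicas on the same host but on different disks get different indexes, so reads can be spread across them. With nulls, every replica collapses to `UNKNOWN_DISK`.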
Impala expects the arrays returned by getHosts and getStorageIds to be the same length. Presumably the indexes should also correspond, so if a block is stored on two different disks on the same host, that host should appear twice in getHosts, once per storage ID. Storage IDs don't make much sense with erasure coding, so they can be omitted there.
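The index-matching expectation can be sketched as follows. This is an assumption about how a consumer might pair the two arrays, not Impala's implementation; `ReplicaPairing` and `Replica` are hypothetical names, and treating a null or empty storage ID array as "omitted" is likewise an assumption for the erasure coding case.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of pairing hosts with storage IDs by index. */
public class ReplicaPairing {
    /** One replica: the host holding it and its storage ID (null if unknown). */
    public static final class Replica {
        public final String host;
        public final String storageId;

        Replica(String host, String storageId) {
            this.host = host;
            this.storageId = storageId;
        }
    }

    /** Pairs hosts[i] with storageIds[i]; a host may repeat, once per disk. */
    public static List<Replica> pair(String[] hosts, String[] storageIds) {
        List<Replica> replicas = new ArrayList<>();
        if (storageIds == null || storageIds.length == 0) {
            // Storage IDs omitted (assumed here for erasure coded data).
            for (String host : hosts) {
                replicas.add(new Replica(host, null));
            }
            return replicas;
        }
        if (hosts.length != storageIds.length) {
            throw new IllegalArgumentException(
                "hosts and storageIds must have the same length: "
                    + hosts.length + " vs " + storageIds.length);
        }
        for (int i = 0; i < hosts.length; i++) {
            replicas.add(new Replica(hosts[i], storageIds[i]));
        }
        return replicas;
    }
}
```

For example, a block stored on two disks of `h1` and one disk of `h2` would be described by hosts `{"h1", "h1", "h2"}` and three matching storage IDs.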
Issue Links
- fixes: IMPALA-11541 Test getStorageIds for block location with Ozone (Open)
- is related to: IMPALA-10213 Handle block location for Ozone (Resolved)