Details
-
Improvement
-
Status: Open
-
Major
-
Resolution: Unresolved
-
Description
Impala creates a file handle per scan range. For queries that read multiple columns per scan range, this adds unnecessarily large load to the HDFS NameNode, which limits scalability on large clusters.
For a given set of scan ranges against a file within a Scan Node, a single file handle should be created and reused to avoid excessive RPCs.
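The proposed behavior can be illustrated with a minimal sketch. This is not Impala's implementation: it uses local files as a stand-in for HDFS, and the names `ScanRange` and `read_scan_ranges` are hypothetical. It groups scan ranges by file, opens each file once, and serves every range from that one handle using positional reads.

```python
import os
from collections import defaultdict
from typing import NamedTuple


class ScanRange(NamedTuple):
    """Hypothetical scan range: a byte span within one file."""
    path: str
    offset: int
    length: int


def read_scan_ranges(ranges):
    """Read every range, opening each file once instead of once per range.

    Returns (results, open_count), where results maps each ScanRange
    to the bytes read for it.
    """
    # Group ranges by file so one handle can serve all of them.
    by_file = defaultdict(list)
    for r in ranges:
        by_file[r.path].append(r)

    results = {}
    open_count = 0
    for path, file_ranges in by_file.items():
        fd = os.open(path, os.O_RDONLY)  # one handle per file
        open_count += 1
        try:
            for r in file_ranges:
                # pread takes an explicit offset, so ranges do not
                # interfere through a shared file position.
                results[r] = os.pread(fd, r.length, r.offset)
        finally:
            os.close(fd)
    return results, open_count
```

In this sketch the number of opens equals the number of distinct files rather than the number of scan ranges. Against HDFS, each open avoided would correspond to one fewer NameNode RPC.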
Issue Links
- is duplicated by
-
IMPALA-6361 File handle cache should be shared across multiple IO threads
- Resolved
- relates to
-
IMPALA-5212 consider switching to pread by default
- Open