Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
To get LLAP cache data, one needs a file ID, which is either an HDFS inode ID or a composite of path, modification time, and size. For ORC, these IDs can be embedded into splits because, for the inode ID in particular, they can be obtained as part of the normal file enumeration that split generation performs anyway.
If the IDs are missing from the splits, they must be obtained for every file on the fragment side.
We should explore adding file IDs to Parquet splits when the cache is enabled.
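As a rough illustration of the idea, the sketch below shows a split that optionally carries a file ID and a fragment-side helper that falls back to a per-file lookup only when the ID is absent. This is not Hive's actual code: `FileIdSketch`, `CompositeFileId`, `Split`, and `cacheKeyFor` are hypothetical names invented for this example, and the lookup function merely stands in for whatever file system call the fragment would otherwise need to make.

```java
import java.util.Objects;
import java.util.function.Function;

public class FileIdSketch {

    /** Hypothetical composite ID: path + modification time + size. */
    public static final class CompositeFileId {
        public final String path;
        public final long modificationTime;
        public final long size;

        public CompositeFileId(String path, long modificationTime, long size) {
            this.path = path;
            this.modificationTime = modificationTime;
            this.size = size;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof CompositeFileId)) return false;
            CompositeFileId other = (CompositeFileId) o;
            return modificationTime == other.modificationTime
                && size == other.size
                && path.equals(other.path);
        }

        @Override
        public int hashCode() {
            return Objects.hash(path, modificationTime, size);
        }
    }

    /** Hypothetical split; fileId is a Long inode ID, a CompositeFileId, or null. */
    public static final class Split {
        public final String path;
        public final Object fileId;

        public Split(String path, Object fileId) {
            this.path = path;
            this.fileId = fileId;
        }
    }

    /**
     * Fragment side: use the ID embedded at split generation if present;
     * otherwise pay for a per-file lookup (assumed to be the expensive path).
     */
    public static Object cacheKeyFor(Split split, Function<String, Object> perFileLookup) {
        return split.fileId != null ? split.fileId : perFileLookup.apply(split.path);
    }
}
```

In this sketch, a split created with an embedded ID never triggers the lookup, while a split created with `null` triggers one lookup per file, which is the cost the issue proposes to avoid for Parquet.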
Issue Links
- relates to HIVE-17006 LLAP: Parquet caching v1 (Closed)