Details
- Type: Improvement
- Status: Resolved
- Priority: Critical
- Resolution: Resolved
- None
- None
Description
Read only the required columns from a nested struct schema, instead of the whole top-level struct.
Example -
Current state -
Schema - struct<a:int, b:bigint,c:struct<d:int,e:struct<f:int>,g:string>>
Query - select c.e.f from t where c.e.f > 10;
Current state - the entire c struct is read from the file and only then filtered. Only "hive.io.file.readcolumn.ids" is consulted, so all child columns of c are selected for reading.
Conf -
hive.io.file.readcolumn.ids = "2"
hive.io.file.readNestedColumn.paths = "c.e.f"
Result -
boolean[] include = [true,false,false,true,true,true,true,true]
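The indices in the include array follow ORC's pre-order numbering of the type tree: 0 is the root struct, then a=1, b=2, c=3, d=4, e=5, f=6, g=7. The following is a minimal sketch of how the current result arises when only the top-level column id is honored. The `ColumnIdSketch` and `Node` names and all helper methods are hypothetical and written for illustration; this is not the Hive or ORC reader code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ColumnIdSketch {
    /** Hypothetical minimal type node: a field name plus child fields (empty for primitives). */
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        int id; // pre-order column id, filled in by assignIds

        Node(String name, Node... kids) {
            this.name = name;
            children.addAll(Arrays.asList(kids));
        }
    }

    /** Assigns pre-order ids starting at 'next' and returns the next unused id. */
    static int assignIds(Node node, int next) {
        node.id = next++;
        for (Node child : node.children) {
            next = assignIds(child, next);
        }
        return next;
    }

    /** Marks a node and all of its descendants as included. */
    static void includeSubtree(Node node, boolean[] include) {
        include[node.id] = true;
        for (Node child : node.children) {
            includeSubtree(child, include);
        }
    }

    /** Builds the include array when only zero-based top-level column ids are consulted. */
    static boolean[] includeTopLevel(Node root, int... topLevelIds) {
        boolean[] include = new boolean[assignIds(root, 0)];
        include[0] = true; // the root struct is always read
        for (int i : topLevelIds) {
            includeSubtree(root.children.get(i), include);
        }
        return include;
    }

    /** struct<a:int,b:bigint,c:struct<d:int,e:struct<f:int>,g:string>> */
    static Node exampleSchema() {
        Node e = new Node("e", new Node("f"));
        Node c = new Node("c", new Node("d"), e, new Node("g"));
        return new Node("root", new Node("a"), new Node("b"), c);
    }

    public static void main(String[] args) {
        // readcolumn.ids = "2" selects top-level column c and, with it, d, e, f and g.
        System.out.println(Arrays.toString(includeTopLevel(exampleSchema(), 2)));
        // prints [true, false, false, true, true, true, true, true]
    }
}
```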
Expected state -
Schema - struct<a:int, b:bigint,c:struct<d:int,e:struct<f:int>,g:string>>
Query - select c.e.f from t where c.e.f > 10;
Expected state - instead of reading the entire c struct from the file, read only the f column (together with its ancestors c and e) by consulting "hive.io.file.readNestedColumn.paths".
Conf -
hive.io.file.readcolumn.ids = "2"
hive.io.file.readNestedColumn.paths = "c.e.f"
Result -
boolean[] include = [true,false,false,true,false,true,true,false]
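The expected result can be reproduced by walking each dotted path from the root and marking only the nodes on that path, plus the subtree under the final node if the path ends at a struct. The sketch below assumes pre-order column ids (root=0, a=1, b=2, c=3, d=4, e=5, f=6, g=7). `NestedPruneSketch`, `Node`, and every method in it are hypothetical names used for illustration, not the actual change made for this issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class NestedPruneSketch {
    /** Hypothetical minimal type node: a field name plus child fields (empty for primitives). */
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        int id; // pre-order column id, filled in by assignIds

        Node(String name, Node... kids) {
            this.name = name;
            children.addAll(Arrays.asList(kids));
        }
    }

    /** Assigns pre-order ids starting at 'next' and returns the next unused id. */
    static int assignIds(Node node, int next) {
        node.id = next++;
        for (Node child : node.children) {
            next = assignIds(child, next);
        }
        return next;
    }

    /** Marks a node and all of its descendants as included. */
    static void includeSubtree(Node node, boolean[] include) {
        include[node.id] = true;
        for (Node child : node.children) {
            includeSubtree(child, include);
        }
    }

    /** Finds a direct child by field name, or fails if the path does not match the schema. */
    static Node child(Node parent, String name) {
        for (Node c : parent.children) {
            if (c.name.equals(name)) {
                return c;
            }
        }
        throw new IllegalArgumentException("No field '" + name + "' under '" + parent.name + "'");
    }

    /** Builds the include array from dotted nested paths such as "c.e.f". */
    static boolean[] includeNestedPaths(Node root, String... paths) {
        boolean[] include = new boolean[assignIds(root, 0)];
        include[0] = true; // the root struct is always read
        for (String path : paths) {
            Node current = root;
            for (String part : path.split("\\.")) {
                current = child(current, part);
                include[current.id] = true; // ancestors on the path must be read too
            }
            includeSubtree(current, include); // a path ending at a struct keeps its children
        }
        return include;
    }

    /** struct<a:int,b:bigint,c:struct<d:int,e:struct<f:int>,g:string>> */
    static Node exampleSchema() {
        Node e = new Node("e", new Node("f"));
        Node c = new Node("c", new Node("d"), e, new Node("g"));
        return new Node("root", new Node("a"), new Node("b"), c);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(includeNestedPaths(exampleSchema(), "c.e.f")));
        // prints [true, false, false, true, false, true, true, false]
    }
}
```

With the path "c" instead of "c.e.f", the same method falls back to the current behavior and includes the whole c subtree.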
Attachments
Issue Links
- is related to HIVE-12898 Hive should support ORC block skipping on nested fields (Open)
- links to