[HIVE-7664] VectorizedBatchUtil.addRowToBatchFrom is not optimized for Vectorized execution and takes 25% CPU - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: 0.13.1
Fix Version/s: None
Component/s: None
Labels: None

Tags: hive

Description

In a GROUP BY-heavy vectorized Reducer vertex, 25% of CPU time is spent in VectorizedBatchUtil.addRowToBatchFrom().

I looked at the code of VectorizedBatchUtil.addRowToBatchFrom, and it does not appear to be optimized for vectorized processing.

addRowToBatchFrom is called for every row. For each row, and for every column in the batch, it calls getPrimitiveCategory to determine the column's type. Column types are stored in a HashMap. For VectorGroupByOperator, column types do not change between batches, so they should not be looked up for every row.
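The idea of resolving column types once instead of once per row can be sketched as follows. This is a minimal illustration, not Hive code: ColumnKind, CountingSchema, and CachedTypeFiller are hypothetical stand-ins, and the lookup counter exists only to make the number of type lookups visible.

```java
// Hypothetical stand-ins for illustration only; none of these are Hive classes.
enum ColumnKind { LONG, DOUBLE }

/** A schema whose type lookup we pretend is expensive, so we count the calls. */
final class CountingSchema {
    private final ColumnKind[] kinds;
    int lookups = 0;

    CountingSchema(ColumnKind... kinds) { this.kinds = kinds.clone(); }

    int columnCount() { return kinds.length; }

    // Plays the role of a per-column type lookup such as getPrimitiveCategory().
    ColumnKind kindOf(int col) {
        lookups++;
        return kinds[col];
    }
}

/** Fills a column-major batch, resolving each column's type once up front. */
final class CachedTypeFiller {
    private final ColumnKind[] cachedKinds; // resolved in the constructor only
    private final long[][] longCols;        // non-null only for LONG columns
    private final double[][] doubleCols;    // non-null only for DOUBLE columns
    private int size = 0;

    CachedTypeFiller(CountingSchema schema, int capacity) {
        int n = schema.columnCount();
        cachedKinds = new ColumnKind[n];
        longCols = new long[n][];
        doubleCols = new double[n][];
        for (int c = 0; c < n; c++) {
            cachedKinds[c] = schema.kindOf(c); // the only type lookups performed
            if (cachedKinds[c] == ColumnKind.LONG) {
                longCols[c] = new long[capacity];
            } else {
                doubleCols[c] = new double[capacity];
            }
        }
    }

    // Per-row work reads the cached array and never touches the schema.
    void addRow(Object[] row) {
        for (int c = 0; c < cachedKinds.length; c++) {
            switch (cachedKinds[c]) {
                case LONG:
                    longCols[c][size] = (Long) row[c];
                    break;
                case DOUBLE:
                    doubleCols[c][size] = (Double) row[c];
                    break;
            }
        }
        size++;
    }

    long longAt(int col, int rowIndex) { return longCols[col][rowIndex]; }

    double doubleAt(int col, int rowIndex) { return doubleCols[col][rowIndex]; }

    int size() { return size; }
}
```

With a two-column schema, the lookup count stays at two however many rows are added, because addRow only consults the cached array.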

I recommend storing the column type in StructObjectInspector so that other components can leverage this optimization.

Also, addRowToBatchFrom runs a case statement for every row and every column to perform type casting. I recommend encapsulating the type logic in templatized methods.
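One possible reading of "templatized methods" is to bind a type-specific writer to each column once, so the per-row loop no longer switches on type. The sketch below assumes that reading; FieldKind, ColumnAssigner, LongAssigner, DoubleAssigner, and AssignerBatch are hypothetical names and not part of Hive.

```java
// Hypothetical names for illustration only; none of these are Hive classes.
enum FieldKind { LONG, DOUBLE }

/** Writes one value into one column; there is one implementation per type. */
interface ColumnAssigner {
    void assign(int rowIndex, Object value);
}

final class LongAssigner implements ColumnAssigner {
    final long[] vector;

    LongAssigner(int capacity) { vector = new long[capacity]; }

    @Override
    public void assign(int rowIndex, Object value) {
        vector[rowIndex] = (Long) value; // the cast lives in exactly one place
    }
}

final class DoubleAssigner implements ColumnAssigner {
    final double[] vector;

    DoubleAssigner(int capacity) { vector = new double[capacity]; }

    @Override
    public void assign(int rowIndex, Object value) {
        vector[rowIndex] = (Double) value;
    }
}

final class AssignerBatch {
    private final ColumnAssigner[] assigners;
    private int size = 0;

    AssignerBatch(FieldKind[] kinds, int capacity) {
        assigners = new ColumnAssigner[kinds.length];
        for (int c = 0; c < kinds.length; c++) {
            // The only switch on type: it runs once per column, not once per row.
            switch (kinds[c]) {
                case LONG:
                    assigners[c] = new LongAssigner(capacity);
                    break;
                case DOUBLE:
                    assigners[c] = new DoubleAssigner(capacity);
                    break;
                default:
                    throw new IllegalArgumentException("Unsupported kind: " + kinds[c]);
            }
        }
    }

    // The row loop dispatches through the pre-bound assigners.
    void addRow(Object[] row) {
        for (int c = 0; c < assigners.length; c++) {
            assigners[c].assign(size, row[c]);
        }
        size++;
    }

    ColumnAssigner assigner(int col) { return assigners[col]; }

    int size() { return size; }
}
```

Whether interface dispatch actually beats a per-row switch depends on how the JIT inlines the call site, so a change along these lines would need to be profiled against the query in this issue.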

Stack Trace                                                 Sample Count    Percentage (%)
VectorizedBatchUtil.addRowToBatchFrom                       86              26.543
   AbstractPrimitiveObjectInspector.getPrimitiveCategory()  34              10.494
   LazyBinaryStructObjectInspector.getStructFieldData       25              7.716
   StandardStructObjectInspector.getStructFieldData         4               1.235

The query used:

select 
    ss_sold_date_sk
from
    store_sales
where
    ss_sold_date between '1998-01-01' and '1998-06-01'
group by ss_item_sk , ss_customer_sk , ss_sold_date_sk
having sum(ss_list_price) > 50000000000000;

Attachments

HIVE-7664.1.patch.txt    19/Aug/14 09:37    13 kB    Navis Ryu
HIVE-7664.2.patch.txt    20/Aug/14 01:46    17 kB    Navis Ryu

Issue Links

is superseded by

HIVE-9937 LLAP: Vectorized Field-By-Field Serialize / Deserialize to support new Vectorized Map Join

Closed

links to

review board

People

Assignee: Matt McCline

Reporter: Mostafa Mokhtar

Votes: 0

Watchers: 4

Dates

Created: 08/Aug/14 21:25

Updated: 22/Jul/18 08:38

Resolved: 22/Jul/18 08:38