Hive / HIVE-7664

VectorizedBatchUtil.addRowToBatchFrom is not optimized for Vectorized execution and takes 25% CPU

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Incomplete
    • 0.13.1
    • None
    • None
    • None
    • hive

    Description

      In a GROUP BY-heavy vectorized reducer vertex, about 25% of CPU time is spent in VectorizedBatchUtil.addRowToBatchFrom().

      I looked at the code of VectorizedBatchUtil.addRowToBatchFrom, and it does not appear to be optimized for vectorized processing.

      addRowToBatchFrom is called for every row, and for each row it calls getPrimitiveCategory on every column in the batch to determine that column's type. Column types are stored in a HashMap. For VectorGroupByOperator, column types do not change between batches, so they should not be looked up for every row.

      I recommend storing the column type in StructObjectInspector so that other components can leverage this optimization.
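
      The idea of resolving column types once instead of once per row can be sketched as follows. This is a minimal standalone illustration, not Hive code: ColumnKind and CachedColumnKinds are hypothetical names, and a plain Class<?> stands in for whatever an object inspector would report.

```java
// Hypothetical stand-ins for illustration; these are not Hive classes.
enum ColumnKind { LONG, DOUBLE, STRING }

/** Resolves each column's kind once and reuses the result for every row. */
final class CachedColumnKinds {
    private final ColumnKind[] kinds;
    private int lookups; // counts how often the resolution step actually ran

    CachedColumnKinds(Class<?>... columnClasses) {
        kinds = new ColumnKind[columnClasses.length];
        for (int i = 0; i < kinds.length; i++) {
            kinds[i] = resolve(columnClasses[i]);
        }
    }

    // Stands in for the per-column type lookup that the issue says runs per row.
    private ColumnKind resolve(Class<?> c) {
        lookups++;
        if (c == Long.class) return ColumnKind.LONG;
        if (c == Double.class) return ColumnKind.DOUBLE;
        if (c == String.class) return ColumnKind.STRING;
        throw new IllegalArgumentException("unsupported column class: " + c);
    }

    /** Plain array read; no map lookup and no type resolution on the row path. */
    ColumnKind kindOf(int column) {
        return kinds[column];
    }

    int lookups() {
        return lookups;
    }
}
```

      With this shape, the number of type resolutions depends only on the number of columns, not on the number of rows processed.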

      addRowToBatchFrom also executes a case statement for every row and every column to handle type casting. I recommend encapsulating the type logic in templatized methods.
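
      One possible way to move the type logic out of the per-row case statement is to pick a typed assigner per column when the batch is set up. The sketch below is an assumption about what that could look like, not the attached patch: ColumnAssigner, LongAssigner, DoubleAssigner and RowBatchFiller are hypothetical names, plain arrays stand in for column vectors, and null handling is omitted.

```java
// Hypothetical sketch; none of these types are Hive classes.
interface ColumnAssigner {
    /** Writes one value into this column's vector at the given row index. */
    void assign(Object value, int rowIndex);
}

final class LongAssigner implements ColumnAssigner {
    final long[] vector;

    LongAssigner(int batchSize) {
        vector = new long[batchSize];
    }

    @Override
    public void assign(Object value, int rowIndex) {
        vector[rowIndex] = ((Number) value).longValue();
    }
}

final class DoubleAssigner implements ColumnAssigner {
    final double[] vector;

    DoubleAssigner(int batchSize) {
        vector = new double[batchSize];
    }

    @Override
    public void assign(Object value, int rowIndex) {
        vector[rowIndex] = ((Number) value).doubleValue();
    }
}

/** Fills a batch row by row; the type decision was made when assigners were chosen. */
final class RowBatchFiller {
    private final ColumnAssigner[] assigners;
    private int size;

    RowBatchFiller(ColumnAssigner... assigners) {
        this.assigners = assigners;
    }

    void addRow(Object[] row) {
        // No switch on column type here: each column already has its assigner.
        for (int c = 0; c < assigners.length; c++) {
            assigners[c].assign(row[c], size);
        }
        size++;
    }

    int size() {
        return size;
    }
}
```

      The per-row loop then only dispatches to an assigner chosen in advance, so the type branching happens once per column rather than once per row and column.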

      Stack Trace                                                 Sample Count  Percentage (%)
      VectorizedBatchUtil.addRowToBatchFrom                       86            26.543
         AbstractPrimitiveObjectInspector.getPrimitiveCategory()  34            10.494
         LazyBinaryStructObjectInspector.getStructFieldData       25            7.716
         StandardStructObjectInspector.getStructFieldData         4             1.235
      

      The query used :

      select 
          ss_sold_date_sk
      from
          store_sales
      where
          ss_sold_date between '1998-01-01' and '1998-06-01'
      group by ss_item_sk , ss_customer_sk , ss_sold_date_sk
      having sum(ss_list_price) > 50000000000000;
      

      Attachments

        1. HIVE-7664.1.patch.txt
          13 kB
          Navis Ryu
        2. HIVE-7664.2.patch.txt
          17 kB
          Navis Ryu

            People

              mmccline Matt McCline
              mmokhtar Mostafa Mokhtar