  1. Hive
  2. HIVE-16341

Tez Task Execution Summary has incorrect input record counts on some operators

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.3.0, 3.0.0
    • Component/s: Tez
    • Labels:
      None

      Description

      Task Execution Summary
      --------------------------------------------------------------------------------------------------------------------------------
        VERTICES  TOTAL_TASKS  FAILED_ATTEMPTS  KILLED_TASKS   DURATION(ms)  CPU_TIME(ms)  GC_TIME(ms)  INPUT_RECORDS  OUTPUT_RECORDS
      --------------------------------------------------------------------------------------------------------------------------------
           Map 1          167                0             0       17640.00     2,109,200       23,068    150,000,004      11,995,136
          Map 11            5                0             0       10559.00        71,960          633      4,023,690         799,900
          Map 13            1                0             0        2244.00         6,090           29             25               3
           Map 3            1                0             0        2849.00         7,080           99             25               3
           Map 5          271                0             0       55834.00    12,934,890      358,376  1,500,000,001   1,500,000,161
           Map 7          241                0             0       91243.00     5,020,860       71,182  1,827,250,341     652,413,443
      Reducer 10            1                0             0        1010.00         1,900            0              4               0
      Reducer 12            1                0             0        3854.00         1,320            0        799,900               1
      Reducer 14            1                0             0        1420.00         3,790           45              3               1
       Reducer 2            1                0             0        9720.00         6,220          122     11,995,136               1
       Reducer 4            1                0             0         810.00         2,100          105              3               1
       Reducer 6            1                0             0       24863.00         3,260            5  1,500,000,161               1
       Reducer 8          412                0             0       88215.00    17,106,440      184,524  2,165,208,640           1,864
       Reducer 9            2                0             0       29752.00         3,980            0          1,864               4
      --------------------------------------------------------------------------------------------------------------------------------
      

      This shows up on queries that use runtime filtering. The INPUT_RECORDS values look incorrect for the reducers responsible for aggregating the min/max/bloom filter (Reducers 12, 14, 2, 6). For example, Reducer 2 shows about 12M input records (11,995,136), but the task logs for Reducer 2 show only 167 input records.

      It looks like Map 1 has 2 different output vertices (Reducer 2 and Reducer 8), but the total output rows for Map 1 (rather than just the rows going to each specific vertex) are being counted in the input rows for both Reducer 2 and Reducer 8. This matches the summary: Reducer 2's INPUT_RECORDS (11,995,136) is identical to Map 1's OUTPUT_RECORDS.
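
The suspected miscount can be pictured with a small sketch. This is not Hive or Tez code, and it does not claim to show how the attached patches fix the issue. The function names and the toy record counts below are hypothetical, chosen only to show how attributing a source vertex's total output to every downstream vertex inflates INPUT_RECORDS, compared with counting only the records on each specific edge.

```python
# Illustrative model only: toy numbers and hypothetical names, not Hive/Tez APIs.
# Records emitted per (source vertex, destination vertex) edge.
edge_output_records = {
    ("Map 1", "Reducer 2"): 3,     # small stream feeding the runtime-filter aggregation
    ("Map 1", "Reducer 8"): 1000,  # main stream of rows
}


def total_output(vertex):
    """All records a vertex emits, summed over every outgoing edge."""
    return sum(n for (src, _), n in edge_output_records.items() if src == vertex)


def input_records_total_of_sources(vertex):
    """Suspected behavior: add up the TOTAL output of each upstream vertex."""
    sources = {src for (src, dst) in edge_output_records if dst == vertex}
    return sum(total_output(src) for src in sources)


def input_records_per_edge(vertex):
    """Expected behavior: count only records on edges that end at this vertex."""
    return sum(n for (_, dst), n in edge_output_records.items() if dst == vertex)


reducers = ("Reducer 2", "Reducer 8")
suspected = {v: input_records_total_of_sources(v) for v in reducers}
expected = {v: input_records_per_edge(v) for v in reducers}

print(suspected)  # {'Reducer 2': 1003, 'Reducer 8': 1003}
print(expected)   # {'Reducer 2': 3, 'Reducer 8': 1000}
```

In this model both reducers report the same inflated figure under the suspected behavior, which mirrors Reducer 2 showing Map 1's full output count instead of the 167 records seen in its task logs.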

        Attachments

        1. HIVE-16341.2.patch
          4 kB
          Jason Dere
        2. HIVE-16341.1.patch
          2 kB
          Jason Dere

            People

            • Assignee:
              jdere Jason Dere
              Reporter:
              jdere Jason Dere