Uploaded image for project: 'Apache Trafodion (Retired)'
  1. Apache Trafodion (Retired)
  2. TRAFODION-2965

Hash partial groupby does not report a row count in operator statistics

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 2.0-incubating
    • 2.3
    • sql-exe
    • None
    • any

    Description

      Here is a test case that demonstrates this:

      update statistics for table hive.hive.time_dim on every column;
      control query shape groupby(exchange(groupby(exchange(groupby(scan)))));
      prepare s from
      select count(distinct t_time_id) from hive.hive.time_dim; 
      explain options 'f' s;
      execute s;
      get statistics for qid current default;
      

      The actual statistics show a "0" as the row count (ActRowsUsed) for the lowest EX_HASH_GRBY with id 2:

         LC   RC   Id PaId ExId Frag TDB Name                   DOP   Dispatches        OperCpuTime        EstRowsUsed        ActRowsUsed        ActDataUsed    Details
      
         13    .   14    .    8    0 EX_ROOT                      1            2                 37                  0                  1                  8 3658
         12    .   13   14    7    0 EX_SORT_GRBY                 1            5                 61                  1                  1                  8
         11    .   12   13    6    0 EX_SPLIT_TOP                 1            8                171                  1                  4                 32
         10    .   11   12    6    0 EX_SEND_TOP                  4           16              3,389                  1                  4                 64
          9    .   10   11    6    2 EX_SEND_BOTTOM               4           12                683                  1                  4                 64
          8    .    9   10    6    2 EX_SPLIT_BOTTOM              4           19                597                  1                  4                 32 833874
          7    .    8    9    5    2 EX_SORT_GRBY                 4        2,259            289,200                  1                  4                 32
          6    .    7    8    4    2 EX_HASH_GRBY                 4        2,259            394,235               1451             86,400      2,765,059,200 0|0|0
          5    .    6    7    3    2 EX_SPLIT_TOP                 4        2,211             51,034             110007             86,400      2,765,059,200
          4    .    5    6    3    2 EX_SEND_TOP                  8        2,436             98,125             110007             86,400      2,766,528,000
          3    .    4    5    3    3 EX_SEND_BOTTOM               8       10,658            196,714             110007             86,400      2,766,528,000
          2    .    3    4    3    3 EX_SPLIT_BOTTOM              2        2,663             95,246             110007             86,400      2,765,059,200 1547521
          1    .    2    3    2    3 EX_HASH_GRBY                 2        4,521            303,319            55003.5                  0                  0
          .    .    1    2    1    3 EX_HDFS_SCAN                 2        2,650            952,242             116085             86,400      2,765,059,200 HIVE.HIVE.TIME_DIM|86400|5288324
      

      The reason is that the hash groupby reports its row count in the BMO stats. However, a partial hash groupby is not considered a BMO (Big Memory Operator), so no rowcount gets reported. The fix is to increment the rowcount in the generic stats entry that is present in both partial and full groupby operators.
       

      Attachments

        Issue Links

          Activity

            People

              hzeller Hans Zeller
              hzeller Hans Zeller
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: