Uploaded image for project: 'Apache Trafodion (Retired)'
  1. Apache Trafodion (Retired)
  2. TRAFODION-2552

Skew buster plan not chosen when join predicate involves SUBSTRs

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 2.1-incubating, 2.2.0
    • 2.2.0
    • sql-cmp
    • None
    • All

    Description

      Skew-buster is a feature of Trafodion that modifies the partitioning function on joins when skewed values are present. When we notice that there are a few very frequent values among many low frequency values in a join column, we use a different strategy for joining the frequent values than the non-frequent ones.

      The following script reproduces the problem:

      ?section setup

      drop table if exists skewSubstr;

      create table skewSubstr (a int not null, b char(16), primary key (a))
      salt using 4 partitions;

      upsert using load into skewSubstr
      select
      x1 + 10*x2 + 100*x3 + 1000*x4 + 10000*x5 + 100000*x6,
      case when mod(x1 + 10*x2 + 100*x3 + 1000*x4 + 10000*x5 + 100000*x6,97) = 0 then 'askewvalue'
      else char(x5+97) || char(mod(x4 + 3*x2 + 7*x6,26)+97) || char(x4+97) || char(x1+97) ||
      char(mod(2*x1 + 5*x5 + x6,26)+97) || char(mod(x1+x2+x3,26)+97) || char(x6+97) ||
      char(x2+97) || char(x3+97) || char(mod(x1-x2+x4+2*x6,26)+97)
      end
      – the from clause below creates 1,000,000 rows, the cross product of
      – 6 copies of

      { 0, ... 9 }

      from (values(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) T(x1)
      transpose 0,1,2,3,4,5,6,7,8,9 as x2
      transpose 0,1,2,3,4,5,6,7,8,9 as x3
      transpose 0,1,2,3,4,5,6,7,8,9 as x4
      transpose 0,1,2,3,4,5,6,7,8,9 as x5
      transpose 0,1,2,3,4,5,6,7,8,9 as x6;

      update statistics for table skewSubtr on every column;

      create table otherTable(c int not null, d char(16), primary key (c))
      salt using 4 partitions;

      upsert using load into otherTable
      select a, case when b = 'askewvalue' then substr(b,1,10) || char(mod(a,26)+97) else b end
      from skewSubstr;

      update statistics for table otherTable on every column;

      ?section doit

      cqd SKEW_SENSITIVITY_THRESHOLD '0.0';
      cqd SKEW_ROWCOUNT_THRESHOLD '10';

      prepare goodQuery from select count from skewSubstr join otherTable
      on b = d;

      – notice in the explain, that the descriptions on the esp_exchanges are
      – (h2-ud) and (h2-br), which indicates that skew-buster is operative
      explain options 'f' goodQuery;

      prepare badQuery from select count from skewSubstr join otherTable
      on substr(b,1,9) = substr(d,1,9);

      – notice in the explain, that the descriptions on the esp_exchanges are
      – (hash2), which indicates that skew-buster is NOT operative
      explain options 'f' badQuery;

      To see the problem, look at the output of the EXPLAINs:

      >>
      >>cqd SKEW_SENSITIVITY_THRESHOLD '0.0';

      — SQL operation complete.
      >>cqd SKEW_ROWCOUNT_THRESHOLD '10';

      — SQL operation complete.
      >>
      >>prepare goodQuery from select count from skewSubstr join otherTable
      +> on b = d;

      — SQL command prepared.
      >>
      >>-- notice in the explain, that the descriptions on the esp_exchanges are
      >>-- (h2-ud) and (h2-br), which indicates that skew-buster is operative
      >>explain options 'f' goodQuery;

      LC RC OP OPERATOR OPT DESCRIPTION CARD
      ---- ---- ---- -------------------- -------- -------------------- ---------

      8 . 9 root 1.00E+000
      7 . 8 sort_partial_aggr_ro 1.00E+000
      6 . 7 esp_exchange 1:4(h2-ud) 1.00E+000
      5 . 6 sort_partial_aggr_le 1.00E+000
      4 2 5 hybrid_hash_join 1.01E+006
      3 . 4 esp_exchange 4(h2-ud):4(hash2) 1.00E+006
      . . 3 trafodion_scan SKEWSUBSTR 1.00E+006
      1 . 2 esp_exchange 4(h2-br):4(hash2) 1.00E+006
      . . 1 trafodion_scan OTHERTABLE 1.00E+006

      — SQL operation complete.
      >>
      >>prepare badQuery from select count from skewSubstr join otherTable
      +> on substr(b,1,9) = substr(d,1,9);

      — SQL command prepared.
      >>
      >>-- notice in the explain, that the descriptions on the esp_exchanges are
      >>-- (hash2), which indicates that skew-buster is NOT operative
      >>explain options 'f' badQuery;

      LC RC OP OPERATOR OPT DESCRIPTION CARD
      ---- ---- ---- -------------------- -------- -------------------- ---------

      8 . 9 root 1.00E+000
      7 . 8 sort_partial_aggr_ro 1.00E+000
      6 . 7 esp_exchange 1:4(hash2) 1.00E+000
      5 . 6 sort_partial_aggr_le 1.00E+000
      4 2 5 hybrid_hash_join 1.79E+006
      3 . 4 esp_exchange 4(hash2):4(hash2) 1.00E+006
      . . 3 trafodion_scan SKEWSUBSTR 1.00E+006
      1 . 2 esp_exchange 4(hash2):4(hash2) 1.00E+006
      . . 1 trafodion_scan OTHERTABLE 1.00E+006

      — SQL operation complete.

      Notice that when the join predicate is of the form SUBSTR(b,1,9) = SUBSTR(d,1,9), vanilla hash repartitioning is chosen even though the values of column b show extreme skew. The problem is that with the SUBSTRs present, we use default distribution properties for the join predicate instead of leveraging our knowledge about the skew in column B.

      Attachments

        Issue Links

          Activity

            People

              dbirdsall Dave Birdsall
              dbirdsall Dave Birdsall
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: