Tajo (Retired) / TAJO-593

Outer groupby over a groupby in a derived table results in only one shuffle output


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.8.0
    • Component/s: distributed query plan
    • Labels: None

    Description

      Consider the following query:

      select count(*) from (select l_orderkey, l_partkey, count(*) from lineitem group by l_orderkey, l_partkey) t1;
      

      In this case, SubQuery::calculateShuffleOutputNum() is called twice to choose the number of shuffle outputs. Each time, the method searches the plan for a GroupByNode to find out how many grouping keys there are. Here is the bug: SubQuery::calculateShuffleOutputNum() always uses the topmost GroupByNode. This works in most cases, but an outer groupby over a groupby in a derived table breaks it: the topmost GroupByNode is the outer count(*), which has no grouping keys, so the inner groupby ends up with only one shuffle output. In this case, we must use the bottommost groupby node; in fact, that is always the correct choice.

      This patch fixes SubQuery::calculateShuffleOutputNum() to use the bottommost groupby node.
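      The difference between the two lookups can be sketched in Java on a simplified plan for the query above. The PlanNode class, the findTopmostGroupBy()/findBottommostGroupBy() methods, shuffleOutputNum() and its default of 32 are all hypothetical stand-ins, not Tajo's GroupbyNode or SubQuery API. The sketch also ignores how Tajo splits a plan into execution blocks; it only assumes, as the issue title suggests, that a GroupByNode with no grouping keys yields a single shuffle output.

```java
import java.util.List;

// Hypothetical, simplified plan model; NOT Tajo's real LogicalNode/GroupbyNode classes.
public class GroupByLookup {

    static class PlanNode {
        final String type;                // e.g. "GROUP_BY", "TABLE_SUBQUERY", "SCAN"
        final List<String> groupingKeys;  // only meaningful for GROUP_BY nodes
        final PlanNode child;             // a single child is enough for this plan shape

        PlanNode(String type, List<String> groupingKeys, PlanNode child) {
            this.type = type;
            this.groupingKeys = groupingKeys;
            this.child = child;
        }
    }

    // Old behaviour: return the first GROUP_BY met while walking down from the root.
    static PlanNode findTopmostGroupBy(PlanNode root) {
        for (PlanNode n = root; n != null; n = n.child) {
            if (n.type.equals("GROUP_BY")) {
                return n;
            }
        }
        return null;
    }

    // Fixed behaviour: keep walking and return the last (deepest) GROUP_BY.
    static PlanNode findBottommostGroupBy(PlanNode root) {
        PlanNode found = null;
        for (PlanNode n = root; n != null; n = n.child) {
            if (n.type.equals("GROUP_BY")) {
                found = n;
            }
        }
        return found;
    }

    // Hypothetical rule: no grouping keys sends every row to one partition,
    // so there is a single shuffle output; otherwise use a configured default.
    static int shuffleOutputNum(PlanNode groupBy, int defaultNum) {
        if (groupBy != null && groupBy.groupingKeys.isEmpty()) {
            return 1;
        }
        return defaultNum;
    }

    // select count(*) from (select l_orderkey, l_partkey, count(*)
    //   from lineitem group by l_orderkey, l_partkey) t1;
    static PlanNode exampleQueryPlan() {
        PlanNode scan = new PlanNode("SCAN", List.of(), null);
        PlanNode inner = new PlanNode("GROUP_BY", List.of("l_orderkey", "l_partkey"), scan);
        PlanNode subquery = new PlanNode("TABLE_SUBQUERY", List.of(), inner);
        return new PlanNode("GROUP_BY", List.of(), subquery); // outer count(*): no keys
    }

    public static void main(String[] args) {
        PlanNode plan = exampleQueryPlan();
        System.out.println("topmost:    " + shuffleOutputNum(findTopmostGroupBy(plan), 32));
        System.out.println("bottommost: " + shuffleOutputNum(findBottommostGroupBy(plan), 32));
    }
}
```

      Running main() prints 1 for the topmost lookup and 32 for the bottommost one: the outer count(*) has no grouping keys, so picking it collapses the inner groupby's shuffle into a single output, while the bottommost lookup sees the two real grouping keys. The sketch requires Java 9 or later for List.of().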

      Attachments

        1. TAJO-593.patch (2 kB, Hyunsik Choi)


          People

            Assignee: Hyunsik Choi
            Reporter: Hyunsik Choi
            Votes: 0
            Watchers: 1
