Uploaded image for project: 'Tajo'
  1. Tajo
  2. TAJO-987

Hash shuffle should be balanced according to intermediate volumes

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.9.0
    • Component/s: Data Shuffle
    • Labels:
      None

      Description

      It is not hard to see skewed data set in practice. Currently, hash shuffled intermediate are performed by distributing partition keys without considering their partition volumes. As a result, with skewed intermediate data, a few of nodes are likely to take much longer time than most of all nodes. It can cause performance degradation. We need some solution to mitigate this problem.

      This patch assigns the intermediate data by balancing their volumes. The approach is a kind of greedy algorithm. In many cases, the shuffle num can be over tens of thousands. I also considered the computation complexity. Its complexity is O (n). It will show reasonable performance and balanced results.

        Activity

        Hide
        hudson Hudson added a comment -

        SUCCESS: Integrated in Tajo-master-build #322 (See https://builds.apache.org/job/Tajo-master-build/322/)
        TAJO-987: Hash shuffle should be balanced according to intermediate volumes. (hyunsik: rev eeaf379a48030dd819a6daf2040c779379543ac8)

        • tajo-core/src/main/java/org/apache/tajo/master/querymaster/Repartitioner.java
        • CHANGES
        • tajo-core/src/test/java/org/apache/tajo/master/TestRepartitioner.java
        Show
        hudson Hudson added a comment - SUCCESS: Integrated in Tajo-master-build #322 (See https://builds.apache.org/job/Tajo-master-build/322/ ) TAJO-987 : Hash shuffle should be balanced according to intermediate volumes. (hyunsik: rev eeaf379a48030dd819a6daf2040c779379543ac8) tajo-core/src/main/java/org/apache/tajo/master/querymaster/Repartitioner.java CHANGES tajo-core/src/test/java/org/apache/tajo/master/TestRepartitioner.java
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user asfgit closed the pull request at:

        https://github.com/apache/tajo/pull/101

        Show
        githubbot ASF GitHub Bot added a comment - Github user asfgit closed the pull request at: https://github.com/apache/tajo/pull/101
        Hide
        hyunsik Hyunsik Choi added a comment -

        committed it to master branch.

        Show
        hyunsik Hyunsik Choi added a comment - committed it to master branch.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user jihoonson commented on the pull request:

        https://github.com/apache/tajo/pull/101#issuecomment-51012246

        +1
        this patch looks good to me.
        As I commented, please remove some unused imports before commit.
        Thanks!

        Show
        githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on the pull request: https://github.com/apache/tajo/pull/101#issuecomment-51012246 +1 this patch looks good to me. As I commented, please remove some unused imports before commit. Thanks!
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user jihoonson commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/101#discussion_r15738646

        — Diff: tajo-core/src/main/java/org/apache/tajo/master/querymaster/Repartitioner.java —
        @@ -18,7 +18,9 @@

        package org.apache.tajo.master.querymaster;

        +import com.google.common.annotations.VisibleForTesting;
        import com.google.common.collect.Lists;
        +import com.google.common.collect.Sets;
        — End diff –

        This is an unused imports.
        Please remove it.

        Show
        githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/101#discussion_r15738646 — Diff: tajo-core/src/main/java/org/apache/tajo/master/querymaster/Repartitioner.java — @@ -18,7 +18,9 @@ package org.apache.tajo.master.querymaster; +import com.google.common.annotations.VisibleForTesting; import com.google.common.collect.Lists; +import com.google.common.collect.Sets; — End diff – This is an unused imports. Please remove it.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user jihoonson commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/101#discussion_r15738637

        — Diff: tajo-core/src/test/java/org/apache/tajo/master/TestRepartitioner.java —
        @@ -18,15 +18,21 @@

        package org.apache.tajo.master;

        +import com.google.common.collect.Maps;
        import org.apache.tajo.ExecutionBlockId;
        import org.apache.tajo.QueryId;
        import org.apache.tajo.TestTajoIds;
        import org.apache.tajo.ipc.TajoWorkerProtocol;
        +import org.apache.tajo.master.querymaster.QueryMaster;
        import org.apache.tajo.master.querymaster.QueryUnit;
        +import org.apache.tajo.master.querymaster.Repartitioner;
        +import org.apache.tajo.master.querymaster.SubQuery;
        +import org.apache.tajo.util.Pair;
        import org.apache.tajo.util.TUtil;
        import org.apache.tajo.worker.FetchImpl;
        import org.jboss.netty.handler.codec.http.QueryStringDecoder;
        import org.junit.Test;
        +import org.mockito.Mockito;
        — End diff –

        There are some unused imports.
        Please remove them.

        Show
        githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/101#discussion_r15738637 — Diff: tajo-core/src/test/java/org/apache/tajo/master/TestRepartitioner.java — @@ -18,15 +18,21 @@ package org.apache.tajo.master; +import com.google.common.collect.Maps; import org.apache.tajo.ExecutionBlockId; import org.apache.tajo.QueryId; import org.apache.tajo.TestTajoIds; import org.apache.tajo.ipc.TajoWorkerProtocol; +import org.apache.tajo.master.querymaster.QueryMaster; import org.apache.tajo.master.querymaster.QueryUnit; +import org.apache.tajo.master.querymaster.Repartitioner; +import org.apache.tajo.master.querymaster.SubQuery; +import org.apache.tajo.util.Pair; import org.apache.tajo.util.TUtil; import org.apache.tajo.worker.FetchImpl; import org.jboss.netty.handler.codec.http.QueryStringDecoder; import org.junit.Test; +import org.mockito.Mockito; — End diff – There are some unused imports. Please remove them.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user jihoonson commented on the pull request:

        https://github.com/apache/tajo/pull/101#issuecomment-51010170

        Sorry for late review.
        I'll review today.

        Show
        githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on the pull request: https://github.com/apache/tajo/pull/101#issuecomment-51010170 Sorry for late review. I'll review today.
        Hide
        hyunsik Hyunsik Choi added a comment -

        I've submitted the patch at github.
        https://github.com/apache/tajo/pull/101

        Show
        hyunsik Hyunsik Choi added a comment - I've submitted the patch at github. https://github.com/apache/tajo/pull/101

          People

          • Assignee:
            hyunsik Hyunsik Choi
            Reporter:
            hyunsik Hyunsik Choi
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development