
Gathering fine-grained column statistics for range shuffle

    Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.12.0, 0.11.1
    • Component/s: None
    • Labels:
      None

      Description

      Statistics are particularly useful during the shuffle stage of query execution, and Tajo uses them for range shuffle.

      Currently, once statistics gathering is enabled, statistics are collected on every column of the input schema rather than only on the shuffle key columns. This can cause unnecessary overhead, so we need to collect statistics on the shuffle keys only.
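
      A minimal sketch of the idea, not the code from the patch: a per-column flag decides whether a column's values are examined at all, so only the shuffle key columns pay the statistics cost. The class name KeyColumnStatsSketch, its methods, and the use of nullable Long values are assumptions made for illustration; only the notion of a per-column statistics flag comes from this issue.

```java
/**
 * Hypothetical sketch: collects min/max only for columns flagged as shuffle keys.
 * None of these names are taken from the Tajo code base.
 */
final class KeyColumnStatsSketch {
    private final boolean[] columnStatsEnabled; // one flag per input column
    private final Long[] minValues;             // null until a value is seen
    private final Long[] maxValues;
    private long numRows;

    KeyColumnStatsSketch(int numColumns, int... shuffleKeyIndexes) {
        columnStatsEnabled = new boolean[numColumns];
        minValues = new Long[numColumns];
        maxValues = new Long[numColumns];
        for (int idx : shuffleKeyIndexes) {
            columnStatsEnabled[idx] = true;
        }
    }

    /** Row count is always tracked; column values are inspected only when flagged. */
    void analyzeRow(Long[] row) {
        numRows++;
        for (int i = 0; i < row.length; i++) {
            if (!columnStatsEnabled[i] || row[i] == null) {
                continue; // skip non-key columns and null values
            }
            if (minValues[i] == null || row[i] < minValues[i]) {
                minValues[i] = row[i];
            }
            if (maxValues[i] == null || row[i] > maxValues[i]) {
                maxValues[i] = row[i];
            }
        }
    }

    long getNumRows() { return numRows; }
    Long getMin(int column) { return minValues[column]; }
    Long getMax(int column) { return maxValues[column]; }
}
```

      With three input columns and column 1 as the only shuffle key, columns 0 and 2 are never inspected, while the minimum and maximum of column 1 remain available for computing shuffle ranges.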

        Activity

        githubbot ASF GitHub Bot added a comment -

        GitHub user jihoonson opened a pull request:

        https://github.com/apache/tajo/pull/859

        TAJO-1975: Gathering fine-grained column statistics for range shuffle

        You can merge this pull request into a Git repository by running:

        $ git pull https://github.com/jihoonson/tajo-2 TAJO-1975

        Alternatively you can review and apply these changes as the patch at:

        https://github.com/apache/tajo/pull/859.patch

        To close this pull request, make a commit to your master/trunk branch
        with (at least) the following in the commit message:

        This closes #859


        commit 79ecf906488d6b216ab2f42ed387dde61135c22c
        Author: Jihoon Son <jihoonson@apache.org>
        Date: 2015-11-12T05:38:27Z

        • Add statistics collection flag for each column
        • Rename HashShuffleAppender to HashShuffleAppenderWrapper

        commit 809b95f5248aaa4ee9218acfb187cacf83243b1f
        Author: Jihoon Son <jihoonson@apache.org>
        Date: 2015-11-12T06:36:27Z

        TAJO-1975

        commit 225c6e29e06715403afc3e964ab8adb5be073139
        Author: Jihoon Son <jihoonson@apache.org>
        Date: 2015-11-12T06:58:17Z

        Fix test failure


        githubbot ASF GitHub Bot added a comment -

        Github user jinossy commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/859#discussion_r44746339

        --- Diff: tajo-storage/tajo-storage-hbase/src/main/java/org/apache/tajo/storage/hbase/AbstractHBaseAppender.java ---
        @@ -53,7 +51,8 @@

           protected ColumnMapping columnMapping;
           protected TableStatistics stats;
        -  protected boolean enabledStats;
        +  protected boolean tableStatsEnabled;
        +  protected BitSet columnStatsEnabled;
        --- End diff --

        BitSet has a computation cost. How about changing it to an array?

        githubbot ASF GitHub Bot added a comment -

        Github user jihoonson commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/859#discussion_r44746653

        --- Diff: tajo-storage/tajo-storage-hbase/src/main/java/org/apache/tajo/storage/hbase/AbstractHBaseAppender.java ---
        @@ -53,7 +51,8 @@

           protected ColumnMapping columnMapping;
           protected TableStatistics stats;
        -  protected boolean enabledStats;
        +  protected boolean tableStatsEnabled;
        +  protected BitSet columnStatsEnabled;
        --- End diff --

        Thanks for your comment. I've changed it.
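
        For illustration only (this is not code from the pull request), the two representations discussed in this review can be compared side by side. A java.util.BitSet lookup computes a word index and a bit mask on each call, while a boolean[] lookup is a plain array access, usually at the price of more memory per flag. ColumnFlagsSketch and its method names are hypothetical.

```java
import java.util.BitSet;

/** Hypothetical sketch: the same per-column flags stored as a BitSet and as a boolean[]. */
final class ColumnFlagsSketch {

    /** Compact form: one bit per column; get(i) involves a shift and a mask. */
    static BitSet asBitSet(int numColumns, int... enabledColumns) {
        BitSet bits = new BitSet(numColumns);
        for (int i : enabledColumns) {
            bits.set(i);
        }
        return bits;
    }

    /** Array form: one boolean element per column; a lookup is a direct index. */
    static boolean[] asArray(int numColumns, int... enabledColumns) {
        boolean[] flags = new boolean[numColumns];
        for (int i : enabledColumns) {
            flags[i] = true;
        }
        return flags;
    }

    /** Counts enabled columns using the array form, as a per-row loop would check them. */
    static int countEnabled(boolean[] flags) {
        int count = 0;
        for (boolean flag : flags) {
            if (flag) {
                count++;
            }
        }
        return count;
    }
}
```

        Both forms answer the same question for every column index, so the choice only affects lookup cost and memory footprint, not behavior.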

        githubbot ASF GitHub Bot added a comment -

        Github user jinossy commented on the pull request:

        https://github.com/apache/tajo/pull/859#issuecomment-156319375

        +1 Looks great to me

        githubbot ASF GitHub Bot added a comment -

        Github user asfgit closed the pull request at:

        https://github.com/apache/tajo/pull/859

        hudson Hudson added a comment -

        FAILURE: Integrated in Tajo-master-CODEGEN-build #591 (See https://builds.apache.org/job/Tajo-master-CODEGEN-build/591/)
        TAJO-1975: Gathering fine-grained column statistics for range shuffle. (jihoonson: rev 011fcd922d0a809e8d6d88de441594fd13f649a0)

        • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/parquet/ParquetAppender.java
        • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/HashShuffleAppenderWrapper.java
        • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/rcfile/RCFile.java
        • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/RawFile.java
        • tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/HashShuffleFileWriteExec.java
        • tajo-storage/tajo-storage-common/src/main/java/org/apache/tajo/storage/TableStatistics.java
        • tajo-storage/tajo-storage-hdfs/src/test/java/org/apache/tajo/storage/TestCompressionStorages.java
        • tajo-storage/tajo-storage-hbase/src/main/java/org/apache/tajo/storage/hbase/HBasePutAppender.java
        • CHANGES
        • tajo-storage/tajo-storage-hbase/src/main/java/org/apache/tajo/storage/hbase/AbstractHBaseAppender.java
        • tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/RangeShuffleFileWriteExec.java
        • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/FileAppender.java
        • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/HashShuffleAppenderManager.java
        • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/orc/ORCAppender.java
        • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/HashShuffleAppender.java
        • tajo-storage/tajo-storage-hbase/src/main/java/org/apache/tajo/storage/hbase/HFileAppender.java
        • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/rawfile/DirectRawFileWriter.java
        • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/sequencefile/SequenceFileAppender.java
        • tajo-storage/tajo-storage-common/src/main/java/org/apache/tajo/storage/Appender.java
        • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/text/DelimitedTextFile.java
        • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/RowFile.java
        • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/avro/AvroAppender.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Tajo-master-build #974 (See https://builds.apache.org/job/Tajo-master-build/974/)
        TAJO-1975: Gathering fine-grained column statistics for range shuffle. (jihoonson: rev 011fcd922d0a809e8d6d88de441594fd13f649a0)

        • tajo-storage/tajo-storage-hdfs/src/test/java/org/apache/tajo/storage/TestCompressionStorages.java
        • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/rawfile/DirectRawFileWriter.java
        • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/HashShuffleAppender.java
        • CHANGES
        • tajo-storage/tajo-storage-common/src/main/java/org/apache/tajo/storage/TableStatistics.java
        • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/HashShuffleAppenderWrapper.java
        • tajo-storage/tajo-storage-hbase/src/main/java/org/apache/tajo/storage/hbase/AbstractHBaseAppender.java
        • tajo-storage/tajo-storage-common/src/main/java/org/apache/tajo/storage/Appender.java
        • tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/HashShuffleFileWriteExec.java
        • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/avro/AvroAppender.java
        • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/parquet/ParquetAppender.java
        • tajo-storage/tajo-storage-hbase/src/main/java/org/apache/tajo/storage/hbase/HBasePutAppender.java
        • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/RowFile.java
        • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/orc/ORCAppender.java
        • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/RawFile.java
        • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/text/DelimitedTextFile.java
        • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/rcfile/RCFile.java
        • tajo-storage/tajo-storage-hbase/src/main/java/org/apache/tajo/storage/hbase/HFileAppender.java
        • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/FileAppender.java
        • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/HashShuffleAppenderManager.java
        • tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/RangeShuffleFileWriteExec.java
        • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/sequencefile/SequenceFileAppender.java
        jihoonson Jihoon Son added a comment -

        Committed to master and 0.11.1


          People

          • Assignee: Jihoon Son
          • Reporter: Jihoon Son
          • Votes: 0
          • Watchers: 3