Improve calculation of intermediate table statistics

    Details

      Description

      Internal storage calculates column statistics (min, max) for all intermediate data during shuffling, but in the current implementation these statistics are required only for range shuffle. We should remove the unnecessary computation.
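
To illustrate the idea in the description, here is a minimal sketch. It assumes a hypothetical `StatsCollector` and `ShuffleType`; neither is a Tajo class, and the actual patch may be structured differently. Row counts are always tracked, while the per-row min/max comparisons run only when the shuffle type is range.

```java
/** Illustrative sketch only: these types are hypothetical, not Tajo's API. */
public class ShuffleStatsSketch {

  /** Hypothetical shuffle kinds; only RANGE needs min/max of the data. */
  public enum ShuffleType { HASH, RANGE }

  /** Always counts rows; tracks min/max only when column stats are enabled. */
  public static final class StatsCollector {
    private final boolean columnStatsEnabled;
    private long numRows;
    private Long min; // stays null when stats are disabled or no rows were added
    private Long max;

    public StatsCollector(boolean columnStatsEnabled) {
      this.columnStatsEnabled = columnStatsEnabled;
    }

    public void add(long value) {
      numRows++;
      if (!columnStatsEnabled) {
        return; // skip the per-row comparisons entirely
      }
      if (min == null || value < min) {
        min = value;
      }
      if (max == null || value > max) {
        max = value;
      }
    }

    public long getNumRows() { return numRows; }
    public Long getMin() { return min; }
    public Long getMax() { return max; }
  }

  /** Enables column statistics only for range shuffle (assumed policy). */
  public static StatsCollector collectorFor(ShuffleType type) {
    return new StatsCollector(type == ShuffleType.RANGE);
  }

  public static void main(String[] args) {
    long[] values = {5L, -2L, 9L};
    StatsCollector hash = collectorFor(ShuffleType.HASH);
    StatsCollector range = collectorFor(ShuffleType.RANGE);
    for (long v : values) {
      hash.add(v);
      range.add(v);
    }
    System.out.println("hash:  rows=" + hash.getNumRows()
        + " min=" + hash.getMin() + " max=" + hash.getMax());
    System.out.println("range: rows=" + range.getNumRows()
        + " min=" + range.getMin() + " max=" + range.getMax());
  }
}
```

Using a boolean flag fixed at construction keeps the hot `add` path to a single branch when statistics are not needed.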

          Activity

          githubbot ASF GitHub Bot added a comment -

          GitHub user jinossy opened a pull request:

          https://github.com/apache/tajo/pull/678

          TAJO-1743: Improve calculation of intermediate table statistics

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/jinossy/tajo TAJO-1743

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/tajo/pull/678.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #678


          commit 0d1a62cb7800a82b8bfc72d72b6a9954e371acfa
          Author: Jinho Kim <jhkim@apache.org>
          Date: 2015-08-05T10:44:13Z

          TAJO-1743: Improve calculation of intermediate table statistics


          githubbot ASF GitHub Bot added a comment -

          Github user blrunner commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/678#discussion_r36596307

          --- Diff: tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/RangeShuffleFileWriteExec.java ---
          @@ -56,14 +57,19 @@
             private KeyProjector keyProjector;

             public RangeShuffleFileWriteExec(final TaskAttemptContext context,
          -      final PhysicalExec child, final Schema inSchema, final Schema outSchema,
          -      final SortSpec[] sortSpecs) throws IOException {
          -    super(context, inSchema, outSchema, child);
          +      final ShuffleFileWriteNode plan,
          --- End diff --

          Could you explain why you removed the output schema from the constructor's parameters?

          githubbot ASF GitHub Bot added a comment -

          Github user jinossy commented on the pull request:

          https://github.com/apache/tajo/pull/678#issuecomment-129268173

          You can find it in PhysicalPlannerImpl. The input and output schemas are the same in RangeShuffleWriter.

          githubbot ASF GitHub Bot added a comment -

          Github user blrunner commented on the pull request:

          https://github.com/apache/tajo/pull/678#issuecomment-129269536

          Thanks @jinossy

          I found the following code based on your comment:
          ```
          return new RangeShuffleFileWriteExec(ctx, subOp, plan.getInSchema(), plan.getInSchema(), sortSpecs);
          ```
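
The call above passes `plan.getInSchema()` for both schema arguments, which is why a separate output-schema parameter is redundant. As a rough sketch with stand-in types (`Schema`, `PlanNode`, `BaseExec`, and `RangeWriterExec` are hypothetical and are not Tajo's real classes or signatures), a constructor that takes the plan node can derive both schemas itself:

```java
/** Stand-in types for illustration only; these are not Tajo's real classes. */
public class ConstructorSketch {

  /** Minimal placeholder for a table schema. */
  public static final class Schema {
    public final String name;
    public Schema(String name) { this.name = name; }
  }

  /** Hypothetical plan node exposing only an input-schema accessor. */
  public static final class PlanNode {
    private final Schema inSchema;
    public PlanNode(Schema inSchema) { this.inSchema = inSchema; }
    public Schema getInSchema() { return inSchema; }
  }

  /** Base operator that still accepts separate input and output schemas. */
  public static class BaseExec {
    public final Schema inSchema;
    public final Schema outSchema;
    public BaseExec(Schema inSchema, Schema outSchema) {
      this.inSchema = inSchema;
      this.outSchema = outSchema;
    }
  }

  /** Range-shuffle writer: reuses the plan's input schema as its output schema. */
  public static final class RangeWriterExec extends BaseExec {
    public RangeWriterExec(PlanNode plan) {
      super(plan.getInSchema(), plan.getInSchema());
    }
  }

  public static void main(String[] args) {
    RangeWriterExec exec = new RangeWriterExec(new PlanNode(new Schema("intermediate")));
    // Both fields refer to the same Schema instance.
    System.out.println(exec.inSchema == exec.outSchema);
  }
}
```

Callers then cannot accidentally pass mismatched schemas to the writer.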

          githubbot ASF GitHub Bot added a comment -

          Github user blrunner commented on the pull request:

          https://github.com/apache/tajo/pull/678#issuecomment-129296719

          +1

          LGTM.

          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/tajo/pull/678

          jhkim Jinho Kim added a comment -

          Committed it.
          Thank you for your review!

          hudson Hudson added a comment -

          FAILURE: Integrated in Tajo-master-build #794 (See https://builds.apache.org/job/Tajo-master-build/794/)
          TAJO-1743: Improve calculation of intermediate table statistics. (jhkim: rev f87f6672991cabfcb80332d0a0fb9869751ea665)

          • tajo-plan/src/main/java/org/apache/tajo/plan/util/PlannerUtil.java
          • tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/RangeShuffleFileWriteExec.java
          • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/avro/AvroAppender.java
          • tajo-common/src/main/java/org/apache/tajo/util/StringUtils.java
          • tajo-core/src/main/java/org/apache/tajo/engine/planner/PhysicalPlannerImpl.java
          • CHANGES
          • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/sequencefile/SequenceFileAppender.java
          • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/RowFile.java
          • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/rcfile/RCFile.java
          • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/parquet/ParquetAppender.java
          • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/RawFile.java
          • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/rawfile/DirectRawFileWriter.java
          • tajo-common/src/main/java/org/apache/tajo/storage/StorageConstants.java
          • tajo-core/src/test/java/org/apache/tajo/engine/query/TestCreateTable.java
          hudson Hudson added a comment -

          FAILURE: Integrated in Tajo-master-CODEGEN-build #432 (See https://builds.apache.org/job/Tajo-master-CODEGEN-build/432/)
          TAJO-1743: Improve calculation of intermediate table statistics. (jhkim: rev f87f6672991cabfcb80332d0a0fb9869751ea665)

          • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/rawfile/DirectRawFileWriter.java
          • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/parquet/ParquetAppender.java
          • CHANGES
          • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/RowFile.java
          • tajo-core/src/main/java/org/apache/tajo/engine/planner/PhysicalPlannerImpl.java
          • tajo-common/src/main/java/org/apache/tajo/util/StringUtils.java
          • tajo-core/src/test/java/org/apache/tajo/engine/query/TestCreateTable.java
          • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/RawFile.java
          • tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/RangeShuffleFileWriteExec.java
          • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/avro/AvroAppender.java
          • tajo-plan/src/main/java/org/apache/tajo/plan/util/PlannerUtil.java
          • tajo-common/src/main/java/org/apache/tajo/storage/StorageConstants.java
          • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/rcfile/RCFile.java
          • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/sequencefile/SequenceFileAppender.java

            People

            • Assignee:
              jhkim Jinho Kim
              Reporter:
              jhkim Jinho Kim