Uploaded image for project: 'Tajo'
  1. Tajo
  2. TAJO-1659

Simplify scan iteration in SeqScan

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.11.0
    • Component/s: Physical Operator
    • Labels:
      None

      Description

      SeqScanExec internally uses if-condition branch to perform filter iteration or full-scan iteration on a table. It is possible to cause branch misprediction. Also, this approach makes code messy. This patch introduce an scan iterator to simplify the if-condition branch.

        Issue Links

          Activity

          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user hyunsik opened a pull request:

          https://github.com/apache/tajo/pull/612

          TAJO-1659: Simplify scan iteration in SeqScan.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/hyunsik/tajo TAJO-1659

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/tajo/pull/612.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #612


          commit 90e20d617a9c6af176035f6039d6bbdd8e4f3331
          Author: Hyunsik Choi <hyunsik@apache.org>
          Date: 2015-06-24T11:12:37Z

          TAJO-1659: Simplify scan iteration in SeqScan.


          Show
          githubbot ASF GitHub Bot added a comment - GitHub user hyunsik opened a pull request: https://github.com/apache/tajo/pull/612 TAJO-1659 : Simplify scan iteration in SeqScan. You can merge this pull request into a Git repository by running: $ git pull https://github.com/hyunsik/tajo TAJO-1659 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/tajo/pull/612.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #612 commit 90e20d617a9c6af176035f6039d6bbdd8e4f3331 Author: Hyunsik Choi <hyunsik@apache.org> Date: 2015-06-24T11:12:37Z TAJO-1659 : Simplify scan iteration in SeqScan.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user jihoonson commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/612#discussion_r33147147

          — Diff: tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/SeqScanExec.java —
          @@ -228,26 +237,15 @@ public Tuple next() throws IOException

          { return null; }
          • Tuple tuple;
          • Tuple outTuple = new VTuple(outColumnNum);
            -
          • if (!plan.hasQual()) {
          • if ((tuple = scanner.next()) != null) { - projector.eval(tuple, outTuple); - outTuple.setOffset(tuple.getOffset()); - return outTuple; - }

            else

            { - return null; - }
          • } else {
          • while ((tuple = scanner.next()) != null) {
          • if (qual.eval(tuple).isTrue()) { - projector.eval(tuple, outTuple); - return outTuple; - }
          • }
          • return null;
            + while(scanIt.hasNext()) {
            + Tuple outTuple = new VTuple(outColumnNum);
              • End diff –

          outTuple don't have to be created for every input row.

          Show
          githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/612#discussion_r33147147 — Diff: tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/SeqScanExec.java — @@ -228,26 +237,15 @@ public Tuple next() throws IOException { return null; } Tuple tuple; Tuple outTuple = new VTuple(outColumnNum); - if (!plan.hasQual()) { if ((tuple = scanner.next()) != null) { - projector.eval(tuple, outTuple); - outTuple.setOffset(tuple.getOffset()); - return outTuple; - } else { - return null; - } } else { while ((tuple = scanner.next()) != null) { if (qual.eval(tuple).isTrue()) { - projector.eval(tuple, outTuple); - return outTuple; - } } return null; + while(scanIt.hasNext()) { + Tuple outTuple = new VTuple(outColumnNum); End diff – outTuple don't have to be created for every input row.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user jihoonson commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/612#discussion_r33147221

          — Diff: tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/HashShuffleAppenderManager.java —
          @@ -30,9 +30,11 @@
          import org.apache.tajo.catalog.TableMeta;
          import org.apache.tajo.conf.TajoConf;
          import org.apache.tajo.conf.TajoConf.ConfVars;
          +import org.apache.tajo.util.Pair;
          — End diff –

          Please remove unused imports.

          Show
          githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on a diff in the pull request: https://github.com/apache/tajo/pull/612#discussion_r33147221 — Diff: tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/HashShuffleAppenderManager.java — @@ -30,9 +30,11 @@ import org.apache.tajo.catalog.TableMeta; import org.apache.tajo.conf.TajoConf; import org.apache.tajo.conf.TajoConf.ConfVars; +import org.apache.tajo.util.Pair; — End diff – Please remove unused imports.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user jihoonson commented on the pull request:

          https://github.com/apache/tajo/pull/612#issuecomment-114869942

          +1. I left trivial comments. Please consider them before commit.

          Show
          githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on the pull request: https://github.com/apache/tajo/pull/612#issuecomment-114869942 +1. I left trivial comments. Please consider them before commit.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user hyunsik commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/612#discussion_r33211777

          — Diff: tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/SeqScanExec.java —
          @@ -228,26 +237,15 @@ public Tuple next() throws IOException

          { return null; }
          • Tuple tuple;
          • Tuple outTuple = new VTuple(outColumnNum);
            -
          • if (!plan.hasQual()) {
          • if ((tuple = scanner.next()) != null) { - projector.eval(tuple, outTuple); - outTuple.setOffset(tuple.getOffset()); - return outTuple; - }

            else

            { - return null; - }
          • } else {
          • while ((tuple = scanner.next()) != null) {
          • if (qual.eval(tuple).isTrue()) { - projector.eval(tuple, outTuple); - return outTuple; - }
          • }
          • return null;
            + while(scanIt.hasNext()) {
            + Tuple outTuple = new VTuple(outColumnNum);
              • End diff –

          It is actually for every row. While condition is just for check if there is next row, and every while loop will be just returned for each tuple.

          Show
          githubbot ASF GitHub Bot added a comment - Github user hyunsik commented on a diff in the pull request: https://github.com/apache/tajo/pull/612#discussion_r33211777 — Diff: tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/SeqScanExec.java — @@ -228,26 +237,15 @@ public Tuple next() throws IOException { return null; } Tuple tuple; Tuple outTuple = new VTuple(outColumnNum); - if (!plan.hasQual()) { if ((tuple = scanner.next()) != null) { - projector.eval(tuple, outTuple); - outTuple.setOffset(tuple.getOffset()); - return outTuple; - } else { - return null; - } } else { while ((tuple = scanner.next()) != null) { if (qual.eval(tuple).isTrue()) { - projector.eval(tuple, outTuple); - return outTuple; - } } return null; + while(scanIt.hasNext()) { + Tuple outTuple = new VTuple(outColumnNum); End diff – It is actually for every row. While condition is just for check if there is next row, and every while loop will be just returned for each tuple.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user hyunsik commented on the pull request:

          https://github.com/apache/tajo/pull/612#issuecomment-115051227

          After merging another patch, it is causing unit test failures. I'll check them and then I'll commit.

          Show
          githubbot ASF GitHub Bot added a comment - Github user hyunsik commented on the pull request: https://github.com/apache/tajo/pull/612#issuecomment-115051227 After merging another patch, it is causing unit test failures. I'll check them and then I'll commit.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user hyunsik closed the pull request at:

          https://github.com/apache/tajo/pull/612

          Show
          githubbot ASF GitHub Bot added a comment - Github user hyunsik closed the pull request at: https://github.com/apache/tajo/pull/612
          Hide
          hyunsik Hyunsik Choi added a comment -

          committed. Thank you Jihoon for the quick review.

          Show
          hyunsik Hyunsik Choi added a comment - committed. Thank you Jihoon for the quick review.
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Tajo-master-build #740 (See https://builds.apache.org/job/Tajo-master-build/740/)
          TAJO-1659: Simplify scan iteration in SeqScan. (hyunsik: rev 2ec307d621828cc18627ef7bd98e9c3c6774c5af)

          • CHANGES
          • tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/ScanIterator.java
          • tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/FilterScanIterator.java
          • tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/SeqScanExec.java
          • tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/FullScanIterator.java
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Tajo-master-build #740 (See https://builds.apache.org/job/Tajo-master-build/740/ ) TAJO-1659 : Simplify scan iteration in SeqScan. (hyunsik: rev 2ec307d621828cc18627ef7bd98e9c3c6774c5af) CHANGES tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/ScanIterator.java tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/FilterScanIterator.java tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/SeqScanExec.java tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/FullScanIterator.java
          Hide
          hudson Hudson added a comment -

          ABORTED: Integrated in Tajo-master-CODEGEN-build #380 (See https://builds.apache.org/job/Tajo-master-CODEGEN-build/380/)
          TAJO-1659: Simplify scan iteration in SeqScan. (hyunsik: rev 2ec307d621828cc18627ef7bd98e9c3c6774c5af)

          • tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/ScanIterator.java
          • tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/FullScanIterator.java
          • tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/SeqScanExec.java
          • CHANGES
          • tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/FilterScanIterator.java
          Show
          hudson Hudson added a comment - ABORTED: Integrated in Tajo-master-CODEGEN-build #380 (See https://builds.apache.org/job/Tajo-master-CODEGEN-build/380/ ) TAJO-1659 : Simplify scan iteration in SeqScan. (hyunsik: rev 2ec307d621828cc18627ef7bd98e9c3c6774c5af) tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/ScanIterator.java tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/FullScanIterator.java tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/SeqScanExec.java CHANGES tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/FilterScanIterator.java

            People

            • Assignee:
              hyunsik Hyunsik Choi
              Reporter:
              hyunsik Hyunsik Choi
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development