Uploaded image for project: 'Tajo'
  1. Tajo
  2. TAJO-1283

ORDER BY with the first descending order causes wrong results

    Details

      Description

      Each order key by can be specified with ascending or descending order.
      Recently, I found that ORDER BY with the first descending order key causes wrong result.

      If second key is a descending order, it works well. Other cases work correctly.

      select l_orderkey, l_partkey from lineitem order by l_orderkey, l_partkey desc;
      
      l_orderkey,  l_partkey
      -------------------------------
      1,  155190
      1,  67310
      1,  63700
      1,  24027
      1,  15635
      1,  2132
      2,  106170
      3,  183095
      3,  128449
      3,  62143
      3,  29380
      3,  19036
      3,  4297
      ...
      

      But, if the first sort key is a descending order, it causes wrong row number and shows wrong range part as follows:

      default> select l_orderkey, l_partkey from lineitem order by l_orderkey desc, l_partkey;
      l_orderkey,  l_partkey
      -------------------------------
      3000000,  61045
      3000000,  159113
      3000000,  167695
      3000000,  167904
      3000000,  196339
      ...
      

      According to my investigation, it seems to be related to offset problem of RowFile or index problem. The final result includes duplicated rows and the final row was wrong as follows:

      part-02-000000-000
      3000000|61045
      3000000|159113
      3000000|167695
      3000000|167904
      3000000|196339
      2999975|28334
      2999975|194023
      2999974|8020
      2999974|124152
      2999974|129921
      2999974|139248
      2999974|168914
      2999974|187923
      2999973|30533
      2999973|36196
      ...
      2919713|133486
      2919713|195963
      2919712|86257
      2919712|94542
      2919712|107370
      2919712|166342 <- duplicated rows
      2919712|178277
      ....
      1|63700
      1|67310
      1|155190
      [EOF]
      
      part-02-000001-000
      |96127                     <- looks wrong
      6000000|32255
      6000000|96127
      5999975|6452
      5999975|7272
      5999975|37131
      ....
      ....
      2919713|133486
      2919713|195963
      2919712|94542
      2919712|107370
      2919712|166342    <- duplicated rows
      [EOF]
      

        Activity

        Hide
        githubbot ASF GitHub Bot added a comment -

        GitHub user sirpkt opened a pull request:

        https://github.com/apache/tajo/pull/364

        TAJO-1283: ORDER BY with the first descending order causes wrong results

        As BaseTupleComparator already handles both ascending and descending sort keys, remove unnecessary DescendingTupleRangeComparator class.

        • We do not need to consider whether the first sort key is ascending or descending in scheduleRangeShuffledFetches() so related codes are removed.
        • Add TEST_MIN_TASK_NUM support in getNonLeafTaskNum() of Stage.java for Test purpose.

        'mvn clean install' passed.

        You can merge this pull request into a Git repository by running:

        $ git pull https://github.com/sirpkt/tajo TAJO-1283

        Alternatively you can review and apply these changes as the patch at:

        https://github.com/apache/tajo/pull/364.patch

        To close this pull request, make a commit to your master/trunk branch
        with (at least) the following in the commit message:

        This closes #364


        commit 30a16e3f166ca34c9391a4dbad64bd2a92b58737
        Author: Keuntae Park <sirpkt@apache.org>
        Date: 2015-01-28T06:37:07Z

        As BaseTupleComparator already handles both ascending and descending sort keys, remove unnecessary DescendingTupleRangeComparator class.

        • We do not need to consider whether the first sort key is ascending or descending in scheduleRangeShuffledFetches().
        • Add TEST_MIN_TASK_NUM support in getNonLeafTaskNum() of Stage.java for Test purpose.

        commit 61619daa98b3d7e89f902e6f587047af54a0a2de
        Author: Keuntae Park <sirpkt@apache.org>
        Date: 2015-01-28T06:41:39Z

        Merge branch 'master' into TAJO-1283


        Show
        githubbot ASF GitHub Bot added a comment - GitHub user sirpkt opened a pull request: https://github.com/apache/tajo/pull/364 TAJO-1283 : ORDER BY with the first descending order causes wrong results As BaseTupleComparator already handles both ascending and descending sort keys, remove unnecessary DescendingTupleRangeComparator class. We do not need to consider whether the first sort key is ascending or descending in scheduleRangeShuffledFetches() so related codes are removed. Add TEST_MIN_TASK_NUM support in getNonLeafTaskNum() of Stage.java for Test purpose. 'mvn clean install' passed. You can merge this pull request into a Git repository by running: $ git pull https://github.com/sirpkt/tajo TAJO-1283 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/tajo/pull/364.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #364 commit 30a16e3f166ca34c9391a4dbad64bd2a92b58737 Author: Keuntae Park <sirpkt@apache.org> Date: 2015-01-28T06:37:07Z As BaseTupleComparator already handles both ascending and descending sort keys, remove unnecessary DescendingTupleRangeComparator class. We do not need to consider whether the first sort key is ascending or descending in scheduleRangeShuffledFetches(). Add TEST_MIN_TASK_NUM support in getNonLeafTaskNum() of Stage.java for Test purpose. commit 61619daa98b3d7e89f902e6f587047af54a0a2de Author: Keuntae Park <sirpkt@apache.org> Date: 2015-01-28T06:41:39Z Merge branch 'master' into TAJO-1283
        Hide
        sirpkt Keuntae Park added a comment -

        Sorry, Jinho Kim. I found that this issue is assigned to you just after I uploaded the patch for the issue.
        If you also have the patch and has plan to upload the patch, I can drop my patch.

        Show
        sirpkt Keuntae Park added a comment - Sorry, Jinho Kim . I found that this issue is assigned to you just after I uploaded the patch for the issue. If you also have the patch and has plan to upload the patch, I can drop my patch.
        Hide
        jhkim Jinho Kim added a comment -

        Keuntae Park
        No problem, Could you explain the bug?

        Show
        jhkim Jinho Kim added a comment - Keuntae Park No problem, Could you explain the bug?
        Hide
        sirpkt Keuntae Park added a comment -

        Sure, Jinho Kim.
        Even though BaseTupleComparator can handle all the order by case properly including the first descending order case,
        scheduleRangeShuffledFetches() additionally reverses the comparison result through DescendingTupleRangeComparator in the first descending order case, and I think it is unnecessary operation.

        Show
        sirpkt Keuntae Park added a comment - Sure, Jinho Kim . Even though BaseTupleComparator can handle all the order by case properly including the first descending order case, scheduleRangeShuffledFetches() additionally reverses the comparison result through DescendingTupleRangeComparator in the first descending order case, and I think it is unnecessary operation.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user jinossy commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/364#discussion_r23673538

        — Diff: tajo-core/src/main/java/org/apache/tajo/querymaster/Repartitioner.java —
        @@ -725,13 +725,8 @@ public static void scheduleRangeShuffledFetches(TaskSchedulerContext schedulerCo
        }
        }

        • boolean ascendingFirstKey = sortSpecs[0].isAscending();
          SortedMap<TupleRange, Collection<FetchImpl>> map;
        • if (ascendingFirstKey) { - map = new TreeMap<TupleRange, Collection<FetchImpl>>(); - }

          else {

        • map = new TreeMap<TupleRange, Collection<FetchImpl>>(new TupleRange.DescendingTupleRangeComparator());
            • End diff –

        This patch looks good to me.
        @hyunsik and @jihoonson Do we have any reason to keep the descending ranges ?

        Show
        githubbot ASF GitHub Bot added a comment - Github user jinossy commented on a diff in the pull request: https://github.com/apache/tajo/pull/364#discussion_r23673538 — Diff: tajo-core/src/main/java/org/apache/tajo/querymaster/Repartitioner.java — @@ -725,13 +725,8 @@ public static void scheduleRangeShuffledFetches(TaskSchedulerContext schedulerCo } } boolean ascendingFirstKey = sortSpecs [0] .isAscending(); SortedMap<TupleRange, Collection<FetchImpl>> map; if (ascendingFirstKey) { - map = new TreeMap<TupleRange, Collection<FetchImpl>>(); - } else { map = new TreeMap<TupleRange, Collection<FetchImpl>>(new TupleRange.DescendingTupleRangeComparator()); End diff – This patch looks good to me. @hyunsik and @jihoonson Do we have any reason to keep the descending ranges ?
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user jinossy commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/364#discussion_r23840362

        — Diff: tajo-core/src/main/java/org/apache/tajo/querymaster/Repartitioner.java —
        @@ -740,7 +735,7 @@ public static void scheduleRangeShuffledFetches(TaskSchedulerContext schedulerCo
        fetchSet = new HashSet<FetchImpl>();
        for (FetchImpl fetch: fetches) {
        String rangeParam =

        • TupleUtil.rangeToQuery(ranges[i], ascendingFirstKey ? i == (ranges.length - 1) : i == 0, encoder);
            • End diff –

        @sirpkt
        I have heard from hyunsik that descending order consider the sampling and limit clause.
        In my opinion, main bug fixing is good.
        The range array is ascending. so ascendingFirstKey checking do not need
        ```
        TupleUtil.rangeToQuery(ranges[i], i == (ranges.length - 1), encoder);
        ```

        Show
        githubbot ASF GitHub Bot added a comment - Github user jinossy commented on a diff in the pull request: https://github.com/apache/tajo/pull/364#discussion_r23840362 — Diff: tajo-core/src/main/java/org/apache/tajo/querymaster/Repartitioner.java — @@ -740,7 +735,7 @@ public static void scheduleRangeShuffledFetches(TaskSchedulerContext schedulerCo fetchSet = new HashSet<FetchImpl>(); for (FetchImpl fetch: fetches) { String rangeParam = TupleUtil.rangeToQuery(ranges [i] , ascendingFirstKey ? i == (ranges.length - 1) : i == 0, encoder); End diff – @sirpkt I have heard from hyunsik that descending order consider the sampling and limit clause. In my opinion, main bug fixing is good. The range array is ascending. so ascendingFirstKey checking do not need ``` TupleUtil.rangeToQuery(ranges [i] , i == (ranges.length - 1), encoder); ```
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user hyunsik commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/364#discussion_r23985334

        — Diff: tajo-core/src/main/java/org/apache/tajo/querymaster/Repartitioner.java —
        @@ -725,13 +725,8 @@ public static void scheduleRangeShuffledFetches(TaskSchedulerContext schedulerCo
        }
        }

        • boolean ascendingFirstKey = sortSpecs[0].isAscending();
          SortedMap<TupleRange, Collection<FetchImpl>> map;
        • if (ascendingFirstKey) { - map = new TreeMap<TupleRange, Collection<FetchImpl>>(); - }

          else {

        • map = new TreeMap<TupleRange, Collection<FetchImpl>>(new TupleRange.DescendingTupleRangeComparator());
            • End diff –

        I'm sorry for late response. Descending order is for the future feature. So, it is not necessary right now. Later, it can be added again when necessary.

        Show
        githubbot ASF GitHub Bot added a comment - Github user hyunsik commented on a diff in the pull request: https://github.com/apache/tajo/pull/364#discussion_r23985334 — Diff: tajo-core/src/main/java/org/apache/tajo/querymaster/Repartitioner.java — @@ -725,13 +725,8 @@ public static void scheduleRangeShuffledFetches(TaskSchedulerContext schedulerCo } } boolean ascendingFirstKey = sortSpecs [0] .isAscending(); SortedMap<TupleRange, Collection<FetchImpl>> map; if (ascendingFirstKey) { - map = new TreeMap<TupleRange, Collection<FetchImpl>>(); - } else { map = new TreeMap<TupleRange, Collection<FetchImpl>>(new TupleRange.DescendingTupleRangeComparator()); End diff – I'm sorry for late response. Descending order is for the future feature. So, it is not necessary right now. Later, it can be added again when necessary.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user hyunsik commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/364#discussion_r23985350

        — Diff: tajo-core/src/main/java/org/apache/tajo/querymaster/Repartitioner.java —
        @@ -740,7 +735,7 @@ public static void scheduleRangeShuffledFetches(TaskSchedulerContext schedulerCo
        fetchSet = new HashSet<FetchImpl>();
        for (FetchImpl fetch: fetches) {
        String rangeParam =

        • TupleUtil.rangeToQuery(ranges[i], ascendingFirstKey ? i == (ranges.length - 1) : i == 0, encoder);
            • End diff –

        Please refer to the above comment I leaved.

        Show
        githubbot ASF GitHub Bot added a comment - Github user hyunsik commented on a diff in the pull request: https://github.com/apache/tajo/pull/364#discussion_r23985350 — Diff: tajo-core/src/main/java/org/apache/tajo/querymaster/Repartitioner.java — @@ -740,7 +735,7 @@ public static void scheduleRangeShuffledFetches(TaskSchedulerContext schedulerCo fetchSet = new HashSet<FetchImpl>(); for (FetchImpl fetch: fetches) { String rangeParam = TupleUtil.rangeToQuery(ranges [i] , ascendingFirstKey ? i == (ranges.length - 1) : i == 0, encoder); End diff – Please refer to the above comment I leaved.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user hyunsik commented on the pull request:

        https://github.com/apache/tajo/pull/364#issuecomment-72601391

        Hi @sirpkt,
        Could you trigger this unit test?

        Show
        githubbot ASF GitHub Bot added a comment - Github user hyunsik commented on the pull request: https://github.com/apache/tajo/pull/364#issuecomment-72601391 Hi @sirpkt, Could you trigger this unit test?
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user hyunsik commented on the pull request:

        https://github.com/apache/tajo/pull/364#issuecomment-72607962

        +1 ship it!

        The patch includes the correct bug fix and enough unit tests.

        Show
        githubbot ASF GitHub Bot added a comment - Github user hyunsik commented on the pull request: https://github.com/apache/tajo/pull/364#issuecomment-72607962 +1 ship it! The patch includes the correct bug fix and enough unit tests.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user hyunsik commented on the pull request:

        https://github.com/apache/tajo/pull/364#issuecomment-72608008

        @sirpkt Could you please commit this patch to both branch-0.10.0 and master branch?

        Show
        githubbot ASF GitHub Bot added a comment - Github user hyunsik commented on the pull request: https://github.com/apache/tajo/pull/364#issuecomment-72608008 @sirpkt Could you please commit this patch to both branch-0.10.0 and master branch?
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user asfgit closed the pull request at:

        https://github.com/apache/tajo/pull/364

        Show
        githubbot ASF GitHub Bot added a comment - Github user asfgit closed the pull request at: https://github.com/apache/tajo/pull/364
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user sirpkt commented on the pull request:

        https://github.com/apache/tajo/pull/364#issuecomment-72609394

        Thank you for the comments, @hyunsik, @jinossy
        I just committed to master and branch-0.10.0.

        Show
        githubbot ASF GitHub Bot added a comment - Github user sirpkt commented on the pull request: https://github.com/apache/tajo/pull/364#issuecomment-72609394 Thank you for the comments, @hyunsik, @jinossy I just committed to master and branch-0.10.0.
        Hide
        sirpkt Keuntae Park added a comment -

        It just committed to the master.
        Thank you for the kind reviews, Jinho Kim and Hyunsik Choi.

        Show
        sirpkt Keuntae Park added a comment - It just committed to the master. Thank you for the kind reviews, Jinho Kim and Hyunsik Choi .
        Hide
        hudson Hudson added a comment -

        SUCCESS: Integrated in Tajo-master-build #576 (See https://builds.apache.org/job/Tajo-master-build/576/)
        TAJO-1283: ORDER BY with the first descending order causes wrong results. (Keuntae Park) (sirpkt: rev 02c6c266c013d8174a287bc57a6d4131da51ba96)

        • tajo-core/src/main/java/org/apache/tajo/querymaster/Repartitioner.java
        • CHANGES
        • tajo-core/src/test/resources/queries/TestSortQuery/testSortFirstDesc.sql
        • tajo-core/src/test/resources/results/TestSortQuery/testSortFirstDesc.result
        • tajo-core/src/test/java/org/apache/tajo/engine/query/TestSortQuery.java
        • tajo-core/src/main/java/org/apache/tajo/querymaster/Stage.java
        • tajo-storage/tajo-storage-common/src/main/java/org/apache/tajo/storage/TupleRange.java
        Show
        hudson Hudson added a comment - SUCCESS: Integrated in Tajo-master-build #576 (See https://builds.apache.org/job/Tajo-master-build/576/ ) TAJO-1283 : ORDER BY with the first descending order causes wrong results. (Keuntae Park) (sirpkt: rev 02c6c266c013d8174a287bc57a6d4131da51ba96) tajo-core/src/main/java/org/apache/tajo/querymaster/Repartitioner.java CHANGES tajo-core/src/test/resources/queries/TestSortQuery/testSortFirstDesc.sql tajo-core/src/test/resources/results/TestSortQuery/testSortFirstDesc.result tajo-core/src/test/java/org/apache/tajo/engine/query/TestSortQuery.java tajo-core/src/main/java/org/apache/tajo/querymaster/Stage.java tajo-storage/tajo-storage-common/src/main/java/org/apache/tajo/storage/TupleRange.java
        Hide
        hudson Hudson added a comment -

        FAILURE: Integrated in Tajo-master-CODEGEN-build #215 (See https://builds.apache.org/job/Tajo-master-CODEGEN-build/215/)
        TAJO-1283: ORDER BY with the first descending order causes wrong results. (Keuntae Park) (sirpkt: rev 02c6c266c013d8174a287bc57a6d4131da51ba96)

        • tajo-storage/tajo-storage-common/src/main/java/org/apache/tajo/storage/TupleRange.java
        • tajo-core/src/test/resources/queries/TestSortQuery/testSortFirstDesc.sql
        • tajo-core/src/test/java/org/apache/tajo/engine/query/TestSortQuery.java
        • CHANGES
        • tajo-core/src/main/java/org/apache/tajo/querymaster/Repartitioner.java
        • tajo-core/src/test/resources/results/TestSortQuery/testSortFirstDesc.result
        • tajo-core/src/main/java/org/apache/tajo/querymaster/Stage.java
        Show
        hudson Hudson added a comment - FAILURE: Integrated in Tajo-master-CODEGEN-build #215 (See https://builds.apache.org/job/Tajo-master-CODEGEN-build/215/ ) TAJO-1283 : ORDER BY with the first descending order causes wrong results. (Keuntae Park) (sirpkt: rev 02c6c266c013d8174a287bc57a6d4131da51ba96) tajo-storage/tajo-storage-common/src/main/java/org/apache/tajo/storage/TupleRange.java tajo-core/src/test/resources/queries/TestSortQuery/testSortFirstDesc.sql tajo-core/src/test/java/org/apache/tajo/engine/query/TestSortQuery.java CHANGES tajo-core/src/main/java/org/apache/tajo/querymaster/Repartitioner.java tajo-core/src/test/resources/results/TestSortQuery/testSortFirstDesc.result tajo-core/src/main/java/org/apache/tajo/querymaster/Stage.java

          People

          • Assignee:
            sirpkt Keuntae Park
            Reporter:
            hyunsik Hyunsik Choi
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development