Details

    • Type: Bug
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.9.0
    • Component/s: None
    • Labels:
      None

      Description

      ORDER BY with a null column miss some data. I found RowStoreEncoder has a bug dealing with NULL values.
      "continue" statement should be add.

            Column col;
            for (int i = 0; i < schema.size(); i++) {
              if (tuple.isNull(i)) {
                nullFlags.set(i);
                continue;
              }
              col = schema.getColumn(i);
      

        Activity

        Hide
        githubbot ASF GitHub Bot added a comment -

        GitHub user babokim opened a pull request:

        https://github.com/apache/tajo/pull/70

        TAJO-904: ORDER BY with a null column miss some data.

        Currently Tajo ignores null value when calculates range partition range. I added a MaxValueNull flag variable in ColumnStats. If ColumnStats's max value is null, ColumnStats's max value is a non-null value and MaxValueNull flag is true. So Repartitioner calculates range cardinality with non-null value. Finally Repartitioner set null value to last range.

        You can merge this pull request into a Git repository by running:

        $ git pull https://github.com/babokim/tajo TAJO-904

        Alternatively you can review and apply these changes as the patch at:

        https://github.com/apache/tajo/pull/70.patch

        To close this pull request, make a commit to your master/trunk branch
        with (at least) the following in the commit message:

        This closes #70


        commit 3acb265add415a9f4722935f9a69d6cd97b6f987
        Author: 김형준 <babokim@babokim-mbp.server.gruter.com>
        Date: 2014-07-14T03:31:21Z

        TAJO-904: ORDER BY with a null column miss some data.

        commit 89a35fd24bc7a8f9ab3ec7db0edfa5dad8e328a6
        Author: 김형준 <babokim@babokim-mbp.server.gruter.com>
        Date: 2014-07-14T03:31:59Z

        Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo


        Show
        githubbot ASF GitHub Bot added a comment - GitHub user babokim opened a pull request: https://github.com/apache/tajo/pull/70 TAJO-904 : ORDER BY with a null column miss some data. Currently Tajo ignores null value when calculates range partition range. I added a MaxValueNull flag variable in ColumnStats. If ColumnStats's max value is null, ColumnStats's max value is a non-null value and MaxValueNull flag is true. So Repartitioner calculates range cardinality with non-null value. Finally Repartitioner set null value to last range. You can merge this pull request into a Git repository by running: $ git pull https://github.com/babokim/tajo TAJO-904 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/tajo/pull/70.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #70 commit 3acb265add415a9f4722935f9a69d6cd97b6f987 Author: 김형준 <babokim@babokim-mbp.server.gruter.com> Date: 2014-07-14T03:31:21Z TAJO-904 : ORDER BY with a null column miss some data. commit 89a35fd24bc7a8f9ab3ec7db0edfa5dad8e328a6 Author: 김형준 <babokim@babokim-mbp.server.gruter.com> Date: 2014-07-14T03:31:59Z Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user hyunsik commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/70#discussion_r14870602

        — Diff: tajo-core/src/test/java/org/apache/tajo/engine/query/TestSortQuery.java —
        @@ -169,4 +177,46 @@ public final void testTopkWithJson() throws Exception

        { assertResultSet(res); cleanupQuery(res); }

        +
        + @Test
        + public final void testSortNullColumn() throws Exception {
        + try {
        + executeString("DROP TABLE table1 PURGE;").close();
        — End diff –

        It should be ```DROP TABLE IF EXISTS table 1 PURGE```. Otherwise, it will cause 'no such table error' when a developer executes this test separately.

        Show
        githubbot ASF GitHub Bot added a comment - Github user hyunsik commented on a diff in the pull request: https://github.com/apache/tajo/pull/70#discussion_r14870602 — Diff: tajo-core/src/test/java/org/apache/tajo/engine/query/TestSortQuery.java — @@ -169,4 +177,46 @@ public final void testTopkWithJson() throws Exception { assertResultSet(res); cleanupQuery(res); } + + @Test + public final void testSortNullColumn() throws Exception { + try { + executeString("DROP TABLE table1 PURGE;").close(); — End diff – It should be ```DROP TABLE IF EXISTS table 1 PURGE```. Otherwise, it will cause 'no such table error' when a developer executes this test separately.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user hyunsik commented on the pull request:

        https://github.com/apache/tajo/pull/70#issuecomment-48992032

        It works well for null first case. But, in null last case, the query results in empty rows.

        In my view, null last supports is not trivial and requires a bunch of codes. In contrast, the current patch looks reasonable for only null first.

        If there is no objection, I'll change the issue name to 'ORDER BY NULL FIRST support', and I'll create another issue for null last.

        Show
        githubbot ASF GitHub Bot added a comment - Github user hyunsik commented on the pull request: https://github.com/apache/tajo/pull/70#issuecomment-48992032 It works well for null first case. But, in null last case, the query results in empty rows. In my view, null last supports is not trivial and requires a bunch of codes. In contrast, the current patch looks reasonable for only null first. If there is no objection, I'll change the issue name to 'ORDER BY NULL FIRST support', and I'll create another issue for null last.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user babokim commented on the pull request:

        https://github.com/apache/tajo/pull/70#issuecomment-48993442

        Hyunsik, I agree your opinion. Please go ahead

        Show
        githubbot ASF GitHub Bot added a comment - Github user babokim commented on the pull request: https://github.com/apache/tajo/pull/70#issuecomment-48993442 Hyunsik, I agree your opinion. Please go ahead
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user hyunsik commented on the pull request:

        https://github.com/apache/tajo/pull/70#issuecomment-48993979

        +1

        Show
        githubbot ASF GitHub Bot added a comment - Github user hyunsik commented on the pull request: https://github.com/apache/tajo/pull/70#issuecomment-48993979 +1
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user asfgit closed the pull request at:

        https://github.com/apache/tajo/pull/70

        Show
        githubbot ASF GitHub Bot added a comment - Github user asfgit closed the pull request at: https://github.com/apache/tajo/pull/70
        Hide
        hyunsik Hyunsik Choi added a comment -

        committed. Thank you for your contribution.

        Show
        hyunsik Hyunsik Choi added a comment - committed. Thank you for your contribution.
        Hide
        hudson Hudson added a comment -

        SUCCESS: Integrated in Tajo-master-build #298 (See https://builds.apache.org/job/Tajo-master-build/298/)
        TAJO-904: ORDER BY Null first support. (Hyoungjun Kim via hyunsik) (hyunsik: rev 47d4fe22b8e5594d82054fa5a7b5cdc23f578be6)

        • tajo-core/src/test/java/org/apache/tajo/engine/query/TestSortQuery.java
        • tajo-core/src/main/java/org/apache/tajo/worker/Task.java
        • tajo-core/src/main/java/org/apache/tajo/master/querymaster/Repartitioner.java
        • tajo-core/src/main/java/org/apache/tajo/engine/utils/TupleUtil.java
        • tajo-storage/src/main/java/org/apache/tajo/storage/TableStatistics.java
        • tajo-core/src/main/java/org/apache/tajo/engine/planner/RangePartitionAlgorithm.java
        • tajo-storage/src/main/java/org/apache/tajo/storage/RowStoreUtil.java
        • tajo-yarn-pullserver/src/main/java/org/apache/tajo/pullserver/TajoPullServerService.java
        • tajo-catalog/tajo-catalog-common/src/main/java/org/apache/tajo/catalog/statistics/ColumnStats.java
        • CHANGES
        Show
        hudson Hudson added a comment - SUCCESS: Integrated in Tajo-master-build #298 (See https://builds.apache.org/job/Tajo-master-build/298/ ) TAJO-904 : ORDER BY Null first support. (Hyoungjun Kim via hyunsik) (hyunsik: rev 47d4fe22b8e5594d82054fa5a7b5cdc23f578be6) tajo-core/src/test/java/org/apache/tajo/engine/query/TestSortQuery.java tajo-core/src/main/java/org/apache/tajo/worker/Task.java tajo-core/src/main/java/org/apache/tajo/master/querymaster/Repartitioner.java tajo-core/src/main/java/org/apache/tajo/engine/utils/TupleUtil.java tajo-storage/src/main/java/org/apache/tajo/storage/TableStatistics.java tajo-core/src/main/java/org/apache/tajo/engine/planner/RangePartitionAlgorithm.java tajo-storage/src/main/java/org/apache/tajo/storage/RowStoreUtil.java tajo-yarn-pullserver/src/main/java/org/apache/tajo/pullserver/TajoPullServerService.java tajo-catalog/tajo-catalog-common/src/main/java/org/apache/tajo/catalog/statistics/ColumnStats.java CHANGES

          People

          • Assignee:
            hjkim Hyoungjun Kim
            Reporter:
            hjkim Hyoungjun Kim
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development