Uploaded image for project: 'Tajo'
  1. Tajo
  2. TAJO-1004

UniformRangePartition cannot deal with unicode ranges

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.9.0
    • Component/s: None
    • Labels:
      None

      Description

      UniformRangePartition plays a role to divide a given range into multiple sub ranges. It is used for range shuffle.

      When UniformRangePartition divides a range of text fields, it Internally transforms arrays of bytes into unsigned integer. But, a byte represent -127 to 127, and unicode is out of range in byte representation. So, the current approach can cause errors in this case.

        Activity

        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user asfgit closed the pull request at:

        https://github.com/apache/tajo/pull/116

        Show
        githubbot ASF GitHub Bot added a comment - Github user asfgit closed the pull request at: https://github.com/apache/tajo/pull/116
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user hyunsik commented on the pull request:

        https://github.com/apache/tajo/pull/116#issuecomment-52885509

        Thank you for the review. I'll commit it shortly.

        Show
        githubbot ASF GitHub Bot added a comment - Github user hyunsik commented on the pull request: https://github.com/apache/tajo/pull/116#issuecomment-52885509 Thank you for the review. I'll commit it shortly.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user jinossy commented on the pull request:

        https://github.com/apache/tajo/pull/116#issuecomment-52884744

        +1
        Looks great to me. Ship it!

        Show
        githubbot ASF GitHub Bot added a comment - Github user jinossy commented on the pull request: https://github.com/apache/tajo/pull/116#issuecomment-52884744 +1 Looks great to me. Ship it!
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user hyunsik commented on the pull request:

        https://github.com/apache/tajo/pull/116#issuecomment-52881787

        Fixed the problem, and I rebased it.

        Show
        githubbot ASF GitHub Bot added a comment - Github user hyunsik commented on the pull request: https://github.com/apache/tajo/pull/116#issuecomment-52881787 Fixed the problem, and I rebased it.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user jinossy commented on the pull request:

        https://github.com/apache/tajo/pull/116#issuecomment-52880696

        Hyunsik,
        Could you take a look at this ?
        Thanks

        Failed tests: testIncrementOfUnicode(org.apache.tajo.engine.planner.TestUniformRangePartition): prev=(0=>##), current=(0=>##)

        Tests in error:
        testPartitionForUnicodeDiffLenEndTextAsc(org.apache.tajo.engine.planner.TestUniformRangePartition): Computed ranges are not totally ordered. Previous key=[(0=>##?), (0=>##)), Current Key=[(0=>##), (0=>##??))

        Show
        githubbot ASF GitHub Bot added a comment - Github user jinossy commented on the pull request: https://github.com/apache/tajo/pull/116#issuecomment-52880696 Hyunsik, Could you take a look at this ? Thanks Failed tests: testIncrementOfUnicode(org.apache.tajo.engine.planner.TestUniformRangePartition): prev=(0=>## ), current=(0=>## ) Tests in error: testPartitionForUnicodeDiffLenEndTextAsc(org.apache.tajo.engine.planner.TestUniformRangePartition): Computed ranges are not totally ordered. Previous key=[(0=>##?), (0=>## )), Current Key=[(0=>## ), (0=>##??))
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user hyunsik commented on the pull request:

        https://github.com/apache/tajo/pull/116#issuecomment-52750296

        I've updated the patch. The updated patch includes some hidden bug fix and unit tests to verify the found bugs.

        Show
        githubbot ASF GitHub Bot added a comment - Github user hyunsik commented on the pull request: https://github.com/apache/tajo/pull/116#issuecomment-52750296 I've updated the patch. The updated patch includes some hidden bug fix and unit tests to verify the found bugs.
        Hide
        hyunsik Hyunsik Choi added a comment -

        Hi Mai Hai Thanh,

        Thank you Mai for your review. I've updated the patch. The updated patch includes some hidden bug fix and unit tests to verify the found bugs.

        Thanks!

        Show
        hyunsik Hyunsik Choi added a comment - Hi Mai Hai Thanh , Thank you Mai for your review. I've updated the patch. The updated patch includes some hidden bug fix and unit tests to verify the found bugs. Thanks!
        Hide
        mhthanh Mai Hai Thanh added a comment -

        The latest update looks good. +1 from me.

        Show
        mhthanh Mai Hai Thanh added a comment - The latest update looks good. +1 from me.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user hyunsik commented on the pull request:

        https://github.com/apache/tajo/pull/116#issuecomment-52168723

        I've fixed wrong existing unit tests, and fixed some potential problems. It is ready to be reviewed and committed. Please review this!

        Show
        githubbot ASF GitHub Bot added a comment - Github user hyunsik commented on the pull request: https://github.com/apache/tajo/pull/116#issuecomment-52168723 I've fixed wrong existing unit tests, and fixed some potential problems. It is ready to be reviewed and committed. Please review this!
        Hide
        hyunsik Hyunsik Choi added a comment -

        After I correct the if-condition statement, two test failures occur now. I'll submit new patch after I'll fix them.

        Show
        hyunsik Hyunsik Choi added a comment - After I correct the if-condition statement, two test failures occur now. I'll submit new patch after I'll fix them.
        Hide
        hyunsik Hyunsik Choi added a comment -

        Thank for your review. That would be very helpful for me. I'll fix it soon.

        Show
        hyunsik Hyunsik Choi added a comment - Thank for your review. That would be very helpful for me. I'll fix it soon.
        Hide
        mhthanh Mai Hai Thanh added a comment -

        Hi Hyunsik,
        Your patch looks nice.
        Besides, I think that the following lines of code has a bug:

        for (TupleRange r : ranges) {
          if (prev == null) {
          prev = r;
          } else {
          assertTrue(prev.compareTo(r) < 0);
          }
        }
        

        Because prev is assigned only once in the loop, if the partition number is greater than 2, many subsequent comparisons between prev and r are not what you expect.

        So, I'd suggest to rewrite those lines of code as follows:

        for (TupleRange r : ranges) {
          if (prev != null) {
            assertTrue(prev.compareTo(r) < 0);
          }
          prev = r;
        }
        

        This code appears about 6 or 7 times in TestUniformRangePartition.java.

        Show
        mhthanh Mai Hai Thanh added a comment - Hi Hyunsik, Your patch looks nice. Besides, I think that the following lines of code has a bug: for (TupleRange r : ranges) { if (prev == null ) { prev = r; } else { assertTrue(prev.compareTo(r) < 0); } } Because prev is assigned only once in the loop, if the partition number is greater than 2, many subsequent comparisons between prev and r are not what you expect. So, I'd suggest to rewrite those lines of code as follows: for (TupleRange r : ranges) { if (prev != null ) { assertTrue(prev.compareTo(r) < 0); } prev = r; } This code appears about 6 or 7 times in TestUniformRangePartition.java.
        Hide
        hyunsik Hyunsik Choi added a comment -

        submitted the patch to the github.

        Show
        hyunsik Hyunsik Choi added a comment - submitted the patch to the github.
        Hide
        githubbot ASF GitHub Bot added a comment -

        GitHub user hyunsik opened a pull request:

        https://github.com/apache/tajo/pull/116

        TAJO-1004: UniformRangePartition cannot deal with unicode ranges.

        You can merge this pull request into a Git repository by running:

        $ git pull https://github.com/hyunsik/tajo TAJO-1004

        Alternatively you can review and apply these changes as the patch at:

        https://github.com/apache/tajo/pull/116.patch

        To close this pull request, make a commit to your master/trunk branch
        with (at least) the following in the commit message:

        This closes #116


        commit 8efe5597a553eaab4f4ed8724e7fbae820adbb71
        Author: Hyunsik Choi <hyunsik@apache.org>
        Date: 2014-08-13T09:38:53Z

        initial work.

        commit 1a938220a3887ba80031a178d01bf47634cb0339
        Author: Hyunsik Choi <hyunsik@apache.org>
        Date: 2014-08-14T01:24:09Z

        Changing calculate partition num.

        commit 7b5a689036c37633e1a47e0ab3b4aa92305b0d99
        Author: Hyunsik Choi <hyunsik@apache.org>
        Date: 2014-08-14T02:33:12Z

        Improved unicode partition.

        commit 262c88cfe3edd22ef638f7f077f27110cd93d860
        Author: Hyunsik Choi <hyunsik@apache.org>
        Date: 2014-08-14T04:21:10Z

        Completed and add unit tests.

        commit 4239e165315dc375850bc0d445b904aea14f7664
        Author: Hyunsik Choi <hyunsik@apache.org>
        Date: 2014-08-14T04:52:52Z

        Refactored trivial things.


        Show
        githubbot ASF GitHub Bot added a comment - GitHub user hyunsik opened a pull request: https://github.com/apache/tajo/pull/116 TAJO-1004 : UniformRangePartition cannot deal with unicode ranges. You can merge this pull request into a Git repository by running: $ git pull https://github.com/hyunsik/tajo TAJO-1004 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/tajo/pull/116.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #116 commit 8efe5597a553eaab4f4ed8724e7fbae820adbb71 Author: Hyunsik Choi <hyunsik@apache.org> Date: 2014-08-13T09:38:53Z initial work. commit 1a938220a3887ba80031a178d01bf47634cb0339 Author: Hyunsik Choi <hyunsik@apache.org> Date: 2014-08-14T01:24:09Z Changing calculate partition num. commit 7b5a689036c37633e1a47e0ab3b4aa92305b0d99 Author: Hyunsik Choi <hyunsik@apache.org> Date: 2014-08-14T02:33:12Z Improved unicode partition. commit 262c88cfe3edd22ef638f7f077f27110cd93d860 Author: Hyunsik Choi <hyunsik@apache.org> Date: 2014-08-14T04:21:10Z Completed and add unit tests. commit 4239e165315dc375850bc0d445b904aea14f7664 Author: Hyunsik Choi <hyunsik@apache.org> Date: 2014-08-14T04:52:52Z Refactored trivial things.
        Hide
        hyunsik Hyunsik Choi added a comment -

        Thank you for your understanding. I'll submit the patch soon.

        Show
        hyunsik Hyunsik Choi added a comment - Thank you for your understanding. I'll submit the patch soon.
        Hide
        mhthanh Mai Hai Thanh added a comment -

        Hi Hyunsik Choi,
        No problem

        Show
        mhthanh Mai Hai Thanh added a comment - Hi Hyunsik Choi , No problem
        Hide
        hyunsik Hyunsik Choi added a comment -

        If you are interest in this issue, you can review it well. After I submit the patch, I'd like to propose you to review the patch if you are Ok.

        Show
        hyunsik Hyunsik Choi added a comment - If you are interest in this issue, you can review it well. After I submit the patch, I'd like to propose you to review the patch if you are Ok.
        Hide
        hyunsik Hyunsik Choi added a comment -

        Hi Mai,

        I'm really sorry for confusing you. Actually, I've already done this issue. I missed to assign this issue to myself. I apologize it again.

        Best regards,
        Hyunsik

        Show
        hyunsik Hyunsik Choi added a comment - Hi Mai, I'm really sorry for confusing you. Actually, I've already done this issue. I missed to assign this issue to myself. I apologize it again. Best regards, Hyunsik

          People

          • Assignee:
            hyunsik Hyunsik Choi
            Reporter:
            hyunsik Hyunsik Choi
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development