Tajo / TAJO-947

ColPartitionStoreExec can cause URISyntaxException due to special characters

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.9.0
    • Component/s: Physical Operator
    • Labels:
      None

      Description

      When partition keys include special characters that cannot be represented in a URI, ColPartitionStoreExec::getDataFile can throw a URISyntaxException. Please see the following stack trace:

      Progress: 62%, response time: 106.344 sec
      Progress: 62%, response time: 107.345 sec
      Progress: 62%, response time: 108.347 sec
      ERROR: java.net.URISyntaxException: Relative path in absolute URI: l_returnflag=yly:%20furiously%20ev/part-02-000002-000
      java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: l_returnflag=yly:%20furiously%20ev/part-02-000002-000
      	at org.apache.hadoop.fs.Path.initialize(Path.java:206)
      	at org.apache.hadoop.fs.Path.<init>(Path.java:172)
      	at org.apache.hadoop.fs.Path.<init>(Path.java:94)
      	at org.apache.tajo.storage.StorageUtil.concatPath(StorageUtil.java:104)
      	at org.apache.tajo.engine.planner.physical.ColPartitionStoreExec.getDataFile(ColPartitionStoreExec.java:112)
      	at org.apache.tajo.engine.planner.physical.SortBasedColPartitionStoreExec.getAppenderForNewPartition(SortBasedColPartitionStoreExec.java:86)
      	at org.apache.tajo.engine.planner.physical.SortBasedColPartitionStoreExec.next(SortBasedColPartitionStoreExec.java:156)
      	at org.apache.tajo.worker.Task.run(Task.java:425)
      	at org.apache.tajo.worker.TaskRunner$1.run(TaskRunner.java:406)
      	at java.lang.Thread.run(Thread.java:744)
      Caused by: java.net.URISyntaxException: Relative path in absolute URI: l_returnflag=yly:%20furiously%20ev/part-02-000002-000
      	at java.net.URI.checkPath(URI.java:1804)
      	at java.net.URI.<init>(URI.java:752)
      	at org.apache.hadoop.fs.Path.initialize(Path.java:203)
      	... 9 more
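
The failure in the trace above can be reproduced with the JDK alone. The sketch below assumes, for illustration only, that the path string is split at the first colon when that colon precedes the first slash, so the text before the colon is treated as a URI scheme. The class and method names are hypothetical and are not part of Tajo or Hadoop.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class ColonInPartitionNameSketch {

    // Returns null when a URI can be built from the path string,
    // otherwise the message of the URISyntaxException.
    static String tryBuild(String pathString) {
        String scheme = null;
        int start = 0;
        int colon = pathString.indexOf(':');
        int slash = pathString.indexOf('/');
        // Assumption: a colon before the first slash marks the end of a scheme.
        if (colon != -1 && (slash == -1 || colon < slash)) {
            scheme = pathString.substring(0, colon);
            start = colon + 1;
        }
        String path = pathString.substring(start);
        try {
            // java.net.URI rejects a relative path combined with a scheme.
            new URI(scheme, null, path, null, null);
            return null;
        } catch (URISyntaxException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Partition value containing a colon, taken from the stack trace.
        System.out.println(tryBuild("l_returnflag=yly: furiously ev/part-02-000002-000"));
        // Partition value without special characters.
        System.out.println(tryBuild("l_returnflag=R/part-02-000002-000"));
    }
}
```

The colon in the partition value is the trigger here: everything before it is read as a scheme, which leaves a relative path inside what is then an absolute URI.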
      
      1. TAJO-947_2.patch
        12 kB
        Hyunsik Choi
      2. TAJO-947.Mai.140728.patch.txt
        5 kB
        Mai Hai Thanh
      3. TAJO-947.Mai.140730.patch.txt
        11 kB
        Mai Hai Thanh

        Activity

        mhthanh Mai Hai Thanh added a comment -

        Hi Hyunsik Choi,

        I submitted a patch to solve this bug. As I understand it, Tajo creates a folder for each partition, and part of the folder name contains the values of the partition keys. So, even though we only saw a URISyntaxException, invalid characters also result in invalid folder names. Hence, my proposed solution, as implemented in the patch, is to remove all invalid characters from the values of partition key columns when these values are inserted into the partitioned table. The downside of this solution is that some user data (i.e., the special characters in the key columns) is lost. However, I could not find a better solution.
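
A minimal sketch of the character-removal idea and its downside follows. The set of characters treated as invalid is an assumption made for illustration, and the class and method names are hypothetical; this is not the code from the attached patch.

```java
final class StripInvalidCharsSketch {

    // Assumption: characters considered invalid in a partition directory name.
    private static final String INVALID = ":/=%\\";

    // Drops every character listed in INVALID from the partition key value.
    static String strip(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (char c : value.toCharArray()) {
            if (INVALID.indexOf(c) < 0) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("l_returnflag=" + strip("yly: furiously ev"));
        // Two different key values end up in the same partition directory.
        System.out.println(strip("a:b").equals(strip("ab")));
    }
}
```

Because removal is not reversible, distinct key values such as `a:b` and `ab` map to the same directory name, which is the data loss described above.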

        hyunsik Hyunsik Choi added a comment -

        Hi Mai,

        Thank you for your contribution! Column partitioning is designed to be compatible with Apache Hive partitions, so we should follow Hive's rules. I've just looked into how Hive handles this; it should be useful for your work.

        Hive had the same issue, although the ticket is quite old:
        https://issues.apache.org/jira/browse/HIVE-883

        Their current code is as follows:
        https://github.com/apache/hive/blob/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/Warehouse.java#L301

        I hope that it would be helpful to your work.
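
For readers without the linked code at hand, here is a minimal sketch of reversible, percent-style escaping of partition values, in the spirit of the Hive approach referenced above. The escaped character set is an illustrative assumption and not Hive's exact list, and the class and method names are hypothetical.

```java
import java.util.BitSet;

final class PartitionPathEscapeSketch {

    // Assumption: an illustrative set of characters to escape.
    private static final BitSet ESCAPED = new BitSet(128);
    static {
        for (char c = 0; c < 0x20; c++) {
            ESCAPED.set(c);                 // control characters
        }
        ESCAPED.set(0x7F);                  // DEL
        for (char c : "\"#%'*/:=?\\".toCharArray()) {
            ESCAPED.set(c);
        }
    }

    // Replaces each escaped character with '%' plus two uppercase hex digits.
    static String escape(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c < 128 && ESCAPED.get(c)) {
                sb.append('%').append(String.format("%02X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Reverses escape(); a '%' not followed by two hex digits is kept as-is.
    static String unescape(String name) {
        StringBuilder sb = new StringBuilder(name.length());
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c == '%' && i + 2 < name.length()) {
                int hi = Character.digit(name.charAt(i + 1), 16);
                int lo = Character.digit(name.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    sb.append((char) (hi * 16 + lo));
                    i += 2;
                    continue;
                }
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String dir = "l_returnflag=" + escape("yly: furiously ev");
        System.out.println(dir); // l_returnflag=yly%3A furiously ev
        System.out.println(unescape(escape("yly: furiously ev")));
    }
}
```

Unlike removing characters, this keeps the original key value recoverable from the directory name, since `unescape(escape(v))` returns `v`.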

        mhthanh Mai Hai Thanh added a comment -

        Thanks, Hyunsik.

        I will read Hive's solution.

        hyunsik Hyunsik Choi added a comment -

        You're welcome!

        mhthanh Mai Hai Thanh added a comment -

        Hi Hyunsik,

        I submitted a new patch that follows Hive's rules. I think the special characters are handled well now.

        mhthanh Mai Hai Thanh added a comment -

        Hi Hyunsik Choi,

        Could you check the latest patch for this issue?

        hyunsik Hyunsik Choi added a comment -

        I missed your patch. I'm sorry about that. I'll review it today.

        Thanks!

        hyunsik Hyunsik Choi added a comment -

        To trigger the Jenkins test, I've canceled the patch and submitted it again.

        hyunsik Hyunsik Choi added a comment -

        +1

        Your patch looks good to me. I've made some trivial changes:

        • Reindented some methods of StringUtils.
        • Moved the unit tests in TestCaseByCases to TestTablePartitions.
        • Added code to close results in the unit tests.

        If you agree with my changes, I'll commit it shortly.

        Thank you for your contribution!

        mhthanh Mai Hai Thanh added a comment -

        I agree, just go ahead!

        tajoqa Tajo QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12660915/TAJO-947_2.patch
        against master revision ddfc3f3.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 5 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The applied patch does not increase the total number of javadoc warnings.

        +1 checkstyle. The patch generated 0 code style errors.

        -1 findbugs. The patch appears to introduce 215 new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in tajo-common tajo-core.

        Test results: https://builds.apache.org/job/PreCommit-TAJO-Build/474//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/474//artifact/incubator-tajo/patchprocess/newPatchFindbugsWarningstajo-core.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/474//artifact/incubator-tajo/patchprocess/newPatchFindbugsWarningstajo-common.html
        Console output: https://builds.apache.org/job/PreCommit-TAJO-Build/474//console

        This message is automatically generated.

        hyunsik Hyunsik Choi added a comment -

        I missed your name in CHANGES. I've changed that and I'll commit it shortly.

        hyunsik Hyunsik Choi added a comment -

        I've just committed your patch to master branch. Thank you for your contribution.

        hudson Hudson added a comment -

        SUCCESS: Integrated in Tajo-master-build #331 (See https://builds.apache.org/job/Tajo-master-build/331/)
        TAJO-947: ColPartitionStoreExec can cause URISyntaxException due to special characters. (Mai Hai Thanh via hyunsik) (hyunsik: rev 87e7ba21491ac8a5a6a56357d7b4185c94f5dfd6)

        • CHANGES
        • tajo-core/src/test/resources/results/TestTablePartitions/TestSpecialCharPartitionKeys1.result
        • tajo-core/src/test/resources/queries/TestTablePartitions/lineitemspecial_ddl.sql
        • tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/HashBasedColPartitionStoreExec.java
        • tajo-core/src/test/java/org/apache/tajo/engine/query/TestTablePartitions.java
        • tajo-common/src/main/java/org/apache/tajo/util/StringUtils.java
        • tajo-core/src/test/resources/results/TestTablePartitions/TestSpecialCharPartitionKeys2.result
        • tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/SortBasedColPartitionStoreExec.java
        • tajo-core/src/test/resources/dataset/TestTablePartitions/lineitemspecial.tbl
        • tajo-core/src/main/java/org/apache/tajo/engine/utils/TupleUtil.java

          People

          • Assignee: Mai Hai Thanh (mhthanh)
          • Reporter: Hyunsik Choi (hyunsik)
          • Votes: 0
          • Watchers: 2
