Tajo / TAJO-1381

Support multi-bytes delimiter for Text file

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.11.0, 0.10.1
    • Component/s: Storage
    • Labels:
      None

      Description

      Support multi-character and non-ASCII delimiters for text files.
      CSVFile will be deprecated, so this should be implemented in DelimitedTextFile.
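
      To make the request concrete, here is a minimal sketch of splitting a line on a delimiter that is longer than one byte. It is not the code from the patch: the class and method names (MultiByteSplitSketch, split, matchesAt) are hypothetical, and UTF-8 is assumed for both the data and the delimiter.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not Tajo's MultiBytesFieldSplitProcessor.
public class MultiByteSplitSketch {

    // Splits the line into fields wherever the full delimiter byte sequence occurs.
    static List<String> split(byte[] line, byte[] delimiter) {
        if (delimiter.length == 0) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> fields = new ArrayList<>();
        int fieldStart = 0;
        int i = 0;
        // Stop once fewer bytes remain than the delimiter needs.
        while (i <= line.length - delimiter.length) {
            if (matchesAt(line, i, delimiter)) {
                fields.add(new String(line, fieldStart, i - fieldStart, StandardCharsets.UTF_8));
                i += delimiter.length;   // skip the whole delimiter
                fieldStart = i;
            } else {
                i++;
            }
        }
        // The remaining bytes form the last field (possibly empty).
        fields.add(new String(line, fieldStart, line.length - fieldStart, StandardCharsets.UTF_8));
        return fields;
    }

    // Returns true if pattern occurs in data starting at offset.
    static boolean matchesAt(byte[] data, int offset, byte[] pattern) {
        for (int j = 0; j < pattern.length; j++) {
            if (data[offset + j] != pattern[j]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] delimiter = "||".getBytes(StandardCharsets.UTF_8);
        byte[] line = "1||abc||3.5".getBytes(StandardCharsets.UTF_8);
        System.out.println(split(line, delimiter)); // prints [1, abc, 3.5]
    }
}
```

      The same loop handles a non-ASCII delimiter, because the comparison is done on the encoded bytes rather than on single characters.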

          Activity

          githubbot ASF GitHub Bot added a comment -

          Github user jinossy commented on the pull request:

          https://github.com/apache/tajo/pull/410#issuecomment-78203962

          @navis awesome!!

          Could you fix the FindBugs warnings in your patch? Please also upload the patch to JIRA.
          "Found reliance on default encoding: String.getBytes()"

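          For context, FindBugs reports this warning (pattern DM_DEFAULT_ENCODING, in its Internationalization category) when String.getBytes() is called without a charset, because the result then depends on the platform default encoding. The usual remedy is to pass the charset explicitly. The sketch below assumes UTF-8 and uses a hypothetical helper name (delimiterBytes); it does not show the change actually made in the patch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration of avoiding reliance on the default encoding.
public class ExplicitCharsetSketch {

    // Encodes the delimiter with an explicit charset (UTF-8 is an assumption here),
    // so the bytes are the same regardless of the JVM's default encoding.
    static byte[] delimiterBytes(String delimiter) {
        return delimiter.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+3131 (HANGUL LETTER KIYEOK) encodes to three bytes in UTF-8: E3 84 B1.
        byte[] bytes = delimiterBytes("\u3131");
        System.out.println(bytes.length);           // prints 3
        System.out.println(Arrays.toString(bytes)); // prints [-29, -124, -79]
    }
}
```

          With the no-argument getBytes(), the same delimiter could encode to different bytes on machines with different default charsets, which would silently break field splitting.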
          githubbot ASF GitHub Bot added a comment -

          Github user jinossy commented on the pull request:

          https://github.com/apache/tajo/pull/410#issuecomment-82188186

          @navis
          Could you rebase it?

          githubbot ASF GitHub Bot added a comment -

          Github user navis commented on the pull request:

          https://github.com/apache/tajo/pull/410#issuecomment-82624184

          Rebased onto master

          jhkim Jinho Kim added a comment -

          +1
          Navis, thank you for your contribution.
          I've fixed the Internationalization warning from FindBugs.

          tajoqa Tajo QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12705250/TAJO-1381.patch
          against master revision release-0.9.0-rc0-205-g286b956.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 8 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The applied patch does not increase the total number of javadoc warnings.

          +1 checkstyle. The patch generated 0 code style errors.

          -1 findbugs. The patch appears to introduce 4 new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in tajo-core tajo-storage/tajo-storage-hdfs.

          Test results: https://builds.apache.org/job/PreCommit-TAJO-Build/617//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/617//artifact/incubator-tajo/patchprocess/newPatchFindbugsWarningstajo-core.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/617//artifact/incubator-tajo/patchprocess/newPatchFindbugsWarningstajo-storage-hdfs.html
          Console output: https://builds.apache.org/job/PreCommit-TAJO-Build/617//console

          This message is automatically generated.

          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/tajo/pull/410

          jhkim Jinho Kim added a comment -

          Committed it

          hudson Hudson added a comment -

          SUCCESS: Integrated in Tajo-master-build #619 (See https://builds.apache.org/job/Tajo-master-build/619/)
          TAJO-1381: Support multi-bytes delimiter for Text file (jhkim: rev 82d44af32246c63a32c049292f0a229f16e85768)

          • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/text/TextLineDeserializer.java
          • tajo-core/src/test/resources/results/TestSelectQuery/testMultiBytesDelimiter4.result
          • tajo-core/src/test/resources/queries/TestSelectQuery/multibytes_delimiter_table3_ddl.sql
          • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/text/FieldSplitProcessor.java
          • tajo-core/src/test/java/org/apache/tajo/engine/query/TestSelectQuery.java
          • tajo-core/src/test/resources/queries/TestSelectQuery/testMultiBytesDelimiter3.sql
          • tajo-storage/tajo-storage-hdfs/src/test/java/org/apache/tajo/storage/TestSplitProcessor.java
          • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/text/TextLineSerDe.java
          • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/text/CSVLineSerializer.java
          • tajo-core/src/test/resources/queries/TestSelectQuery/multibytes_delimiter_table4_ddl.sql
          • tajo-core/src/test/resources/results/TestSelectQuery/testMultiBytesDelimiter3.result
          • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/text/CSVLineSerDe.java
          • CHANGES
          • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/text/MultiBytesFieldSplitProcessor.java
          • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/CSVFile.java
          • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/text/CSVLineDeserializer.java
          • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/text/DelimitedTextFile.java
          • tajo-core/src/test/resources/queries/TestSelectQuery/testMultiBytesDelimiter4.sql
          hudson Hudson added a comment -

          FAILURE: Integrated in Tajo-master-CODEGEN-build #256 (See https://builds.apache.org/job/Tajo-master-CODEGEN-build/256/)
          TAJO-1381: Support multi-bytes delimiter for Text file (jhkim: rev 82d44af32246c63a32c049292f0a229f16e85768)

          • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/text/TextLineDeserializer.java
          • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/CSVFile.java
          • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/text/DelimitedTextFile.java
          • tajo-core/src/test/resources/queries/TestSelectQuery/multibytes_delimiter_table4_ddl.sql
          • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/text/FieldSplitProcessor.java
          • tajo-core/src/test/resources/queries/TestSelectQuery/testMultiBytesDelimiter3.sql
          • tajo-core/src/test/resources/results/TestSelectQuery/testMultiBytesDelimiter4.result
          • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/text/CSVLineSerDe.java
          • tajo-core/src/test/resources/results/TestSelectQuery/testMultiBytesDelimiter3.result
          • tajo-core/src/test/resources/queries/TestSelectQuery/testMultiBytesDelimiter4.sql
          • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/text/TextLineSerDe.java
          • CHANGES
          • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/text/MultiBytesFieldSplitProcessor.java
          • tajo-storage/tajo-storage-hdfs/src/test/java/org/apache/tajo/storage/TestSplitProcessor.java
          • tajo-core/src/test/java/org/apache/tajo/engine/query/TestSelectQuery.java
          • tajo-core/src/test/resources/queries/TestSelectQuery/multibytes_delimiter_table3_ddl.sql
          • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/text/CSVLineDeserializer.java
          • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/text/CSVLineSerializer.java
          hyunsik Hyunsik Choi added a comment -

          Hi Jinho Kim,

          Why don't we add this feature to the 0.10.1 branch? It would be appropriate for a minor release.

          jhkim Jinho Kim added a comment -

          I thought this was a major feature.
          Anyway, I will merge it into the 0.10.1 branch.

          jhkim Jinho Kim added a comment -

          committed it to branch-0.10.1


            People

            • Assignee:
              navis Navis
              Reporter:
              jhkim Jinho Kim
            • Votes:
              0
              Watchers:
              3
