Tajo / TAJO-1486

Text files should support skipping header rows when creating an external table

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.10.0
    • Fix Version/s: 0.11.0
    • Component/s: Storage
    • Labels:
      None

      Description

      It is quite common to see header/footer lines in real-world data sets, so skipping the first/last N lines in the "create external table" DDL would be a useful feature for Tajo users. With it, users would not need to preprocess data that another application generated with a header or footer, and could use the file directly for table operations.

      cf. The same feature was added in Hive 0.13: https://issues.apache.org/jira/browse/HIVE-5795

      1. TAJO-1486.patch
        7 kB
        Jongyoung Park
      2. TAJO-1486-1.patch
        7 kB
        Jongyoung Park

        Activity

        eminency Jongyoung Park added a comment -

        Two options are added.

        text.skip.headerlines
        text.skip.footerlines

        This feature is only for delimited text files.
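
The effect of text.skip.headerlines can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the class and method names are hypothetical, the default of 0 for a missing option is an assumption, and the DDL in the comment assumes Tajo's usual WITH-clause syntax for table properties. A split-based reader would presumably apply the skip only to the fragment that starts at the beginning of the file; that detail is left out here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Illustrative only; not Tajo's DelimitedTextFile implementation.
 *
 * Assumed DDL shape (table name and location are placeholders):
 *   CREATE EXTERNAL TABLE fruits (id INT, name TEXT) USING TEXT
 *   WITH ('text.delimiter'='|', 'text.skip.headerlines'='1')
 *   LOCATION '...';
 */
final class HeaderSkipSketch {
    // Option name taken from the comment above.
    static final String SKIP_HEADER_KEY = "text.skip.headerlines";

    /** Reads the option from a table-property map; a missing value means "skip nothing" (assumed). */
    static int headerLinesToSkip(Map<String, String> tableOptions) {
        int n = Integer.parseInt(tableOptions.getOrDefault(SKIP_HEADER_KEY, "0"));
        if (n < 0) {
            throw new IllegalArgumentException(SKIP_HEADER_KEY + " must be >= 0");
        }
        return n;
    }

    /** Returns all lines of the input except the first N header lines. */
    static List<String> readRows(Reader source, Map<String, String> tableOptions) {
        int toSkip = headerLinesToSkip(tableOptions);
        List<String> rows = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (toSkip > 0) {
                    toSkip--;      // still inside the header: drop the line
                    continue;
                }
                rows.add(line);    // a real scanner would split on the delimiter here
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        String data = "id|name\n1|apple\n2|banana\n";
        List<String> rows = readRows(new StringReader(data), Map.of(SKIP_HEADER_KEY, "1"));
        System.out.println(rows); // [1|apple, 2|banana]
    }
}
```

If the file has fewer lines than the configured count, this sketch simply returns no rows rather than failing.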

        githubbot ASF GitHub Bot added a comment -

        GitHub user eminency opened a pull request:

        https://github.com/apache/tajo/pull/611

        TAJO-1486: Tajo should be able to skip header and footer rows when creating external table

        Two table meta options are added.

        > text.skip.headerlines
        > text.skip.footerlines

        This feature is only for delimited text files.

        You can merge this pull request into a Git repository by running:

        $ git pull https://github.com/eminency/tajo TAJO-1486

        Alternatively you can review and apply these changes as the patch at:

        https://github.com/apache/tajo/pull/611.patch

        To close this pull request, make a commit to your master/trunk branch
        with (at least) the following in the commit message:

        This closes #611


        commit b607f176a705326cdacd4b8969ac911957769e62
        Author: Jongyoung Park <eminency@gmail.com>
        Date: 2015-06-24T09:46:40Z

        Added skipping header/footer lines feature for text files


        githubbot ASF GitHub Bot added a comment -

        Github user eminency closed the pull request at:

        https://github.com/apache/tajo/pull/611

        githubbot ASF GitHub Bot added a comment -

        GitHub user eminency opened a pull request:

        https://github.com/apache/tajo/pull/615

        TAJO-1486: Tajo should be able to skip header and footer rows when creating external table

        Two table meta options are added.

        > text.skip.headerlines
        > text.skip.footerlines

        This feature is only for delimited text files.

        You can merge this pull request into a Git repository by running:

        $ git pull https://github.com/eminency/tajo TAJO-1486

        Alternatively you can review and apply these changes as the patch at:

        https://github.com/apache/tajo/pull/615.patch

        To close this pull request, make a commit to your master/trunk branch
        with (at least) the following in the commit message:

        This closes #615


        commit 6d415dd31417ab0d96cd01d9559dacef8b983ea2
        Author: Jongyoung Park <eminency@gmail.com>
        Date: 2015-06-24T09:46:40Z

        Added skipping header/footer lines feature for text files


        githubbot ASF GitHub Bot added a comment -

        Github user blrunner commented on the pull request:

        https://github.com/apache/tajo/pull/615#issuecomment-116404375

        @eminency

        Thanks for your contribution.
        It would be even better if there were a unit test case for a plain text file.

        githubbot ASF GitHub Bot added a comment -

        Github user eminency commented on the pull request:

        https://github.com/apache/tajo/pull/615#issuecomment-116432329

        @blrunner
        I see, I will try to add it.

        githubbot ASF GitHub Bot added a comment -

        Github user jinossy commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/615#discussion_r34979963

        --- Diff: tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/text/ByteBufLineReader.java ---
        @@ -197,6 +197,6 @@ public ByteBuf readLineBuf(AtomicInteger reads) throws IOException {
         }
         }
         reads.set(readBytes);
        - return buffer.slice(startIndex, readBytes - newlineLength);
        + return buffer.slice(startIndex, readBytes - newlineLength).retain();
        --- End diff --

        This buffer is shared until the ByteBufLineReader is closed. If you want to keep the sliced buffer, you must copy it to a new buffer.
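
The reviewer's point can be illustrated with a small analogy. The sketch below uses java.nio.ByteBuffer from the standard library rather than Netty's ByteBuf, so it is not the reader's actual code, and the class and method names are hypothetical. The shared idea is that a slice is only a view over the same memory: in Netty, retain() raises the reference count but does not protect the bytes from being overwritten when the reader reuses its buffer, whereas a copy is independent.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Analogy with java.nio.ByteBuffer; not Netty's ByteBuf and not Tajo's ByteBufLineReader. */
final class SharedSliceSketch {
    /** Returns a view of [start, start + length) that shares memory with the source. */
    static ByteBuffer sliceView(ByteBuffer source, int start, int length) {
        ByteBuffer dup = source.duplicate(); // independent position/limit, same bytes
        dup.limit(start + length);
        dup.position(start);
        return dup.slice();
    }

    /** Returns an independent copy of the same range. */
    static ByteBuffer copyOf(ByteBuffer source, int start, int length) {
        ByteBuffer copy = ByteBuffer.allocate(length);
        copy.put(sliceView(source, start, length));
        copy.flip();
        return copy;
    }

    /** Decodes the remaining bytes without disturbing the buffer's own position. */
    static String text(ByteBuffer buf) {
        return StandardCharsets.US_ASCII.decode(buf.duplicate()).toString();
    }

    public static void main(String[] args) {
        ByteBuffer shared = ByteBuffer.wrap("line1\nline2\n".getBytes(StandardCharsets.US_ASCII));
        ByteBuffer view = sliceView(shared, 0, 5);
        ByteBuffer copy = copyOf(shared, 0, 5);

        // Stand-in for the reader refilling its internal buffer with new data.
        shared.put(0, (byte) 'X');

        System.out.println(text(view)); // Xine1  (the view sees the overwrite)
        System.out.println(text(copy)); // line1  (the copy does not)
    }
}
```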

        githubbot ASF GitHub Bot added a comment -

        Github user jinossy commented on the pull request:

        https://github.com/apache/tajo/pull/615#issuecomment-122827736

        Guys,
        In my opinion, footer skipping would be better added after block iteration is added.
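
One way to read this comment: a scanner that emits one line at a time cannot know a line belongs to the footer until it has seen the end of the input, so footer skipping needs some form of lookahead. The sketch below shows that idea with a small holding queue. It is an illustration under that assumption only; the names are hypothetical and it is not how Tajo implements (or planned to implement) the feature.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

/** Illustrative lookahead buffer for dropping trailing lines; not Tajo code. */
final class FooterSkipSketch {
    /** Returns every line except the last footerLines ones. */
    static List<String> dropFooter(Iterator<String> lines, int footerLines) {
        if (footerLines < 0) {
            throw new IllegalArgumentException("footerLines must be >= 0");
        }
        Deque<String> pending = new ArrayDeque<>(); // lines that might still be footer
        List<String> out = new ArrayList<>();
        while (lines.hasNext()) {
            pending.addLast(lines.next());
            // Once more than footerLines lines are held back, the oldest one
            // can no longer be part of the footer and is safe to emit.
            if (pending.size() > footerLines) {
                out.add(pending.removeFirst());
            }
        }
        // Whatever is still pending at end of input is the footer and is dropped.
        return out;
    }

    public static void main(String[] args) {
        List<String> file = List.of("1|apple", "2|banana", "-- end of export --");
        System.out.println(dropFooter(file.iterator(), 1)); // [1|apple, 2|banana]
    }
}
```

The cost is that each row is delayed by footerLines reads, which fits more naturally into a reader that already works on a block of lines at a time.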

        githubbot ASF GitHub Bot added a comment -

        Github user jinossy commented on the pull request:

        https://github.com/apache/tajo/pull/615#issuecomment-123550425

        +1 Looks great to me
        I will move headerLineNum to a local variable before committing it.

        githubbot ASF GitHub Bot added a comment -

        Github user eminency commented on the pull request:

        https://github.com/apache/tajo/pull/615#issuecomment-123563124

        @jinossy It looks good

        jhkim Jinho Kim added a comment -

        Committed.
        Footer skipping will be supported after batch tuple is added.
        Thank you for your contribution!

        hudson Hudson added a comment -

        ABORTED: Integrated in Tajo-master-CODEGEN-build #401 (See https://builds.apache.org/job/Tajo-master-CODEGEN-build/401/)
        TAJO-1486: Text file should support to skip header rows when creating external table. (Contributed by Jongyoung Park. Committed by jinho) (jhkim: rev e5b30e542a409ec0378a787c76f6387fd3ca84a9)

        • tajo-storage/tajo-storage-hdfs/src/test/java/org/apache/tajo/storage/TestDelimitedTextFile.java
        • tajo-common/src/main/java/org/apache/tajo/storage/StorageConstants.java
        • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/text/DelimitedTextFile.java
        • tajo-docs/src/main/sphinx/table_management/text.rst
        • tajo-storage/tajo-storage-hdfs/src/test/resources/dataset/TestDelimitedTextFile/testNormal.json
        • CHANGES
        • tajo-storage/tajo-storage-hdfs/src/test/resources/dataset/TestDelimitedTextFile/testSkip.txt
        hudson Hudson added a comment -

        SUCCESS: Integrated in Tajo-master-build #761 (See https://builds.apache.org/job/Tajo-master-build/761/)
        TAJO-1486: Text file should support to skip header rows when creating external table. (Contributed by Jongyoung Park. Committed by jinho) (jhkim: rev e5b30e542a409ec0378a787c76f6387fd3ca84a9)

        • tajo-storage/tajo-storage-hdfs/src/test/java/org/apache/tajo/storage/TestDelimitedTextFile.java
        • CHANGES
        • tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/text/DelimitedTextFile.java
        • tajo-storage/tajo-storage-hdfs/src/test/resources/dataset/TestDelimitedTextFile/testSkip.txt
        • tajo-docs/src/main/sphinx/table_management/text.rst
        • tajo-storage/tajo-storage-hdfs/src/test/resources/dataset/TestDelimitedTextFile/testNormal.json
        • tajo-common/src/main/java/org/apache/tajo/storage/StorageConstants.java
        githubbot ASF GitHub Bot added a comment -

        Github user eminency closed the pull request at:

        https://github.com/apache/tajo/pull/615


          People

          • Assignee: Jongyoung Park (eminency)
          • Reporter: Youngkyong Ko (ykko)
          • Votes: 0
          • Watchers: 5
