Uploaded image for project: 'Tajo'
  1. Tajo
  2. TAJO-1235

ByteBufLineReader can not read text line with CRLF

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.10.0
    • Fix Version/s: 0.10.0
    • Component/s: Storage
    • Labels:
      None

      Description

      Here is the problem example

      [text .. \r]  [\n text ..]
        buffer       buffer
      
      2014-12-05 13:46:35,355 ERROR org.apache.tajo.worker.Task: maxCapacity: -1 (expected: >= 0)
      java.lang.IllegalArgumentException: maxCapacity: -1 (expected: >= 0)
      at io.netty.buffer.AbstractByteBuf.<init>(AbstractByteBuf.java:51)
      at io.netty.buffer.AbstractDerivedByteBuf.<init>(AbstractDerivedByteBuf.java:28)
      at io.netty.buffer.SlicedByteBuf.<init>(SlicedByteBuf.java:40)
      at io.netty.buffer.AbstractByteBuf.slice(AbstractByteBuf.java:938)
      at org.apache.tajo.storage.text.ByteBufLineReader.readLineBuf(ByteBufLineReader.java:168)
      at org.apache.tajo.storage.text.DelimitedLineReader.readLine(DelimitedLineReader.java:125)
      at org.apache.tajo.storage.text.DelimitedTextFile$DelimitedTextFileScanner.init(DelimitedTextFile.java:328)
      at org.apache.tajo.engine.planner.physical.SeqScanExec.initScanner(SeqScanExec.java:226)
      at org.apache.tajo.engine.planner.physical.SeqScanExec.init(SeqScanExec.java:200)
      at org.apache.tajo.engine.planner.physical.UnaryPhysicalExec.init(UnaryPhysicalExec.java:53)
      at org.apache.tajo.engine.planner.physical.UnaryPhysicalExec.init(UnaryPhysicalExec.java:53)
      at org.apache.tajo.engine.planner.physical.HashShuffleFileWriteExec.init(HashShuffleFileWriteExec.java:83)
      at org.apache.tajo.worker.Task.run(Task.java:447)
      at org.apache.tajo.worker.TaskRunner$1.run(TaskRunner.java:276)
      at java.lang.Thread.run(Thread.java:745)
      
      1. TAJO-1235_2.patch
        11 kB
        Jinho Kim
      2. TAJO-1235.patch
        11 kB
        Jinho Kim

        Activity

        Hide
        githubbot ASF GitHub Bot added a comment -

        GitHub user jinossy opened a pull request:

        https://github.com/apache/tajo/pull/289

        TAJO-1235: ByteBufLineReader can not read text line with CRLF

        You can merge this pull request into a Git repository by running:

        $ git pull https://github.com/jinossy/tajo TAJO-1235

        Alternatively you can review and apply these changes as the patch at:

        https://github.com/apache/tajo/pull/289.patch

        To close this pull request, make a commit to your master/trunk branch
        with (at least) the following in the commit message:

        This closes #289


        commit 3dbf0775b65823d544d09262f11e4544576b06b6
        Author: jhkim <jhkim@apache.org>
        Date: 2014-12-05T10:19:14Z

        TAJO-1235: ByteBufLineReader can not read text line with CRLF


        Show
        githubbot ASF GitHub Bot added a comment - GitHub user jinossy opened a pull request: https://github.com/apache/tajo/pull/289 TAJO-1235 : ByteBufLineReader can not read text line with CRLF You can merge this pull request into a Git repository by running: $ git pull https://github.com/jinossy/tajo TAJO-1235 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/tajo/pull/289.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #289 commit 3dbf0775b65823d544d09262f11e4544576b06b6 Author: jhkim <jhkim@apache.org> Date: 2014-12-05T10:19:14Z TAJO-1235 : ByteBufLineReader can not read text line with CRLF
        Hide
        tajoqa Tajo QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12685304/TAJO-1235.patch
        against master revision release-0.9.0-rc0-73-gab2efce.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The applied patch does not increase the total number of javadoc warnings.

        +1 checkstyle. The patch generated 0 code style errors.

        -1 findbugs. The patch appears to introduce 8 new Findbugs (version 2.0.3) warnings.

        -1 release audit. The applied patch generated 117 release audit warnings.

        -1 core tests. The patch failed these unit tests in tajo-storage:
        org.apache.tajo.storage.TestDelimitedTextFile

        Test results: https://builds.apache.org/job/PreCommit-TAJO-Build/527//testReport/
        Release audit warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/527//artifact/incubator-tajo/patchprocess/patchReleaseAuditProblems.txt
        Findbugs warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/527//artifact/incubator-tajo/patchprocess/newPatchFindbugsWarningstajo-storage.html
        Console output: https://builds.apache.org/job/PreCommit-TAJO-Build/527//console

        This message is automatically generated.

        Show
        tajoqa Tajo QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12685304/TAJO-1235.patch against master revision release-0.9.0-rc0-73-gab2efce. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 1 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The applied patch does not increase the total number of javadoc warnings. +1 checkstyle. The patch generated 0 code style errors. -1 findbugs. The patch appears to introduce 8 new Findbugs (version 2.0.3) warnings. -1 release audit. The applied patch generated 117 release audit warnings. -1 core tests. The patch failed these unit tests in tajo-storage: org.apache.tajo.storage.TestDelimitedTextFile Test results: https://builds.apache.org/job/PreCommit-TAJO-Build/527//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/527//artifact/incubator-tajo/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/527//artifact/incubator-tajo/patchprocess/newPatchFindbugsWarningstajo-storage.html Console output: https://builds.apache.org/job/PreCommit-TAJO-Build/527//console This message is automatically generated.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user hyunsik commented on the pull request:

        https://github.com/apache/tajo/pull/289#issuecomment-66034084

        Could you trigger unit tests?

        Show
        githubbot ASF GitHub Bot added a comment - Github user hyunsik commented on the pull request: https://github.com/apache/tajo/pull/289#issuecomment-66034084 Could you trigger unit tests?
        Hide
        jhkim Jinho Kim added a comment -

        I've fixed the bug. It was my bad.

        Show
        jhkim Jinho Kim added a comment - I've fixed the bug. It was my bad.
        Hide
        tajoqa Tajo QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12685728/TAJO-1235_2.patch
        against master revision release-0.9.0-rc0-77-g88e5c9e.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The applied patch does not increase the total number of javadoc warnings.

        +1 checkstyle. The patch generated 0 code style errors.

        -1 findbugs. The patch appears to introduce 8 new Findbugs (version 2.0.3) warnings.

        -1 release audit. The applied patch generated 456 release audit warnings.

        +1 core tests. The patch passed unit tests in tajo-storage.

        Test results: https://builds.apache.org/job/PreCommit-TAJO-Build/535//testReport/
        Release audit warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/535//artifact/incubator-tajo/patchprocess/patchReleaseAuditProblems.txt
        Findbugs warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/535//artifact/incubator-tajo/patchprocess/newPatchFindbugsWarningstajo-storage.html
        Console output: https://builds.apache.org/job/PreCommit-TAJO-Build/535//console

        This message is automatically generated.

        Show
        tajoqa Tajo QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12685728/TAJO-1235_2.patch against master revision release-0.9.0-rc0-77-g88e5c9e. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 1 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The applied patch does not increase the total number of javadoc warnings. +1 checkstyle. The patch generated 0 code style errors. -1 findbugs. The patch appears to introduce 8 new Findbugs (version 2.0.3) warnings. -1 release audit. The applied patch generated 456 release audit warnings. +1 core tests. The patch passed unit tests in tajo-storage. Test results: https://builds.apache.org/job/PreCommit-TAJO-Build/535//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/535//artifact/incubator-tajo/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/535//artifact/incubator-tajo/patchprocess/newPatchFindbugsWarningstajo-storage.html Console output: https://builds.apache.org/job/PreCommit-TAJO-Build/535//console This message is automatically generated.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user hyunsik commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/289#discussion_r21501490

        — Diff: tajo-storage/src/main/java/org/apache/tajo/storage/text/ByteBufLineReader.java —
        @@ -77,23 +74,25 @@ private void fillBuffer() throws IOException {

        int tailBytes = 0;
        if (this.readBytes > 0) {
        + //startIndex = 0, readIndex = tailBytes length, writable = (buffer capacity - tailBytes)
        this.buffer.markReaderIndex();

        • this.buffer.discardSomeReadBytes(); // compact the buffer
          + this.buffer.discardReadBytes(); // compact the buffer
          tailBytes = this.buffer.writerIndex();
          if (!this.buffer.isWritable()) { // a line bytes is large than the buffer - BufferPool.ensureWritable(buffer, bufferSize); + BufferPool.ensureWritable(buffer, bufferSize * 2); this.bufferSize = buffer.capacity(); }

          + this.startIndex = 0;
          }

        boolean release = true;
        try {
        int readBytes = tailBytes;
        for (; ; ) {

        • int localReadBytes = buffer.writeBytes(channel, bufferSize - readBytes);
          + int localReadBytes = buffer.writeBytes(channel, this.bufferSize - readBytes);
          if (localReadBytes < 0) {
        • if (tailBytes == readBytes) {
          + if (buffer.isWritable()) {
            • End diff –

        It seems that we can ensure EOF if there is no read bytes and there are write buffer. But, it may be hard for code readers to imagine this logic. Could you add some comments about this logic?

        Show
        githubbot ASF GitHub Bot added a comment - Github user hyunsik commented on a diff in the pull request: https://github.com/apache/tajo/pull/289#discussion_r21501490 — Diff: tajo-storage/src/main/java/org/apache/tajo/storage/text/ByteBufLineReader.java — @@ -77,23 +74,25 @@ private void fillBuffer() throws IOException { int tailBytes = 0; if (this.readBytes > 0) { + //startIndex = 0, readIndex = tailBytes length, writable = (buffer capacity - tailBytes) this.buffer.markReaderIndex(); this.buffer.discardSomeReadBytes(); // compact the buffer + this.buffer.discardReadBytes(); // compact the buffer tailBytes = this.buffer.writerIndex(); if (!this.buffer.isWritable()) { // a line bytes is large than the buffer - BufferPool.ensureWritable(buffer, bufferSize); + BufferPool.ensureWritable(buffer, bufferSize * 2); this.bufferSize = buffer.capacity(); } + this.startIndex = 0; } boolean release = true; try { int readBytes = tailBytes; for (; ; ) { int localReadBytes = buffer.writeBytes(channel, bufferSize - readBytes); + int localReadBytes = buffer.writeBytes(channel, this.bufferSize - readBytes); if (localReadBytes < 0) { if (tailBytes == readBytes) { + if (buffer.isWritable()) { End diff – It seems that we can ensure EOF if there is no read bytes and there are write buffer. But, it may be hard for code readers to imagine this logic. Could you add some comments about this logic?
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user hyunsik commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/289#discussion_r21501517

        — Diff: tajo-storage/src/main/java/org/apache/tajo/storage/text/ByteBufLineReader.java —
        @@ -120,24 +118,34 @@ private void fillBuffer() throws IOException {

        • Read a line terminated by one of CR, LF, or CRLF.
          */
          public ByteBuf readLineBuf(AtomicInteger reads) throws IOException {
        • if(eof) return null;
          -
        • int startIndex = buffer.readerIndex();
        • int readBytes;
          + int readBytes = 0;
          + int newlineLength = 0; //length of terminating newline
          int readable;
        • int newlineLength; //length of terminating newline
          +
          + this.startIndex = buffer.readerIndex();

        loop:
        while (true) {
        readable = buffer.readableBytes();
        if (readable <= 0) {

        • buffer.readerIndex(startIndex);
          + buffer.readerIndex(this.startIndex);
          fillBuffer(); //compact and fill buffer
        • if (!buffer.isReadable()) {
          + if (!buffer.isReadable() && buffer.writerIndex() == 0) {
            • End diff –

        Could you add comments about this case?

        Show
        githubbot ASF GitHub Bot added a comment - Github user hyunsik commented on a diff in the pull request: https://github.com/apache/tajo/pull/289#discussion_r21501517 — Diff: tajo-storage/src/main/java/org/apache/tajo/storage/text/ByteBufLineReader.java — @@ -120,24 +118,34 @@ private void fillBuffer() throws IOException { Read a line terminated by one of CR, LF, or CRLF. */ public ByteBuf readLineBuf(AtomicInteger reads) throws IOException { if(eof) return null; - int startIndex = buffer.readerIndex(); int readBytes; + int readBytes = 0; + int newlineLength = 0; //length of terminating newline int readable; int newlineLength; //length of terminating newline + + this.startIndex = buffer.readerIndex(); loop: while (true) { readable = buffer.readableBytes(); if (readable <= 0) { buffer.readerIndex(startIndex); + buffer.readerIndex(this.startIndex); fillBuffer(); //compact and fill buffer if (!buffer.isReadable()) { + if (!buffer.isReadable() && buffer.writerIndex() == 0) { End diff – Could you add comments about this case?
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user hyunsik commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/289#discussion_r21501546

        — Diff: tajo-storage/src/main/java/org/apache/tajo/storage/text/ByteBufLineReader.java —
        @@ -147,19 +155,19 @@ public ByteBuf readLineBuf(AtomicInteger reads) throws IOException {
        //does not appeared terminating newline
        buffer.readerIndex(buffer.writerIndex()); // set to end buffer
        if(eof)

        { - readBytes = buffer.readerIndex() - startIndex; - newlineLength = 0; + readBytes += buffer.readerIndex() - startIndex; break loop; }

        } else {
        buffer.readerIndex(endIndex + 1);

        • readBytes = buffer.readerIndex() - startIndex;
          + readBytes += (buffer.readerIndex() - startIndex);
          if (processor.isPrevCharCR() && buffer.isReadable()
            • End diff –

        Could you add comments about this condition?

        Show
        githubbot ASF GitHub Bot added a comment - Github user hyunsik commented on a diff in the pull request: https://github.com/apache/tajo/pull/289#discussion_r21501546 — Diff: tajo-storage/src/main/java/org/apache/tajo/storage/text/ByteBufLineReader.java — @@ -147,19 +155,19 @@ public ByteBuf readLineBuf(AtomicInteger reads) throws IOException { //does not appeared terminating newline buffer.readerIndex(buffer.writerIndex()); // set to end buffer if(eof) { - readBytes = buffer.readerIndex() - startIndex; - newlineLength = 0; + readBytes += buffer.readerIndex() - startIndex; break loop; } } else { buffer.readerIndex(endIndex + 1); readBytes = buffer.readerIndex() - startIndex; + readBytes += (buffer.readerIndex() - startIndex); if (processor.isPrevCharCR() && buffer.isReadable() End diff – Could you add comments about this condition?
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user hyunsik commented on the pull request:

        https://github.com/apache/tajo/pull/289#issuecomment-66219912

        The patch looks good to me. It's a nice fix. It would be great if you add some comments about if-condition branches.

        Show
        githubbot ASF GitHub Bot added a comment - Github user hyunsik commented on the pull request: https://github.com/apache/tajo/pull/289#issuecomment-66219912 The patch looks good to me. It's a nice fix. It would be great if you add some comments about if-condition branches.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user jinossy commented on the pull request:

        https://github.com/apache/tajo/pull/289#issuecomment-66225661

        Thank you for the review. I've left some comments

        Show
        githubbot ASF GitHub Bot added a comment - Github user jinossy commented on the pull request: https://github.com/apache/tajo/pull/289#issuecomment-66225661 Thank you for the review. I've left some comments
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user hyunsik commented on the pull request:

        https://github.com/apache/tajo/pull/289#issuecomment-66256498

        +1
        The patch looks good to me.

        Show
        githubbot ASF GitHub Bot added a comment - Github user hyunsik commented on the pull request: https://github.com/apache/tajo/pull/289#issuecomment-66256498 +1 The patch looks good to me.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user asfgit closed the pull request at:

        https://github.com/apache/tajo/pull/289

        Show
        githubbot ASF GitHub Bot added a comment - Github user asfgit closed the pull request at: https://github.com/apache/tajo/pull/289
        Hide
        jhkim Jinho Kim added a comment -

        committed it

        Show
        jhkim Jinho Kim added a comment - committed it
        Hide
        hudson Hudson added a comment -

        FAILURE: Integrated in Tajo-master-CODEGEN-build #133 (See https://builds.apache.org/job/Tajo-master-CODEGEN-build/133/)
        TAJO-1235: ByteBufLineReader can not read text line with CRLF. (jinho) (jhkim: rev bebc78011798cd2fd691e8901141e94f1d445d6e)

        • tajo-storage/src/main/java/org/apache/tajo/storage/text/DelimitedLineReader.java
        • tajo-storage/src/test/java/org/apache/tajo/storage/TestLineReader.java
        • CHANGES
        • tajo-storage/src/main/java/org/apache/tajo/storage/text/ByteBufLineReader.java
        • tajo-storage/src/main/java/org/apache/tajo/storage/ByteBufInputChannel.java
        Show
        hudson Hudson added a comment - FAILURE: Integrated in Tajo-master-CODEGEN-build #133 (See https://builds.apache.org/job/Tajo-master-CODEGEN-build/133/ ) TAJO-1235 : ByteBufLineReader can not read text line with CRLF. (jinho) (jhkim: rev bebc78011798cd2fd691e8901141e94f1d445d6e) tajo-storage/src/main/java/org/apache/tajo/storage/text/DelimitedLineReader.java tajo-storage/src/test/java/org/apache/tajo/storage/TestLineReader.java CHANGES tajo-storage/src/main/java/org/apache/tajo/storage/text/ByteBufLineReader.java tajo-storage/src/main/java/org/apache/tajo/storage/ByteBufInputChannel.java
        Hide
        hudson Hudson added a comment -

        SUCCESS: Integrated in Tajo-master-build #492 (See https://builds.apache.org/job/Tajo-master-build/492/)
        TAJO-1235: ByteBufLineReader can not read text line with CRLF. (jinho) (jhkim: rev bebc78011798cd2fd691e8901141e94f1d445d6e)

        • tajo-storage/src/main/java/org/apache/tajo/storage/text/DelimitedLineReader.java
        • tajo-storage/src/main/java/org/apache/tajo/storage/ByteBufInputChannel.java
        • tajo-storage/src/test/java/org/apache/tajo/storage/TestLineReader.java
        • CHANGES
        • tajo-storage/src/main/java/org/apache/tajo/storage/text/ByteBufLineReader.java
        Show
        hudson Hudson added a comment - SUCCESS: Integrated in Tajo-master-build #492 (See https://builds.apache.org/job/Tajo-master-build/492/ ) TAJO-1235 : ByteBufLineReader can not read text line with CRLF. (jinho) (jhkim: rev bebc78011798cd2fd691e8901141e94f1d445d6e) tajo-storage/src/main/java/org/apache/tajo/storage/text/DelimitedLineReader.java tajo-storage/src/main/java/org/apache/tajo/storage/ByteBufInputChannel.java tajo-storage/src/test/java/org/apache/tajo/storage/TestLineReader.java CHANGES tajo-storage/src/main/java/org/apache/tajo/storage/text/ByteBufLineReader.java

          People

          • Assignee:
            jhkim Jinho Kim
            Reporter:
            jhkim Jinho Kim
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development