  Flink / FLINK-5771

DelimitedInputFormat does not correctly handle multi-byte delimiters

    Details

      Description

      The DelimitedInputFormat does not correctly handle multi-byte delimiters.

      The reader sometimes misses a delimiter when it is immediately preceded by the first byte of the delimiter. As a result, a single call to nextRecord returns two (or more) records joined together.
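      To make the failure mode concrete, here is a minimal standalone sketch. It is not Flink code: it splits an in-memory byte array into records using the same matching logic as the loop quoted from DelimitedInputFormat.java in the comments, whereas the real reader works through a read buffer. The class NaiveDelimiterDemo, the method splitNaive, the delimiter "<>" and the input string are all hypothetical, chosen only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class NaiveDelimiterDemo {

    // Splits data into records with the matching logic of the quoted loop:
    // on a mismatch, i is reset to 0, but the current byte is never
    // re-compared against delimiter[0].
    public static List<String> splitNaive(byte[] data, byte[] delimiter) {
        List<String> records = new ArrayList<>();
        int recordStart = 0;
        int readPos = 0;
        int i = 0; // number of delimiter bytes matched so far
        while (readPos < data.length) {
            if (data[readPos++] == delimiter[i]) {
                i++;
            } else {
                i = 0;
            }
            if (i == delimiter.length) {
                // Complete delimiter: the record ends where the delimiter began.
                int recordEnd = readPos - delimiter.length;
                records.add(new String(data, recordStart,
                        recordEnd - recordStart, StandardCharsets.US_ASCII));
                recordStart = readPos;
                i = 0;
            }
        }
        // Bytes after the last detected delimiter form the final record.
        records.add(new String(data, recordStart,
                data.length - recordStart, StandardCharsets.US_ASCII));
        return records;
    }

    public static void main(String[] args) {
        byte[] delimiter = "<>".getBytes(StandardCharsets.US_ASCII);
        byte[] data = "rec1<<>rec2<>rec3".getBytes(StandardCharsets.US_ASCII);
        System.out.println(splitNaive(data, delimiter));
    }
}
```

      Running it prints [rec1<<>rec2, rec3] instead of the expected [rec1<, rec2, rec3]. The first "<" starts a partial match; the second "<" is compared only against ">", fails, and is skipped without being checked against "<", so the first delimiter is never recognized and two records come back as one.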

      See attached test case.

      1. Test.java (1 kB, Colin Breame)
      2. test.txt (0.0 kB, Colin Breame)


          Activity

          Colin Breame added a comment:

          The problem is in the loop found at line 570 of DelimitedInputFormat.java:

          			while (this.readPos < this.limit && i < this.delimiter.length) {
          				if ((this.readBuffer[this.readPos++]) == this.delimiter[i]) {
          					i++;
          				} else {
          					i = 0;
          				}
          			}
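          For comparison, here is a minimal sketch of one way to correct the loop, assuming the bytes of a failed partial match can still be re-read from the same buffer: when a partial match fails, rewind the read position so that scanning resumes one byte after where the partial match began. It is an illustration only, not necessarily what the merged commit does; FixedDelimiterScan and scan are hypothetical names, and readPos, limit and i merely mirror the variables in the quoted loop.

```java
import java.nio.charset.StandardCharsets;

public class FixedDelimiterScan {

    // Scans buf[readPos..limit) for delimiter and returns the index just past
    // the first complete match, or -1 if no match ends before limit.
    // Assumption: bytes consumed by a failed partial match are still in buf.
    public static int scan(byte[] buf, int readPos, int limit, byte[] delimiter) {
        int i = 0; // number of delimiter bytes matched so far
        while (readPos < limit && i < delimiter.length) {
            if (buf[readPos] == delimiter[i]) {
                i++;
            } else if (i > 0) {
                // Partial match failed: step back to its first byte; the
                // increment below resumes scanning one byte after it.
                readPos -= i;
                i = 0;
            }
            readPos++;
        }
        return i == delimiter.length ? readPos : -1;
    }

    public static void main(String[] args) {
        byte[] delimiter = "<>".getBytes(StandardCharsets.US_ASCII);
        byte[] data = "rec1<<>rec2<>rec3".getBytes(StandardCharsets.US_ASCII);
        int pos = 0;
        // Print the position just past each delimiter that is found.
        while ((pos = scan(data, pos, data.length, delimiter)) != -1) {
            System.out.println(pos);
        }
    }
}
```

          On the input rec1<<>rec2<>rec3 this prints 7 and 13, the positions just past both occurrences of "<>". Rewinding keeps the loop close to the original shape but may re-read up to delimiter.length - 1 bytes per failed partial match (O(n·m) in the worst case). A Knuth-Morris-Pratt failure table would avoid re-reading, which could matter if a partial match can straddle a buffer refill and its first bytes are no longer in the buffer.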
          
          Fabian Hueske added a comment:

          Thanks for reporting this issue, Colin Breame.
          I'll look into it.

          ASF GitHub Bot added a comment:

          GitHub user fhueske opened a pull request:

          https://github.com/apache/flink/pull/3316

          FLINK-5771 [core] Fix multi-char delimiter detection in DelimitedInputFormat

          Fix multi-char delimiter detection in DelimitedInputFormat.

          • Add a test case to validate correct delimiter detection.
          • Remove a couple of try-catch blocks from existing tests.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/fhueske/flink delIFFix

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/3316.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #3316


          commit d1edaec0310bb10948d8080c1fffde7e716e0d7f
          Author: Fabian Hueske <fhueske@apache.org>
          Date: 2017-02-14T21:02:26Z

          FLINK-5771 [core] Fix multi-char delimiter detection in DelimitedInputFormat.

          • Add a test case to validate correct delimiter detection.
          • Remove a couple of try-catch blocks from existing tests.

          ASF GitHub Bot added a comment:

          Github user StephanEwen commented on the issue:

          https://github.com/apache/flink/pull/3316

          Very good fix!

          +1 to merge

          ASF GitHub Bot added a comment:

          Github user fhueske commented on the issue:

          https://github.com/apache/flink/pull/3316

          merging

          ASF GitHub Bot added a comment:

          Github user asfgit closed the pull request at:

          https://github.com/apache/flink/pull/3316

          Fabian Hueske added a comment:

          Fixed for 1.3.0 with d6a97e480e294e4779eb320a8b57983122a6cf63
          Fixed for 1.2.1 with 3b4f6cf8c8283c221c8ab58f544cfe01c092fe6b
          Fixed for 1.1.5 with 44f48b34e8be1810266e35d0c392d22494b098ba


            People

            • Assignee: Fabian Hueske
              Reporter: Colin Breame
            • Votes: 0
              Watchers: 4
