  1. Camel
  2. CAMEL-12698

Unmarshaling a CSV file with the NEL (next line) character will cause Bindy to misread the entire file

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.22.0
    • Fix Version/s: 2.23.0
    • Component/s: camel-bindy
    • Labels:
      None
    • Estimated Complexity:
      Unknown

      Description

      I am using Apache Camel to process a lot of large CSV files, and relying on Bindy to assist with unmarshalling them into POJOs.

      We have an upstream data bug that causes one of our records to contain the Unicode character NEL (U+0085). While we work through the cause of that, I became curious about what Bindy actually does with it.  We rely on the unmarshal process to perform a batch insert, and because our POJO is missing certain fields, we started observing that the 

      Bindy relies on Scanner to read lines from a large file. However, Scanner's line handling treats the NEL character (U+0085) as a line separator, so a NEL inside a field splits that record in two.  The modern Files API does not do this and breaks lines only on the traditional terminators (\n, \r, or \r\n).
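
The difference can be reproduced outside Camel with a small sketch. The class and method names below are hypothetical and this is not Bindy's code; it only compares `Scanner.nextLine()` with `BufferedReader.readLine()`, which recognizes the same three terminators listed above, on a record that contains a NEL.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class, not part of camel-bindy.
public class NelLineSplitDemo {

    // Reads lines with Scanner.hasNextLine()/nextLine(), which also
    // treats U+0085 (NEL) as a line separator.
    static List<String> scannerLines(String text) {
        List<String> lines = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNextLine()) {
                lines.add(scanner.nextLine());
            }
        }
        return lines;
    }

    // Reads lines with BufferedReader.readLine(), which only breaks on
    // \n, \r, or \r\n.
    static List<String> readerLines(String text) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Two CSV records; the first has a NEL inside its second field.
        String csv = "1,foo\u0085bar\n2,baz\n";
        System.out.println("Scanner:        " + scannerLines(csv)); // 3 lines
        System.out.println("BufferedReader: " + readerLines(csv));  // 2 lines
    }
}
```

With Scanner the first record comes back as two separate lines, which is the kind of mis-split that would leave a CSV record with too few fields.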

      There are two ways to fix this from what I've been able to smoke test:

      • Change the Scanner implementation to use a delimiter of the more traditional newline characters
      • Use Java 8's Files API and stream the file in

      I would personally want to use the Files API to handle this since it's more robust and capable of higher performance, but I'll explore both approaches and see where I end up.
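
Both options can be sketched as follows. This is a minimal illustration under assumptions, not the change that was eventually made in camel-bindy: the class name, method names, and the UTF-8 charset are all hypothetical choices. Option 1 keeps Scanner but restricts its delimiter to `\r\n`, `\n`, or `\r`; option 2 streams the file with `Files.lines`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class, not part of camel-bindy.
public class NelSafeLineReaders {

    // Option 1: keep Scanner, but tokenize on \r\n, \n, or \r only,
    // so NEL stays inside the record instead of ending it.
    static List<String> scannerWithDelimiter(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (Scanner scanner = new Scanner(file, "UTF-8")) {
            scanner.useDelimiter("\r\n|\n|\r");
            while (scanner.hasNext()) {
                lines.add(scanner.next());
            }
        }
        return lines;
    }

    // Option 2: stream the file with the Java 8 Files API, which
    // breaks lines on \n, \r, or \r\n only.
    static List<String> filesLines(Path file) throws IOException {
        try (Stream<String> stream = Files.lines(file, StandardCharsets.UTF_8)) {
            return stream.collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Sample data: three records, the first containing a NEL.
        Path tmp = Files.createTempFile("nel-demo", ".csv");
        Files.write(tmp, "1,foo\u0085bar\n2,baz\r\n3,qux\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(scannerWithDelimiter(tmp).size());
        System.out.println(filesLines(tmp).size());
        Files.delete(tmp);
    }
}
```

The sketch collects lines into a list only to keep the comparison short; for the large files mentioned above, the stream from `Files.lines` would more likely be consumed lazily inside the try-with-resources block.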

       


            People

            • Assignee:
              onders Onder Sezgin
              Reporter:
              MakotoTheKnight Jason Black