Camel / CAMEL-12698

Unmarshaling a CSV file with the NEL (next line) character will cause Bindy to misread the entire file

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.22.0
    • Fix Version/s: 2.23.0
    • Component/s: camel-bindy
    • Labels: None
    • Estimated Complexity: Unknown

    Description

      I am using Apache Camel to process a lot of large CSV files, and relying on Bindy to assist with unmarshalling them into POJOs.

      We have an upstream data bug that causes one of our records to contain the Unicode character NEL (U+0085). While we work through the cause of that, I was curious about what Bindy actually does with the character.  We rely on the unmarshal process to perform a batch insert, and because our POJO is missing certain fields, we started observing that the

      Bindy relies on Scanner to read lines from a large file. However, Scanner's own line handling treats the NEL character as a line separator, so a record containing NEL is broken up at that character. The modern Files API does not do this and ends a line only at the traditional terminators (\n, \r, or \r\n).
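
      The difference can be reproduced outside of Camel. The following is a minimal sketch, not Bindy's code: the class name, method names, and sample data are hypothetical. It reads the same text once with Scanner.nextLine() and once with BufferedReader.lines(), which uses the same line terminators as Files.lines().

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.stream.Collectors;

// Hypothetical demo class; not part of camel-bindy.
class NelLineSplitDemo {

    // Reads lines with Scanner.nextLine(), which also treats
    // NEL (\u0085) as a line separator.
    static List<String> scannerLines(String text) {
        List<String> lines = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNextLine()) {
                lines.add(scanner.nextLine());
            }
        }
        return lines;
    }

    // Reads lines with BufferedReader.lines(), which ends a line
    // only at \n, \r, or \r\n.
    static List<String> readerLines(String text) {
        BufferedReader reader = new BufferedReader(new StringReader(text));
        return reader.lines().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Two CSV records; the first one contains a stray NEL character.
        String csv = "1,Jane\u0085Doe\n2,John Smith\n";
        System.out.println("Scanner lines: " + scannerLines(csv).size());
        System.out.println("Reader lines:  " + readerLines(csv).size());
    }
}
```

      With the sample data above, Scanner reports three lines for two records, while the reader-based approach reports two.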

      There are two ways to fix this from what I've been able to smoke test:

      • Change the Scanner implementation to use a delimiter made up of only the traditional newline characters
      • Use Java 8's Files API and stream the file in
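
      Both options can be sketched roughly as follows. This is an illustration under assumptions, not the change that was committed to camel-bindy: the class and method names are hypothetical, the delimiter regex is one possible choice, and UTF-8 is assumed as the file encoding.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class; not part of camel-bindy.
class NelSafeLineReaders {

    // Option 1: keep Scanner, but split only on \r\n, \n, or \r.
    // "\r\n" comes first so a Windows line ending counts as one delimiter.
    static List<String> scannerWithExplicitDelimiter(Readable source) {
        List<String> lines = new ArrayList<>();
        try (Scanner scanner = new Scanner(source).useDelimiter("\r\n|\n|\r")) {
            while (scanner.hasNext()) {
                lines.add(scanner.next());
            }
        }
        return lines;
    }

    // Option 2: stream the file with Files.lines (Java 8+), which
    // ends a line only at \n, \r, or \r\n. Assumes a UTF-8 file.
    static List<String> filesLines(Path path) throws IOException {
        try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
            return lines.collect(Collectors.toList());
        }
    }
}
```

      In real use the Files.lines stream would be consumed lazily rather than collected into a list, so that large files are not held in memory all at once.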

      I would personally want to use the Files API to handle this since it's more robust and capable of higher performance, but I'll explore both approaches and see where I end up.

          People

            Assignee: Onder Sezgin (onders)
            Reporter: Jason Black (MakotoTheKnight)