Apache NiFi / NIFI-3495

TextLineDemarcator sets the wrong index when read ahead is performed in isEol operation

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.2.0
    • Component/s: None
    • Labels: None

    Description

      This condition is very rare. It occurs only when a read-ahead (a call to fill()) is performed inside the isEol operation: fill() sets a new index, which the main nextOffsetInfo operation then resets to its old value.
      The fix is to track whether isEol had to perform a read-ahead and, if it did, not to reset the index.

      More details.
      While this component is modeled after the standard Java BufferedReader, which simply reads and returns lines (delimited by CR, LF, or CR followed by LF), this reader also records how each line was terminated (EOF, CR, LF, or CR followed by LF) and returns that information to the caller as OffsetInfo.
      For example, if you read the record "foo\r\nbar" with a BufferedReader, you get 'foo' and 'bar', but you cannot tell that the two tokens were separated by CR and LF, so you cannot restore the record to its original state if you need to. TextLineDemarcator instead returns an OffsetInfo that holds the delimiter along with other information.
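      To make the difference concrete, here is a minimal sketch that works on an in-memory string rather than a stream and keeps each line's terminator next to its content. The names DelimitedLines and Line are hypothetical illustrations, not NiFi's OffsetInfo API; the only point shown is that recording the terminator makes the split reversible.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: DelimitedLines and Line are illustrative names, not NiFi's OffsetInfo API.
final class DelimitedLines {

    /** One line plus the exact terminator that ended it ("" means end of input). */
    static final class Line {
        final String content;
        final String terminator;

        Line(String content, String terminator) {
            this.content = content;
            this.terminator = terminator;
        }
    }

    /** Splits on LF, CR, or CR LF, recording which terminator ended each line. */
    static List<Line> split(String text) {
        List<Line> lines = new ArrayList<>();
        int start = 0;
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == '\n' || c == '\r') {
                // A CR immediately followed by LF counts as one two-character terminator.
                int end = (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') ? i + 2 : i + 1;
                lines.add(new Line(text.substring(start, i), text.substring(i, end)));
                start = end;
                i = end;
            } else {
                i++;
            }
        }
        if (start < text.length()) {
            lines.add(new Line(text.substring(start), "")); // last line terminated by end of input
        }
        return lines;
    }

    /** Reassembles the original text; possible only because the terminators were kept. */
    static String restore(List<Line> lines) {
        StringBuilder sb = new StringBuilder();
        for (Line line : lines) {
            sb.append(line.content).append(line.terminator);
        }
        return sb.toString();
    }
}
```

      For "foo\r\nbar" this yields ("foo", "\r\n") and ("bar", ""), and restore() returns the original string, whereas BufferedReader.readLine() returns only "foo" and "bar".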

      To accomplish this, every time we see a CR (13) we need to peek at the next byte to check whether it is an LF (10). At the end of the buffer this peek is more complicated because more data must be read first. The read-ahead itself was performed, but the index was not handled properly: the new value set inside fill() was overwritten with the old one.
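      The following is a hypothetical Java sketch of the read-ahead problem and the fix described above. It is not NiFi's TextLineDemarcator: the class, its fields, and the choice to compact the buffer inside fill() are assumptions made for illustration, although fill() and isEol() borrow their names from this issue. When a CR is the last buffered byte, isEol() calls fill(), which moves the unread bytes to the start of the buffer; the caller must then shift its own scan position by the same amount rather than keep the stale one.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch, not NiFi's TextLineDemarcator: isEol() may read ahead via fill().
final class BoundaryLineReader {

    /** A line plus the exact terminator that ended it ("" means end of input). */
    static final class Line {
        final String content;
        final String terminator;
        Line(String content, String terminator) { this.content = content; this.terminator = terminator; }
    }

    private final InputStream in;
    private byte[] buffer;
    private int limit;              // number of valid bytes in buffer
    private int index;              // start of the current, not yet returned line
    private int lastReadAheadShift; // how far the last isEol() read-ahead moved the buffer

    BoundaryLineReader(InputStream in, int initialBufferSize) {
        this.in = in;
        this.buffer = new byte[Math.max(1, initialBufferSize)];
    }

    /** Returns the next line, or null once the input is exhausted. */
    Line nextLine() throws IOException {
        int pos = index;
        while (true) {
            if (pos >= limit) {
                int shift = index;             // fill() moves the current line to offset 0
                int n = fill();
                pos -= shift;
                if (n == -1) {
                    if (index == limit) return null;              // nothing left
                    Line last = new Line(text(index, limit), ""); // final line ended by EOF
                    index = limit;
                    return last;
                }
                continue;
            }
            int crlfLength = isEol(pos);
            // The fix: if isEol() read ahead, the buffer moved, so shift our position
            // by the same amount instead of keeping the stale pre-fill value.
            pos -= lastReadAheadShift;
            if (crlfLength > 0) {
                Line line = new Line(text(index, pos), text(pos, pos + crlfLength));
                index = pos + crlfLength;
                return line;
            }
            pos++;
        }
    }

    /** Returns 0 (not a terminator), 1 (CR or LF), or 2 (CR LF) for the byte at pos. */
    private int isEol(int pos) throws IOException {
        lastReadAheadShift = 0;
        byte b = buffer[pos];
        if (b == '\n') return 1;
        if (b != '\r') return 0;
        if (pos + 1 >= limit) {
            // CR is the last buffered byte: read ahead to see whether an LF follows.
            lastReadAheadShift = index;
            fill();
            pos -= lastReadAheadShift;
        }
        return (pos + 1 < limit && buffer[pos + 1] == '\n') ? 2 : 1;
    }

    /** Moves bytes [index, limit) to the front, grows the buffer if full, then reads more. */
    private int fill() throws IOException {
        int remaining = limit - index;
        System.arraycopy(buffer, index, buffer, 0, remaining);
        index = 0;
        limit = remaining;
        if (limit == buffer.length) {
            buffer = Arrays.copyOf(buffer, buffer.length * 2);
        }
        int n = in.read(buffer, limit, buffer.length - limit);
        if (n > 0) limit += n;
        return n;
    }

    private String text(int from, int to) {
        return new String(buffer, from, to - from, StandardCharsets.UTF_8);
    }
}
```

      With the input "a\nb\r\nc" and a 4-byte buffer, the CR of the second line is the last buffered byte, so the read-ahead moves the buffer by 2 bytes. Removing the `pos -= lastReadAheadShift` line makes the reader slice that line from stale, pre-fill offsets, which is the kind of index error this issue describes. Compacting inside fill() is only one design; tracking absolute stream offsets instead of buffer positions would avoid the shift arithmetic altogether.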


            People

              Assignee: Oleg Zhurakousky
              Reporter: Oleg Zhurakousky
              Votes: 0
              Watchers: 4
