Uploaded image for project: 'Apache Gobblin'
  1. Apache Gobblin
  2. GOBBLIN-122

Losing events due to unlikely race in Fork::processRecords()?

    XMLWordPrintableJSON

Details

    Description

      It looks like there is an unlikely race in the metioned method in a situation where the extractor would stop pulling records for some time triggering the fork runner to time out on the queue, then publish some records and then calling markParentTaskDone before the forkrunner checks it?

      Isn't something else cleaner like publishing a done message into the queue or something? Am I missing something that makes it not racy?

      I am really looking forward to migrate away from Camus, Gobblin looks really really good thank you for all this great software. I am actually quite tired right now. I hope there are not to many more typos

      Github Url : https://github.com/linkedin/gobblin/issues/840
      Github Reporter : Kaiserchen
      Github Created At : 2016-03-11T22:13:17Z
      Github Updated At : 2017-01-12T04:50:01Z

      Comments


      stakiar wrote on 2016-03-12T01:27:48Z : hey @Kaiserchen

      Yeah that seems like a possible race condition, in the `Fork.processRecords` method. To provide more details here a list of steps as to how this could happen:

      • Task Thread launches Fork Thread that listens for records on the `recordQueue` and processes them
      • Task Thread extracts 1 record from the `Extractor`
      • Task Thread puts the record on the `recordQueue`
      • Fork Thread processes the record and writes it, and then invokes `recordQueue.get()` to wait for the next record
      • Task Thread tries to extract a record, but the extractor stalls waiting for an I/O event
      • Fork Thread's `recordQueue.get()` method times out, and the operation returns null
      • Task Thread finally gets the extractor to return a record, the record is put on the queue
      • Task Thread's Extractor returns null, and it tells the Fork there are no more records coming
      • Fork Thread then looks at the record of `recordQueue.get` and see that the result is `null` because the `get()` timed out, it then sees that the `Task` is also done processing all records, so it thinks there is no more work to be done and then it returns

      In this case the last record will never get written (resulting in missing data) and there will be no failure of any kind.

      Github Url : https://github.com/linkedin/gobblin/issues/840#issuecomment-195624864


      Kaiserchen wrote on 2016-03-12T05:28:11Z : The Io event could potentially block fetching many records that get cached in the extractor so it could start a burst squeezing even more events in, making it more unlikly that it makes it ways to set the done flag, but no gurantees that even many records might be missed. Shall i prepare a patch where the mark done flag puts a static object in as record and in the process i check if it was the done record? also would make the done notification immidiate ( without waiting).

      Github Url : https://github.com/linkedin/gobblin/issues/840#issuecomment-195662645


      Kaiserchen wrote on 2016-03-14T13:11:22Z : I submitted a pull request #841 maybe someone can take a look?

      Github Url : https://github.com/linkedin/gobblin/issues/840#issuecomment-196304101


      stakiar wrote on 2016-03-14T20:53:53Z : @Kaiserchen yes this approach looks good, thanks for the PR. I just made a few minor comments. Overall it looks good.

      Github Url : https://github.com/linkedin/gobblin/issues/840#issuecomment-196517130


      Kaiserchen wrote on 2016-04-13T09:27:50Z : See comment in #841. I would be glad if someone else picks this up. I don't have the time for that currently.

      Github Url : https://github.com/linkedin/gobblin/issues/840#issuecomment-209332889

      Attachments

        Activity

          People

            Unassigned Unassigned
            abti Abhishek Tiwari
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated: