Cassandra / CASSANDRA-14384

If fsync fails it's always an issue and continuing execution is suspect


Details

    • Normal

    Description

      We can't catch fsync errors and continue, so we shouldn't have code in C* that does that. There was a Postgres bug where fsync returned an error and the FS lost data, but subsequent fsyncs succeeded.
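A minimal sketch of the fail-stop alternative, assuming Java NIO's `FileChannel.force(true)` as the sync call (roughly fsync on Linux; Cassandra's own path through `NativeLibrary` is not shown). `DurableWriter`, `syncOrDie`, and `FsyncFailedError` are hypothetical names, not existing Cassandra code. The point is that the failure is escalated rather than logged and retried, because a later successful fsync says nothing about data from the failed attempt.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;

final class DurableWriter {
    // Hypothetical error type. It extends Error so that a generic
    // catch (Exception e) on the write path cannot swallow it and carry on.
    static final class FsyncFailedError extends Error {
        FsyncFailedError(Path path, IOException cause) {
            super("fsync failed for " + path + "; durability of earlier writes is unknown", cause);
        }
    }

    private DurableWriter() {}

    // force(true) asks the OS to flush file content and metadata to the device.
    static void syncOrDie(FileChannel channel, Path path) {
        try {
            channel.force(true);
        } catch (IOException e) {
            // No retry and no "log and continue": a later force() that succeeds
            // does not prove the data from this attempt reached stable storage.
            throw new FsyncFailedError(path, e);
        }
    }
}
```

Whatever catches `FsyncFailedError` higher up would decide whether to stop the process; that decision is outside this sketch.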

      The LastErrorException handling in NativeLibrary.trySync looks a little janky. What's up with that? When would a sync be something we merely "try"? If trying is good enough, why do it at all, considering that trying is already the default behavior of a series of unsynced filesystem operations?

      Also, when we fsync an FD, it's not just that file being synced: the FS is potentially syncing other data as well, and the error code we get could be related to that other data, so we can't safely ignore it. The filesystem could also be internally inconsistent. This happens because FS journaling may force the FS to flush other data to preserve the ordering requirements of journaled metadata.

      If we ignore fsync errors, it should only be for whitelisted reasons, such as a bad FD.
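The whitelist idea could look like the following sketch. It assumes the raw errno is available to the caller (for example via JNA's `LastErrorException.getErrorCode()`). The errno constants are the Linux values and differ on other platforms. `FsyncErrorPolicy` and `classify` are hypothetical names. The policy is default-deny: only explicitly listed codes are ignorable, and anything unknown is fatal.

```java
import java.util.Set;

final class FsyncErrorPolicy {
    // Linux errno values (assumption: Linux; other platforms may differ).
    static final int EIO = 5;
    static final int EBADF = 9;
    static final int ENOSPC = 28;

    enum Action { IGNORE, FAIL }

    // Hypothetical whitelist: errors that say nothing about whether previously
    // written data is durable (e.g. the descriptor was already closed).
    private static final Set<Integer> IGNORABLE = Set.of(EBADF);

    private FsyncErrorPolicy() {}

    // Default-deny: EIO, ENOSPC and any errno not listed above are fatal.
    static Action classify(int errno) {
        return IGNORABLE.contains(errno) ? Action.IGNORE : Action.FAIL;
    }
}
```

Keeping the whitelist explicit means adding a new ignorable case is a reviewed change, instead of the default for every error the sync call can return.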

      I know we have FSErrorHandler, and it makes sense for reads, but I'm not sold on it being the right answer for writes. To my knowledge, we don't retry flushing a memtable or writing to the commit log. We could go read-only, and I need to check if that is w
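If the read-only route were taken, one minimal shape for it is a one-way, node-wide latch that the write path checks before every mutation while reads continue unaffected. This is a sketch under assumptions: `WriteFailureLatch` and its methods are hypothetical, not an existing Cassandra class, and the failed write is never retried.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical node-wide latch for the "go read-only" option. Any failed
// write-path sync (commit log segment, memtable flush, ...) flips it,
// regardless of which file reported the error.
final class WriteFailureLatch {
    private final AtomicReference<Throwable> firstFailure = new AtomicReference<>();

    void onWriteFailure(Throwable cause) {
        Objects.requireNonNull(cause, "cause");
        // Keep only the first error for diagnostics; the transition is one-way.
        firstFailure.compareAndSet(null, cause);
    }

    boolean isReadOnly() {
        return firstFailure.get() != null;
    }

    // Called at the start of every mutation; the read path never calls this.
    void checkWritable() {
        Throwable cause = firstFailure.get();
        if (cause != null) {
            throw new IllegalStateException("node is read-only after a failed write-path sync", cause);
        }
    }
}
```

A single node-wide latch rather than per-file state follows from the point above that an fsync error cannot be attributed to the file whose descriptor reported it.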


          People

            Ariel Weisberg (aweisberg)
            Votes: 0
            Watchers: 2

