Cassandra
  1. Cassandra
  2. CASSANDRA-2116

Separate out filesystem errors from generic IOErrors

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Fix Version/s: 1.2.0 beta 1
    • Component/s: Core
    • Labels:
      None

      Description

      We throw IOErrors everywhere today in the codebase. We should separate out specific errors such as (reading, writing) from filesystem into FSReadError and FSWriteError. This makes it possible in the next ticket to allow certain failure modes (kill the server if reads or writes fail to disk).

      1. CASSANDRA-2116-v6.patch
        251 kB
        Aleksey Yeschenko
      2. CASSANDRA-2116-v4.patch
        162 kB
        Aleksey Yeschenko
      3. CASSANDRA-2116-v3.patch
        103 kB
        Aleksey Yeschenko
      4. CASSANDRA-2116.2.patch
        23 kB
        Aleksey Yeschenko
      5. 2116-v5.txt
        182 kB
        Jonathan Ellis
      6. 0001-Separate-out-filesystem-errors-from-generic-IOErrors.patch
        35 kB
        Chris Goffinet
      7. 0001-Issue-2116-Replace-some-IOErrors-with-more-informati.patch
        85 kB
        Aleksey Yeschenko

        Activity

        Hide
        Chris Goffinet added a comment -

        2 new classes: FSWriteError and FSReadError.

        Show
        Chris Goffinet added a comment - 2 new classes: FSWriteError and FSReadError.
        Hide
        Jonathan Ellis added a comment -

        I'm not sure having different classes for read/write errors is necessary (code that is in a position to catch-and-do-something-reasonable knows what kind of op it's attempting). On the other hand, if a write op does a read as part of its implementation (indexes cause this to happen) we might need to distinguish the two.

        I think it's more useful to distinguish between recoverable errors and non-: "I got EOF earlier than I thought" usually means the file is corrupt not the disk is dead. (I can't think of any read errors that absolutely mean disk-is-dead.)

        It would be useful to get some use out of Java's misguided checked exceptions, by keeping recoverable errors checked (IOException) and unrecoverable ones unchecked (IOError).

        Show
        Jonathan Ellis added a comment - I'm not sure having different classes for read/write errors is necessary (code that is in a position to catch-and-do-something-reasonable knows what kind of op it's attempting). On the other hand, if a write op does a read as part of its implementation (indexes cause this to happen) we might need to distinguish the two. I think it's more useful to distinguish between recoverable errors and non-: "I got EOF earlier than I thought" usually means the file is corrupt not the disk is dead. (I can't think of any read errors that absolutely mean disk-is-dead.) It would be useful to get some use out of Java's misguided checked exceptions, by keeping recoverable errors checked (IOException) and unrecoverable ones unchecked (IOError).
        Hide
        Chris Goffinet added a comment -

        Unfortunately the best we can get is IOError from Java. For example we use this patch to actually detect when our raid array dies, the OS will tell java to throw IOError. I think we should error on the side of, if data is corrupt, we should let the operator decide what mode he wants. For us, any errors or any corruption of data, we want to take the node out right away.

        We have been testing this in production for awhile and it works really well when disks die, and we also did tests involving removing drives from the system while it was serving traffic.

        The Read/Write classes was a similar idea of how the Hadoop code base handles this very issue.

        Show
        Chris Goffinet added a comment - Unfortunately the best we can get is IOError from Java. For example we use this patch to actually detect when our raid array dies, the OS will tell java to throw IOError. I think we should error on the side of, if data is corrupt, we should let the operator decide what mode he wants. For us, any errors or any corruption of data, we want to take the node out right away. We have been testing this in production for awhile and it works really well when disks die, and we also did tests involving removing drives from the system while it was serving traffic. The Read/Write classes was a similar idea of how the Hadoop code base handles this very issue.
        Hide
        Jonathan Ellis added a comment -

        I think you're probably right. Ideally what I'd like is something like this:

        • Code that knows what disk is involved throws an FSReadError/FSWriteError that records the disk/volume in question
        • Notably this will not necessarily be the lowest-level code, which often just takes a DataInput or DataOutput interface. We'll want to declare the checked IOException at that level to make sure we handle it higher up
        • Our global uncaught exception handler can mark the disks in question bad (exactly where CASSANDRA-2118 looks to terminate things; could handle this over there to go with the existing division of code)
        • As for the request that hit the exception in the first place, we just allow that to fail + timeout normally; little benefit to be had by complicating that further (which is the approach taken here and 2118; I just note it to be explicit)
        Show
        Jonathan Ellis added a comment - I think you're probably right. Ideally what I'd like is something like this: Code that knows what disk is involved throws an FSReadError/FSWriteError that records the disk/volume in question Notably this will not necessarily be the lowest-level code, which often just takes a DataInput or DataOutput interface. We'll want to declare the checked IOException at that level to make sure we handle it higher up Our global uncaught exception handler can mark the disks in question bad (exactly where CASSANDRA-2118 looks to terminate things; could handle this over there to go with the existing division of code) As for the request that hit the exception in the first place, we just allow that to fail + timeout normally; little benefit to be had by complicating that further (which is the approach taken here and 2118; I just note it to be explicit)
        Hide
        Aleksey Yeschenko added a comment -

        This patch replaces some of the IOErrors with CommitLogWriteErrors or directory-aware FSReadErrors and FSWriteErrors. These will be used in #2118 to mark all keyspaces with durable_writes=true as non-writable (in case of CommitLogWriteError) or to mark affected (directory, keyspace, columnfamily) triples as non-writable (FSWriteError) or non-readable AND non-writable (FSReadError).
        Some other IOErrors might be replaced during #2118.

        Show
        Aleksey Yeschenko added a comment - This patch replaces some of the IOErrors with CommitLogWriteErrors or directory-aware FSReadErrors and FSWriteErrors. These will be used in #2118 to mark all keyspaces with durable_writes=true as non-writable (in case of CommitLogWriteError) or to mark affected (directory, keyspace, columnfamily) triples as non-writable (FSWriteError) or non-readable AND non-writable (FSReadError). Some other IOErrors might be replaced during #2118.
        Hide
        Jonathan Ellis added a comment -

        Thanks, Aleksey. This looks like a good start!

        As a rule of thumb, my expectation would be that most catch (IOException) block should either be removed (and just declare that method to throw IOE) or transform to FSE. Of course there will be exceptions, e.g. IOException from a socket instead of local disk. But a quick inspection shows a lot of places where this still needs to be done.

        Nit: should CLWE subclass FSError?

        Show
        Jonathan Ellis added a comment - Thanks, Aleksey. This looks like a good start! As a rule of thumb, my expectation would be that most catch (IOException) block should either be removed (and just declare that method to throw IOE) or transform to FSE. Of course there will be exceptions, e.g. IOException from a socket instead of local disk. But a quick inspection shows a lot of places where this still needs to be done. Nit: should CLWE subclass FSError?
        Hide
        Jonathan Ellis added a comment -

        Iterators are screwing us. Ideally I'd like to have things like BSF.getSegment just throw IOException back up until someone who knows what sstable it belongs to, can deal with it, but since Iterator.next can't throw anything [checked] that's not an option.

        So I think what we need to do is go back to FSReadError / FSWriteError, with just a path to the file in question, and let the exception handler deal with parsing out directory + ks + cf from that.

        This has the added benefit of being able to handle non-sstable i/o (cache saving, commitlogs) without needing extra exception types.

        Show
        Jonathan Ellis added a comment - Iterators are screwing us. Ideally I'd like to have things like BSF.getSegment just throw IOException back up until someone who knows what sstable it belongs to, can deal with it, but since Iterator.next can't throw anything [checked] that's not an option. So I think what we need to do is go back to FSReadError / FSWriteError, with just a path to the file in question, and let the exception handler deal with parsing out directory + ks + cf from that. This has the added benefit of being able to handle non-sstable i/o (cache saving, commitlogs) without needing extra exception types.
        Hide
        Pavel Yaskevich added a comment - - edited

        +1 with Jonathan, that would also allow as us to throw less RTE exceptions like, for example, in SystemTable or ReadRepairVerbHandler (associated with RowMutation.apply() calls). I also think that we should keep length() method from RandomAccessReader throwing IOException, as RAF does originally, because I don't think we want to keep future extenders from using custom (potentially I/O based) length tracking. Nit: license header is missing from all of the new files.

        Edit: SerializationError seems to be a good idea for the situations where action is not actually FS related, SerializingCache for one.

        Show
        Pavel Yaskevich added a comment - - edited +1 with Jonathan, that would also allow as us to throw less RTE exceptions like, for example, in SystemTable or ReadRepairVerbHandler (associated with RowMutation.apply() calls). I also think that we should keep length() method from RandomAccessReader throwing IOException, as RAF does originally, because I don't think we want to keep future extenders from using custom (potentially I/O based) length tracking. Nit: license header is missing from all of the new files. Edit: SerializationError seems to be a good idea for the situations where action is not actually FS related, SerializingCache for one.
        Hide
        Jonathan Ellis added a comment -

        SerializationError seems to be a good idea for the situations where action is not actually FS related

        I'm fine leaving these as RTE since we can't do anything reasonable to recover from them either way.

        Show
        Jonathan Ellis added a comment - SerializationError seems to be a good idea for the situations where action is not actually FS related I'm fine leaving these as RTE since we can't do anything reasonable to recover from them either way.
        Hide
        Aleksey Yeschenko added a comment -

        This should do it.

        Show
        Aleksey Yeschenko added a comment - This should do it.
        Hide
        Jonathan Ellis added a comment -

        DD.createAllDirectories will stop trying to create as soon as the first directory fails, so it's not going to be appropriate for generic FSWriteError handling. Suggest logging an error and explicitly shutting down instead. (This should only be called on startup.)

        Looks like we should drop the "throws IOException" declaration from applyIndexUpdates (and have that chain throw FSWE as needed).

        BatchCommitLogExecutorService.processWithSyncBatch should throw FSWE instead of RTE.

        CommitLogSegment.sync should turn IOE into FSWE. Rest of sync heirarchy won't need throws declaration.

        Note for CASSANDRA-2118: will need to unwrap exceptions looking for FSWE since CLES/PCLES can wrap in ExecutionException. (Others might as well. Easier to do unwrap check in 2118 than to audit all possible executors.) On the other hand, this makes trying to catch the error before it hits the exception hook more of a pain, as in the next item...

        CollationController needs to retain its try/catch, since we want to allow the read to succeed, even if the defragmenting write fails. Since it could error w/ either FSWE or EE (from the commitlog add), probably need to catch generic Exception. For 2118 we can add some way to submit this to the disk blacklister without re-throwing.

        Looks like it would be worth adding a constructor for FSRW taking a Descriptor.

        SSTR.createLinks should throw FSWE.

        Methods called by SSTW constructor should throw FSWE.

        SSTW methods should throw FSWE. (callers of append will want to catch + re-throw after cleanup.)

        TruncateVerbHandler (and anyone else) shouldn't swallow potential FSWE by logging, need to rethrow. (FBUtilities.unchecked is handy in such cases.)

        I agree with your use of AssertionError in LCR. Would prefer to use RTE in SSTableReader though, since we do some tricky reference counting around that and I wouldn't want to ignore problems there b/c someone turned off assertions. (Surprisingly common...)

        SSTII should throw IOException when it doesn't know what DataInput is. Callers can transform to FSRE. (Other constructors, or in the last case, IncomingStreamReader.)

        Corrupt sstables (sstablescanner + others?) shouldn't be turned into FSRE, since it's usually bad memory or a bug and not the disk's fault.

        FileUtils should throw FSWE.

        BTW: congratulations on getting import ordering (almost) correct on the first try. The only thing missing is, com.google.common goes above org.slf4j instead of being lumped in with "everything else."

        Show
        Jonathan Ellis added a comment - DD.createAllDirectories will stop trying to create as soon as the first directory fails, so it's not going to be appropriate for generic FSWriteError handling. Suggest logging an error and explicitly shutting down instead. (This should only be called on startup.) Looks like we should drop the "throws IOException" declaration from applyIndexUpdates (and have that chain throw FSWE as needed). BatchCommitLogExecutorService.processWithSyncBatch should throw FSWE instead of RTE. CommitLogSegment.sync should turn IOE into FSWE. Rest of sync heirarchy won't need throws declaration. Note for CASSANDRA-2118 : will need to unwrap exceptions looking for FSWE since CLES/PCLES can wrap in ExecutionException. (Others might as well. Easier to do unwrap check in 2118 than to audit all possible executors.) On the other hand, this makes trying to catch the error before it hits the exception hook more of a pain, as in the next item... CollationController needs to retain its try/catch, since we want to allow the read to succeed, even if the defragmenting write fails. Since it could error w/ either FSWE or EE (from the commitlog add), probably need to catch generic Exception. For 2118 we can add some way to submit this to the disk blacklister without re-throwing. Looks like it would be worth adding a constructor for FSRW taking a Descriptor. SSTR.createLinks should throw FSWE. Methods called by SSTW constructor should throw FSWE. SSTW methods should throw FSWE. (callers of append will want to catch + re-throw after cleanup.) TruncateVerbHandler (and anyone else) shouldn't swallow potential FSWE by logging, need to rethrow. (FBUtilities.unchecked is handy in such cases.) I agree with your use of AssertionError in LCR. Would prefer to use RTE in SSTableReader though, since we do some tricky reference counting around that and I wouldn't want to ignore problems there b/c someone turned off assertions. (Surprisingly common...) SSTII should throw IOException when it doesn't know what DataInput is. Callers can transform to FSRE. (Other constructors, or in the last case, IncomingStreamReader.) Corrupt sstables (sstablescanner + others?) shouldn't be turned into FSRE, since it's usually bad memory or a bug and not the disk's fault. FileUtils should throw FSWE. BTW: congratulations on getting import ordering (almost) correct on the first try. The only thing missing is, com.google.common goes above org.slf4j instead of being lumped in with "everything else."
        Hide
        Pavel Yaskevich added a comment -

        Looks like we should drop the "throws IOException" declaration from applyIndexUpdates (and have that chain throw FSWE as needed).

        If we do so, I think we need to get message about throwing FSWE to the javadoc for the method to get API users attention to change their code appropriately.

        Show
        Pavel Yaskevich added a comment - Looks like we should drop the "throws IOException" declaration from applyIndexUpdates (and have that chain throw FSWE as needed). If we do so, I think we need to get message about throwing FSWE to the javadoc for the method to get API users attention to change their code appropriately.
        Hide
        Aleksey Yeschenko added a comment -

        Looks like we should drop the "throws IOException" declaration from applyIndexUpdates (and have that chain throw FSWE as needed).

        I think *SecondaryIndex methods should throw some kind of SecondaryIndexException instead.

        SSTII should throw IOException when it doesn't know what DataInput is. Callers can transform to FSRE. (Other constructors, or in the last case, IncomingStreamReader.)

        Do you mean the constructor only? Because other methods (next, reset) in SSTII implement Iterator/ICountableColumnIterator. Can't throw IOE. Can modify ICountableColumnIterator, but not Iterator, for obvious reasons. Maybe throw IOException from the constructor, but use instanceof and throw FSReadError in reset and next?

        BTW: congratulations on getting import ordering (almost) correct on the first try. The only thing missing is, com.google.common goes above org.slf4j instead of being lumped in with "everything else."

        Thanks. Had to do something almost right.

        Show
        Aleksey Yeschenko added a comment - Looks like we should drop the "throws IOException" declaration from applyIndexUpdates (and have that chain throw FSWE as needed). I think *SecondaryIndex methods should throw some kind of SecondaryIndexException instead. SSTII should throw IOException when it doesn't know what DataInput is. Callers can transform to FSRE. (Other constructors, or in the last case, IncomingStreamReader.) Do you mean the constructor only? Because other methods (next, reset) in SSTII implement Iterator/ICountableColumnIterator. Can't throw IOE. Can modify ICountableColumnIterator, but not Iterator, for obvious reasons. Maybe throw IOException from the constructor, but use instanceof and throw FSReadError in reset and next? BTW: congratulations on getting import ordering (almost) correct on the first try. The only thing missing is, com.google.common goes above org.slf4j instead of being lumped in with "everything else." Thanks. Had to do something almost right.
        Hide
        Jonathan Ellis added a comment -

        I think *SecondaryIndex methods should throw some kind of SecondaryIndexException instead

        Why would that be more useful?

        Do you mean the constructor only?

        Yes.

        Show
        Jonathan Ellis added a comment - I think *SecondaryIndex methods should throw some kind of SecondaryIndexException instead Why would that be more useful? Do you mean the constructor only? Yes.
        Hide
        Aleksey Yeschenko added a comment -

        Why would that be more useful?

        In some implementations (not talking about KeysIndex) it's not just IO that might go wrong. I don't want to make people choose between wrapping the real exception in IOE or RTE or swallowing everything.
        There is at least one other implementation - in DSE, probably other ones, so it's not a completely made up issue.

        Show
        Aleksey Yeschenko added a comment - Why would that be more useful? In some implementations (not talking about KeysIndex) it's not just IO that might go wrong. I don't want to make people choose between wrapping the real exception in IOE or RTE or swallowing everything. There is at least one other implementation - in DSE, probably other ones, so it's not a completely made up issue.
        Hide
        Aleksey Yeschenko added a comment -

        CollationController needs to retain its try/catch, since we want to allow the read to succeed, even if the defragmenting write fails. Since it could error w/ either FSWE or EE (from the commitlog add), probably need to catch generic Exception. For 2118 we can add some way to submit this to the disk blacklister without re-throwing.

        Actually it can't throw - it skips both the commitlog write and the index update (it calls Table#apply(rm, false, false)). It's mentioned in the comment above the call. I think it's fine to leave out the try/catch.

        Show
        Aleksey Yeschenko added a comment - CollationController needs to retain its try/catch, since we want to allow the read to succeed, even if the defragmenting write fails. Since it could error w/ either FSWE or EE (from the commitlog add), probably need to catch generic Exception. For 2118 we can add some way to submit this to the disk blacklister without re-throwing. Actually it can't throw - it skips both the commitlog write and the index update (it calls Table#apply(rm, false, false)). It's mentioned in the comment above the call. I think it's fine to leave out the try/catch.
        Hide
        Jonathan Ellis added a comment -

        I don't want to make people choose between wrapping the real exception in IOE or RTE or swallowing everything.

        The status quo is "IOE or RTE." IOE is superseded by FSWE, and RTE can stay the same. If we need to improve things beyond that, it's out of scope for this ticket.

        Actually it can't throw

        You're right.

        Show
        Jonathan Ellis added a comment - I don't want to make people choose between wrapping the real exception in IOE or RTE or swallowing everything. The status quo is "IOE or RTE." IOE is superseded by FSWE, and RTE can stay the same. If we need to improve things beyond that, it's out of scope for this ticket. Actually it can't throw You're right.
        Hide
        Aleksey Yeschenko added a comment -

        The status quo is "IOE or RTE." IOE is superseded by FSWE, and RTE can stay the same. If we need to improve things beyond that, it's out of scope for this ticket.

        Just to be clear: by IOE here I meant IOException, not IOError. I was thinking about KeysIndex#forceBlockingFlush that is (was) wrapping ExecutionException AND InterruptedException in IOException just because the interface allowed only IOEx.
        But I agree with you regarding "drop the "throws IOException" declaration from applyIndexUpdates". At one point I actually did just that. Will do.

        Show
        Aleksey Yeschenko added a comment - The status quo is "IOE or RTE." IOE is superseded by FSWE, and RTE can stay the same. If we need to improve things beyond that, it's out of scope for this ticket. Just to be clear: by IOE here I meant IOException, not IOError. I was thinking about KeysIndex#forceBlockingFlush that is (was) wrapping ExecutionException AND InterruptedException in IOException just because the interface allowed only IOEx. But I agree with you regarding "drop the "throws IOException" declaration from applyIndexUpdates". At one point I actually did just that. Will do.
        Hide
        Aleksey Yeschenko added a comment - - edited

        I reverted changes to the SSTII constuctor. Can't reliably throw any FSE from there, from the calling code either.

        SSTW methods' callers that used to catch IOE and do writer.abort() now catch FSWE and do the same. Any call that wasn't performing any cleanup was left as it was - without the cleanup.

        And most FSRE were converted to RTE since in most cases you can't be sure that FS was indeed the reason for the IOE.

        This has to be all this time.

        Show
        Aleksey Yeschenko added a comment - - edited I reverted changes to the SSTII constuctor. Can't reliably throw any FSE from there, from the calling code either. SSTW methods' callers that used to catch IOE and do writer.abort() now catch FSWE and do the same. Any call that wasn't performing any cleanup was left as it was - without the cleanup. And most FSRE were converted to RTE since in most cases you can't be sure that FS was indeed the reason for the IOE. This has to be all this time.
        Hide
        Jonathan Ellis added a comment -

        v5 attached. I got up through KeyIterator in the review but got bogged down in the compressed reader code, which needs some deeper fixes. (Still lots of bare IOExceptions being thrown in CM.Writer and CRAR.)

        Here's the changes I made while going through it:

        • updated try/catch in CCE.create to wrap the bytesRemaining call specifically
        • moved StorageService.getCannonicalPath into FileUtils and added overload taking a File
        • updated CFMD.reload to turn ConfigurationException into RTE instead of IOE
        • Introduced CorruptSSTableError where we were using RTE. We don't do anything with it, yet, but this will make it much easier if we need to down the road since RTE is used all over the place.
        • added FSWE around setLength in CommitLogSegment
        • removed extra catches in doCleanupCompaction in favor of the existing catch (Exception) block, extended to Throwable
        • split FailureDetector RTE into RTE + FSWE
        • lots of changes to RandomAccessReader, CompressionMetadata, CRAR (incomplete)
        Show
        Jonathan Ellis added a comment - v5 attached. I got up through KeyIterator in the review but got bogged down in the compressed reader code, which needs some deeper fixes. (Still lots of bare IOExceptions being thrown in CM.Writer and CRAR.) Here's the changes I made while going through it: updated try/catch in CCE.create to wrap the bytesRemaining call specifically moved StorageService.getCannonicalPath into FileUtils and added overload taking a File updated CFMD.reload to turn ConfigurationException into RTE instead of IOE Introduced CorruptSSTableError where we were using RTE. We don't do anything with it, yet, but this will make it much easier if we need to down the road since RTE is used all over the place. added FSWE around setLength in CommitLogSegment removed extra catches in doCleanupCompaction in favor of the existing catch (Exception) block, extended to Throwable split FailureDetector RTE into RTE + FSWE lots of changes to RandomAccessReader, CompressionMetadata, CRAR (incomplete)
        Hide
        Aleksey Yeschenko added a comment -

        The patch is getting large. If you find that something is still missing, but that what is present is good enough, then it probably makes sense to merge this one and add the missing fixes in a separate patch.

        Show
        Aleksey Yeschenko added a comment - The patch is getting large. If you find that something is still missing, but that what is present is good enough, then it probably makes sense to merge this one and add the missing fixes in a separate patch.
        Hide
        Jonathan Ellis added a comment -

        Okay. Made some minor updates to v6 and committed.

        Still to fix:

        • SequentialWriter has a lot of IOExceptions that should be converted to FSWE
        Show
        Jonathan Ellis added a comment - Okay. Made some minor updates to v6 and committed. Still to fix: SequentialWriter has a lot of IOExceptions that should be converted to FSWE
        Hide
        Jonathan Ellis added a comment -

        lgtm, committed

        Show
        Jonathan Ellis added a comment - lgtm, committed

          People

          • Assignee:
            Aleksey Yeschenko
            Reporter:
            Chris Goffinet
            Reviewer:
            Jonathan Ellis
          • Votes:
            3 Vote for this issue
            Watchers:
            11 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development