Uploaded image for project: 'Cassandra'
  1. Cassandra
  2. CASSANDRA-13696

Digest mismatch Exception if hints file has UnknownColumnFamily

    Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Blocker
    • Resolution: Unresolved
    • Fix Version/s: 3.0.x, 3.11.x, 4.x
    • Component/s: Core
    • Labels:
      None

      Description

      WARN  [HintsDispatcher:2] 2017-07-16 22:00:32,579 HintsReader.java:235 - Failed to read a hint for /127.0.0.2: a2b7daf1-a6a4-4dfc-89de-32d12d2d48b0 - table with id 3882bbb0-6a71-11e7-9bca-2759083e3964 is unknown in file a2b7daf1-a6a4-4dfc-89de-32d12d2d48b0-1500242103097-1.hints
      ERROR [HintsDispatcher:2] 2017-07-16 22:00:32,580 HintsDispatchExecutor.java:234 - Failed to dispatch hints file a2b7daf1-a6a4-4dfc-89de-32d12d2d48b0-1500242103097-1.hints: file is corrupted ({})
      org.apache.cassandra.io.FSReadError: java.io.IOException: Digest mismatch exception
          at org.apache.cassandra.hints.HintsReader$HintsIterator.computeNext(HintsReader.java:199) ~[main/:na]
          at org.apache.cassandra.hints.HintsReader$HintsIterator.computeNext(HintsReader.java:164) ~[main/:na]
          at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[main/:na]
          at org.apache.cassandra.hints.HintsDispatcher.sendHints(HintsDispatcher.java:157) ~[main/:na]
          at org.apache.cassandra.hints.HintsDispatcher.sendHintsAndAwait(HintsDispatcher.java:139) ~[main/:na]
          at org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:123) ~[main/:na]
          at org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:95) ~[main/:na]
          at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.deliver(HintsDispatchExecutor.java:268) [main/:na]
          at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:251) [main/:na]
          at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:229) [main/:na]
          at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.run(HintsDispatchExecutor.java:208) [main/:na]
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_111]
          at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_111]
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_111]
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_111]
          at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) [main/:na]
          at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_111]
      Caused by: java.io.IOException: Digest mismatch exception
          at org.apache.cassandra.hints.HintsReader$HintsIterator.computeNextInternal(HintsReader.java:216) ~[main/:na]
          at org.apache.cassandra.hints.HintsReader$HintsIterator.computeNext(HintsReader.java:190) ~[main/:na]
          ... 16 common frames omitted
      

      It causes multiple cassandra nodes stop by default.

      Here is the reproduce steps on a 3 nodes cluster, RF=3:
      1. stop node1
      2. send some data with quorum (or one), it will generate hints file on node2/node3
      3. drop the table
      4. start node1

      node2/node3 will report "corrupted hints file" and stop. The impact is very bad for a large cluster, when it happens, almost all the nodes are down at the same time and we have to remove all the hints files (which contain the dropped table) to bring the node back.

        Activity

        Hide
        jay.zhuang Jay Zhuang added a comment -

        The problem is because skipBytes() will leave some value in the buffer. Before returning, it should call input.checkCrc() to update the CRC.
        The fix is to set hint=null and use the reminding code to do checkCRC(). Adding an uTest to reproduce the problem:

        Branch uTest
        13696-3.0 circleci#21
        13696-3.11 circleci#24
        13696-trunk circleci#23

        Jeff Jirsa would you please review the patch?

        Show
        jay.zhuang Jay Zhuang added a comment - The problem is because skipBytes() will leave some value in the buffer. Before returning, it should call input.checkCrc() to update the CRC. The fix is to set hint=null and use the reminding code to do checkCRC() . Adding an uTest to reproduce the problem: Branch uTest 13696-3.0 circleci#21 13696-3.11 circleci#24 13696-trunk circleci#23 Jeff Jirsa would you please review the patch?
        Hide
        jjirsa Jeff Jirsa added a comment -

        Looks pretty serious, I'll try to get to it soon. Short term, your trunk patch doesn't actually compile ( CFMetaData is gone in trunk), can you fix that and push that branch?

        Show
        jjirsa Jeff Jirsa added a comment - Looks pretty serious, I'll try to get to it soon. Short term, your trunk patch doesn't actually compile ( CFMetaData is gone in trunk), can you fix that and push that branch?
        Hide
        jay.zhuang Jay Zhuang added a comment -

        Sure, updated the test for trunk:

        Branch uTest
        13696-3.0 circleci#21
        13696-3.11 circleci#24
        13696-trunk circleci#25

        Yes, I think it's a serious problem that all nodes go down at the same time. And to recover, the user has to delete all hints file (Not sure if there's an easy way to identify the "corrupted" hints files, system.log only shows the first one. So for us, we just delete all the hints files.)

        Show
        jay.zhuang Jay Zhuang added a comment - Sure, updated the test for trunk: Branch uTest 13696-3.0 circleci#21 13696-3.11 circleci#24 13696-trunk circleci#25 Yes, I think it's a serious problem that all nodes go down at the same time. And to recover, the user has to delete all hints file (Not sure if there's an easy way to identify the "corrupted" hints files, system.log only shows the first one. So for us, we just delete all the hints files.)
        Hide
        jjirsa Jeff Jirsa added a comment -

        Hey Jay Zhuang - took a look and I don't see any issues. The new tests look good. trunk circleCI run failed due to circleci, but run #26 looks green.

        I'd like to glance at it again tomorrow when I'm more awake, but as of right now, it seems very reasonable and correct to me.

        Show
        jjirsa Jeff Jirsa added a comment - Hey Jay Zhuang - took a look and I don't see any issues. The new tests look good. trunk circleCI run failed due to circleci, but run #26 looks green. I'd like to glance at it again tomorrow when I'm more awake, but as of right now, it seems very reasonable and correct to me.
        Hide
        jay.zhuang Jay Zhuang added a comment - - edited

        Thanks Jeff Jirsa.
        I did more investigation today. Seems it's more serious than I thought. Even there's no down node, "drop table" + write traffic, will trigger the problem.
        Here is reproduce steps:
        1. Create a 3 nodes cluster:
        $ ccm create test13696 -v 3.0.14 && ccm populate -n 3 && ccm start
        2. Send some traffics with cassandra-stress (blogpost.yaml is only in trunk, if you use another yaml file, change the RF=3)
        $ tools/bin/cassandra-stress user profile=test/resources/blogpost.yaml cl=QUORUM truncate=never ops(insert=1) duration=30m -rate threads=2 -mode native cql3 -node 127.0.0.1
        3. While the traffic is running, drop table
        $ cqlsh -e "drop table stresscql.blogposts"
        All 3 nodes go down because of "Digest mismatch Exception".

        The CRC calculation problem has been there for a long time, but only got exposed after CASSANDRA-13004 because of the MessagingService version bump. In the normal case when the versions are the same, HintsDispatcher uses page.buffersIterator() instead of page.hintsIterator(). buffersIterator() doesn't need to decode hints, so it won't have the problem.

        I think the messagingVersion for the hints file should be updated: https://github.com/apache/cassandra/compare/cassandra-3.0...cooldoger:13696.2-3.0?expand=1 so it could dispatch hints in an optimized way. Not sure if we need to check/bump other {{MessagingService.VERSION_30}}s in the 3.0 branch.
        cc Alex Petrov

        Show
        jay.zhuang Jay Zhuang added a comment - - edited Thanks Jeff Jirsa . I did more investigation today. Seems it's more serious than I thought. Even there's no down node, "drop table" + write traffic, will trigger the problem. Here is reproduce steps: 1. Create a 3 nodes cluster: $ ccm create test13696 -v 3.0.14 && ccm populate -n 3 && ccm start 2. Send some traffics with cassandra-stress (blogpost.yaml is only in trunk, if you use another yaml file, change the RF=3) $ tools/bin/cassandra-stress user profile=test/resources/blogpost.yaml cl=QUORUM truncate=never ops(insert=1) duration=30m -rate threads=2 -mode native cql3 -node 127.0.0.1 3. While the traffic is running, drop table $ cqlsh -e "drop table stresscql.blogposts" All 3 nodes go down because of "Digest mismatch Exception". The CRC calculation problem has been there for a long time, but only got exposed after CASSANDRA-13004 because of the MessagingService version bump. In the normal case when the versions are the same, HintsDispatcher uses page.buffersIterator() instead of page.hintsIterator() . buffersIterator() doesn't need to decode hints, so it won't have the problem. I think the messagingVersion for the hints file should be updated: https://github.com/apache/cassandra/compare/cassandra-3.0...cooldoger:13696.2-3.0?expand=1 so it could dispatch hints in an optimized way. Not sure if we need to check/bump other {{MessagingService.VERSION_30}}s in the 3.0 branch. cc Alex Petrov
        Hide
        jjirsa Jeff Jirsa added a comment -

        cc Aleksey Yeschenko as well.

        Show
        jjirsa Jeff Jirsa added a comment - cc Aleksey Yeschenko as well.
        Hide
        ifesdjeen Alex Petrov added a comment - - edited

        Might be we want to use either 30 or 3014 depending on which one is active?

        Question: does this happen in mixed version cluster or all the nodes actually have the same protocol version?

        Show
        ifesdjeen Alex Petrov added a comment - - edited Might be we want to use either 30 or 3014 depending on which one is active? Question: does this happen in mixed version cluster or all the nodes actually have the same protocol version?
        Hide
        jay.zhuang Jay Zhuang added a comment -

        Question: does this happen in mixed version cluster or all the nodes actually have the same protocol version?

        All the nodes are on the same messagingVersion VERSION_3014
        It can be reproduced with a new 3.0.14 cluster.

        Show
        jay.zhuang Jay Zhuang added a comment - Question: does this happen in mixed version cluster or all the nodes actually have the same protocol version? All the nodes are on the same messagingVersion VERSION_3014 It can be reproduced with a new 3.0.14 cluster.
        Hide
        ifesdjeen Alex Petrov added a comment - - edited

        I agree we should also return a correct version from the hints service (as Jay Zhuang already mentioned), like here same as we do in commit log descriptor.

        This also would make the issue for same-version go away, and since it would make the service to pick a different code path I'd say it's also necessary to include it.

        WRT to the patch itself, might be it's better to just call resetCrc explicitly and still return null like I did here? hint is a local variable, and setting it and carrying on makes the logic a bit harder to understand. For example, for me it was non-obvious that this boolean method would also do some buffer rewinding / state resetting under the hood.

        Show
        ifesdjeen Alex Petrov added a comment - - edited I agree we should also return a correct version from the hints service (as Jay Zhuang already mentioned), like here same as we do in commit log descriptor. This also would make the issue for same-version go away, and since it would make the service to pick a different code path I'd say it's also necessary to include it. WRT to the patch itself, might be it's better to just call resetCrc explicitly and still return null like I did here ? hint is a local variable, and setting it and carrying on makes the logic a bit harder to understand. For example, for me it was non-obvious that this boolean method would also do some buffer rewinding / state resetting under the hood.
        Hide
        jjirsa Jeff Jirsa added a comment -

        Wanted to chat with Aleksey offline about the version implications, so withholding comment on that for a bit, but:

        might be it's better to just call resetCrc explicitly and still return null like I did here? hint is a local variable, and setting it and carrying on makes the logic a bit harder to understand. For example, for me it was non-obvious that this boolean method would also do some buffer rewinding / state resetting under the hood.

        I think the current behavior is actually the right thing to do. Simply resetting the CRC state isn't enough, we need to check to see if the CRC matches, because we want to invoke the disk failure policy if we're reading corrupt data, and frankly, a corruption source (bad disk / RAM / etc) that flips bits could cause us to see an invalid CFID, and skipping the corruption test at that point would be the wrong thing to do. A few more comment lines are probably worthwhile, though, since it seems like an easy 'fix' to revert in the future because it's nonobvious.

        Show
        jjirsa Jeff Jirsa added a comment - Wanted to chat with Aleksey offline about the version implications, so withholding comment on that for a bit, but: might be it's better to just call resetCrc explicitly and still return null like I did here? hint is a local variable, and setting it and carrying on makes the logic a bit harder to understand. For example, for me it was non-obvious that this boolean method would also do some buffer rewinding / state resetting under the hood. I think the current behavior is actually the right thing to do. Simply resetting the CRC state isn't enough, we need to check to see if the CRC matches, because we want to invoke the disk failure policy if we're reading corrupt data, and frankly, a corruption source (bad disk / RAM / etc) that flips bits could cause us to see an invalid CFID, and skipping the corruption test at that point would be the wrong thing to do. A few more comment lines are probably worthwhile, though, since it seems like an easy 'fix' to revert in the future because it's nonobvious.
        Hide
        ifesdjeen Alex Petrov added a comment -

        Simply resetting the CRC state isn't enough,

        True. Re-read the issue description/first comment, now it's quite obvious. Thanks for explanation.

        Adding a comment would be great though!

        Show
        ifesdjeen Alex Petrov added a comment - Simply resetting the CRC state isn't enough, True. Re-read the issue description/first comment, now it's quite obvious. Thanks for explanation. Adding a comment would be great though!
        Hide
        jay.zhuang Jay Zhuang added a comment -

        Updated based on the reviews:

        Branch uTest
        13696-3.0 circleci#31
        13696-3.11 circleci#32
        13696-trunk circleci#30

        Also, resetCrc() could also cause CRC mismatch for the next hint read in the same file.

        Another question is why dropTable + write traffic will cause hints file, do you think it's an issue? I create a separate ticket to track that: CASSANDRA-13712

        Show
        jay.zhuang Jay Zhuang added a comment - Updated based on the reviews: Branch uTest 13696-3.0 circleci#31 13696-3.11 circleci#32 13696-trunk circleci#30 Also, resetCrc() could also cause CRC mismatch for the next hint read in the same file. Another question is why dropTable + write traffic will cause hints file, do you think it's an issue? I create a separate ticket to track that: CASSANDRA-13712

          People

          • Assignee:
            jay.zhuang Jay Zhuang
            Reporter:
            jay.zhuang Jay Zhuang
            Reviewer:
            Jeff Jirsa
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:

              Development