Cassandra / CASSANDRA-11845

Hanging repair in Cassandra 2.2.4


Details

    • Bug
    • Status: Resolved
    • Low
    • Resolution: Not A Bug
    • None
    • None
    • CentOS 6

    • Low

    Description

      After increasing the streaming_timeout_in_ms value to 3 hours, I was able to avoid the socketTimeout errors I was getting earlier (https://issues.apache.org/jira/browse/CASSANDRA-11826), but now the repair just stays stuck.

      Current status:

      [2016-05-19 05:52:50,835] Repair session a0e590e1-1d99-11e6-9d63-b717b380ffdd for range (-3309358208555432808,-3279958773585646585] finished (progress: 54%)
      [2016-05-19 05:53:09,446] Repair session a0e590e3-1d99-11e6-9d63-b717b380ffdd for range (8149151263857514385,8181801084802729407] finished (progress: 55%)
      [2016-05-19 05:53:13,808] Repair session a0e5b7f1-1d99-11e6-9d63-b717b380ffdd for range (3372779397996730299,3381236471688156773] finished (progress: 55%)
      [2016-05-19 05:53:27,543] Repair session a0e5b7f3-1d99-11e6-9d63-b717b380ffdd for range (-4182952858113330342,-4157904914928848809] finished (progress: 55%)
      [2016-05-19 05:53:41,128] Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd for range (6499366179019889198,6523760493740195344] finished (progress: 55%)

      It is now 10:46:25, almost 5 hours since the repair got stuck at that point.
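The idle time can be checked mechanically from the progress lines rather than by eye. The following is a minimal sketch, assuming the progress line format shown above; the function names, the one-hour threshold, and the assumption that "now" falls on the same day (2016-05-19) are illustrative and not part of nodetool.

```python
import re
from datetime import datetime, timedelta

# Matches progress lines in the format shown in the report above.
LINE_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\] "
    r"Repair session (?P<session>[0-9a-f-]+) for range "
    r"\((?P<start>-?\d+),(?P<end>-?\d+)\] finished "
    r"\(progress: (?P<progress>\d+)%\)$"
)

# Two of the progress lines quoted in the report.
SAMPLE = [
    "[2016-05-19 05:53:27,543] Repair session a0e5b7f3-1d99-11e6-9d63-b717b380ffdd "
    "for range (-4182952858113330342,-4157904914928848809] finished (progress: 55%)",
    "[2016-05-19 05:53:41,128] Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd "
    "for range (6499366179019889198,6523760493740195344] finished (progress: 55%)",
]


def parse_progress(lines):
    """Return (timestamp, session id, progress percent) for each matching line."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
            events.append((ts, m.group("session"), int(m.group("progress"))))
    return events


def idle_time(events, now):
    """Time elapsed between the most recent progress line and `now`."""
    return now - max(ts for ts, _, _ in events)


def is_stalled(events, now, threshold=timedelta(hours=1)):
    """Hypothetical heuristic: no progress line for longer than `threshold`."""
    return idle_time(events, now) > threshold
```

With the two sample lines and `now = datetime(2016, 5, 19, 10, 46, 25)`, `idle_time` reports a gap of a little under five hours, which matches the observation above. A long gap only shows that no session has finished; it does not by itself distinguish a hung repair from a very slow one.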

      Earlier I could see the repair sessions progressing in system.log, but no repair logs are coming in right now; all I get is the regular index summary redistribution messages.

      The last repair-related log lines I saw:

      INFO [RepairJobTask:5] 2016-05-19 05:53:41,125 RepairJob.java:152 - repair #a0e5df00-1d99-11e6-9d63-b717b380ffdd TABLE_NAME is fully synced
      INFO [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairSession.java:279 - repair #a0e5df00-1d99-11e6-9d63-b717b380ffdd Session completed successfully
      INFO [RepairJobTask:5] 2016-05-19 05:53:41,126 RepairRunnable.java:232 - Repair session a0e5df00-1d99-11e6-9d63-b717b380ffdd for range (6499366179019889198,6523760493740195344] finished

      It is an incremental repair, and in the "nodetool netstats" output I can see entries like these:

      Repair e3055fb0-1d9d-11e6-9d63-b717b380ffdd
      /Node-2
      Receiving 8 files, 1093461 bytes total. Already received 8 files, 1093461 bytes total
      /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80872-big-Data.db 399475/399475 bytes(100%) received from idx:0/Node-2
      /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80879-big-Data.db 53809/53809 bytes(100%) received from idx:0/Node-2
      /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80878-big-Data.db 89955/89955 bytes(100%) received from idx:0/Node-2
      /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80881-big-Data.db 168790/168790 bytes(100%) received from idx:0/Node-2
      /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80886-big-Data.db 107785/107785 bytes(100%) received from idx:0/Node-2
      /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80880-big-Data.db 52889/52889 bytes(100%) received from idx:0/Node-2
      /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80884-big-Data.db 148882/148882 bytes(100%) received from idx:0/Node-2
      /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-80883-big-Data.db 71876/71876 bytes(100%) received from idx:0/Node-2
      Sending 5 files, 863321 bytes total. Already sent 5 files, 863321 bytes total
      /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73168-big-Data.db 161895/161895 bytes(100%) sent to idx:0/Node-2
      /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-72604-big-Data.db 399865/399865 bytes(100%) sent to idx:0/Node-2
      /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73147-big-Data.db 149066/149066 bytes(100%) sent to idx:0/Node-2
      /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-72682-big-Data.db 126000/126000 bytes(100%) sent to idx:0/Node-2
      /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73173-big-Data.db 26495/26495 bytes(100%) sent to idx:0/Node-2
      Repair c0c8af20-1d9c-11e6-9d63-b717b380ffdd
      /Node-3
      Receiving 11 files, 13896288 bytes total. Already received 11 files, 13896288 bytes total
      /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79186-big-Data.db 1598874/1598874 bytes(100%) received from idx:0/Node-3
      /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79196-big-Data.db 736365/736365 bytes(100%) received from idx:0/Node-3
      /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79197-big-Data.db 326558/326558 bytes(100%) received from idx:0/Node-3
      /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79187-big-Data.db 1484827/1484827 bytes(100%) received from idx:0/Node-3
      /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79180-big-Data.db 393636/393636 bytes(100%) received from idx:0/Node-3
      /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79184-big-Data.db 825459/825459 bytes(100%) received from idx:0/Node-3
      /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79188-big-Data.db 3568782/3568782 bytes(100%) received from idx:0/Node-3
      /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79182-big-Data.db 271222/271222 bytes(100%) received from idx:0/Node-3
      /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79193-big-Data.db 4315497/4315497 bytes(100%) received from idx:0/Node-3
      /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79183-big-Data.db 19775/19775 bytes(100%) received from idx:0/Node-3
      /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/tmp-la-79192-big-Data.db 355293/355293 bytes(100%) received from idx:0/Node-3
      Sending 5 files, 9444101 bytes total. Already sent 5 files, 9444101 bytes total
      /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73168-big-Data.db 1796825/1796825 bytes(100%) sent to idx:0/Node-3
      /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-72604-big-Data.db 4549996/4549996 bytes(100%) sent to idx:0/Node-3
      /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73147-big-Data.db 1658881/1658881 bytes(100%) sent to idx:0/Node-3
      /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-72682-big-Data.db 1418335/1418335 bytes(100%) sent to idx:0/Node-3
      /data/cassandra/data/KEYSPACE_NAME/TABLE_NAME-01ad9750723e11e4bfe0d3887930a87c/la-73173-big-Data.db 20064/20064 bytes(100%) sent to idx:0/Node-3
      Read Repair Statistics:
      Attempted: 1142
      Mismatch (Blocking): 0
      Mismatch (Background): 0
      Pool Name Active Pending Completed
      Large messages n/a 0 779
      Small messages n/a 0 14756609
      Gossip messages n/a 0 119647

      The last three counters, "Large messages", "Small messages" and "Gossip messages", keep changing: "Large messages" has incremented by 2 in the last 5 hours, while the other two change more frequently.
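One way to read a netstats dump like the one above is to check whether any stream summary still has files or bytes outstanding, and to record the "Completed" counters so they can be compared across snapshots. The sketch below assumes the text layout shown in this report; the function names are hypothetical, and the second sample (with an unfinished transfer) is made up for illustration.

```python
import re

# "Receiving 8 files, 1093461 bytes total. Already received 8 files, 1093461 bytes total"
SUMMARY_RE = re.compile(
    r"(?P<direction>Receiving|Sending) (?P<files>\d+) files, (?P<bytes>\d+) bytes total\. "
    r"Already (?:received|sent) (?P<done_files>\d+) files, (?P<done_bytes>\d+) bytes total"
)
# "Large messages n/a 0 779"
POOL_RE = re.compile(
    r"^(?P<name>\w+ messages)\s+\S+\s+(?P<pending>\d+)\s+(?P<completed>\d+)$"
)

# Excerpt of the output quoted in the report (per-file lines omitted).
SAMPLE = """\
Repair e3055fb0-1d9d-11e6-9d63-b717b380ffdd
/Node-2
Receiving 8 files, 1093461 bytes total. Already received 8 files, 1093461 bytes total
Sending 5 files, 863321 bytes total. Already sent 5 files, 863321 bytes total
Pool Name Active Pending Completed
Large messages n/a 0 779
Small messages n/a 0 14756609
Gossip messages n/a 0 119647
"""


def pending_streams(netstats_text):
    """List (peer, direction, files_left, bytes_left) for unfinished transfers."""
    pending, peer = [], None
    for line in netstats_text.splitlines():
        s = line.strip()
        # A peer line is a bare "/host"; per-file lines also start with "/"
        # but contain spaces.
        if s.startswith("/") and " " not in s:
            peer = s.lstrip("/")
            continue
        m = SUMMARY_RE.search(s)
        if m:
            files_left = int(m.group("files")) - int(m.group("done_files"))
            bytes_left = int(m.group("bytes")) - int(m.group("done_bytes"))
            if files_left or bytes_left:
                pending.append((peer, m.group("direction"), files_left, bytes_left))
    return pending


def pool_completed(netstats_text):
    """Map each message pool name to its 'Completed' counter."""
    counters = {}
    for line in netstats_text.splitlines():
        m = POOL_RE.match(line.strip())
        if m:
            counters[m.group("name")] = int(m.group("completed"))
    return counters
```

For the excerpt above, `pending_streams(SAMPLE)` returns an empty list, since every summary line reports all files and bytes as transferred. Comparing `pool_completed` results from two snapshots taken some time apart gives the counter movement described above without reading the table by hand.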

      I am unable to figure out whether the repair is still running or stuck. If it is stuck, what should my course of action be to get that table repaired?


          People

            Assignee: Unassigned
            Reporter: vin01