CASSANDRA-6747

MessagingService should handle failures on remote nodes.

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Low
    • Resolution: Fixed
    • Fix Version/s: 2.1 beta2
    • Component/s: None
    • Labels:

      Description

      While going through the MessagingService code, I found that we don't handle callbacks on failure very well. If a verb handler on the remote machine throws an exception, it goes straight to the uncaught exception handler. The machine that sent the message keeps waiting until it times out. On timeout, it does a few things hard-coded in MessagingService, such as writing hints and updating latency. IAsyncCallback offers no way to specify what to do on timeout or on failure.

      Here are some examples that would benefit if we enhanced this system to also propagate failures back to the sender, so that IAsyncCallback gains methods like onFailure.

      1) From ActiveRepairService.prepareForRepair

      IAsyncCallback callback = new IAsyncCallback()
      {
          @Override
          public void response(MessageIn msg)
          {
              prepareLatch.countDown();
          }

          @Override
          public boolean isLatencyForSnitch()
          {
              return false;
          }
      };

      List<UUID> cfIds = new ArrayList<>(columnFamilyStores.size());
      for (ColumnFamilyStore cfs : columnFamilyStores)
          cfIds.add(cfs.metadata.cfId);

      for (InetAddress neighbour : endpoints)
      {
          PrepareMessage message = new PrepareMessage(parentRepairSession, cfIds, ranges);
          MessageOut<RepairMessage> msg = message.createMessage();
          MessagingService.instance().sendRR(msg, neighbour, callback);
      }

      try
      {
          prepareLatch.await(1, TimeUnit.HOURS);
      }
      catch (InterruptedException e)
      {
          parentRepairSessions.remove(parentRepairSession);
          throw new RuntimeException("Did not get replies from all endpoints.", e);
      }

      2) During the snapshot phase of repair, if SnapshotVerbHandler throws an exception, we will wait forever.
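
      To make the proposal concrete, here is a minimal sketch of a callback that also receives failures, so a waiter is released as soon as a remote node reports an error instead of sitting out the timeout. It uses stand-in types only, not Cassandra's real classes: CallbackWithFailure, PrepareCallback, and prepare are hypothetical names, endpoints are plain strings, and the message round trip is simulated in-process. The shape of onFailure here is an assumption, not the signature from the attached patches.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class FailureAwareCallbackSketch
{
    /** Hypothetical stand-in for IAsyncCallback, extended with a failure hook. */
    interface CallbackWithFailure<T>
    {
        void response(T msg);

        void onFailure(String from);
    }

    /** Waits for one reply per endpoint, but gives up early if any endpoint fails. */
    static final class PrepareCallback implements CallbackWithFailure<String>
    {
        private final CountDownLatch latch;
        private final AtomicBoolean failed = new AtomicBoolean(false);

        PrepareCallback(int expectedReplies)
        {
            latch = new CountDownLatch(expectedReplies);
        }

        @Override
        public void response(String msg)
        {
            latch.countDown();
        }

        @Override
        public void onFailure(String from)
        {
            failed.set(true);
            // Drain the latch so the waiting thread wakes up immediately.
            while (latch.getCount() > 0)
                latch.countDown();
        }

        /** Returns true only if every endpoint replied and none reported a failure. */
        boolean awaitSuccess(long timeout, TimeUnit unit) throws InterruptedException
        {
            return latch.await(timeout, unit) && !failed.get();
        }
    }

    /** Simulates sending a prepare message to each endpoint; failingEndpoint may be null. */
    public static boolean prepare(List<String> endpoints, String failingEndpoint) throws InterruptedException
    {
        PrepareCallback callback = new PrepareCallback(endpoints.size());
        for (String endpoint : endpoints)
        {
            if (endpoint.equals(failingEndpoint))
                callback.onFailure(endpoint);   // remote handler threw
            else
                callback.response("ok from " + endpoint);
        }
        return callback.awaitSuccess(1, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException
    {
        List<String> endpoints = Arrays.asList("10.0.0.1", "10.0.0.2");
        System.out.println(prepare(endpoints, null));        // all endpoints replied
        System.out.println(prepare(endpoints, "10.0.0.2"));  // one endpoint failed
    }
}
```

      In this sketch the caller can tell a failure apart from a full set of replies and clean up right away (for example, removing the parent repair session), rather than relying on the one-hour await.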

        Attachments

        1. CASSANDRA-6747-v2.diff
          15 kB
          Sankalp Kohli
        2. CASSANDRA-6747.diff
          15 kB
          Sankalp Kohli
        3. 6747-v3.txt
          17 kB
          Yuki Morishita

              People

              • Assignee: Sankalp Kohli (kohlisankalp)
              • Reporter: Sankalp Kohli (kohlisankalp)
              • Authors: Sankalp Kohli
              • Reviewers: Yuki Morishita

                Dates

                • Created:
                  Updated:
                  Resolved: