  1. Giraph (Retired)
  2. GIRAPH-1145

nextChannel: No channels exist! error when channel is trying to reconnect in another thread

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.2.0
    • Fix Version/s: None
    • Component/s: bsp
    • Labels: None

    Description

      The method NettyClient.getNextChannel has a mechanism to detect when a channel is no longer active. When it finds an inactive channel, it removes that channel from the ChannelRotator, tries to reconnect it, and re-adds it once the reconnection succeeds.

      When there are more client threads than channels, a client thread can call ChannelRotator.nextChannel while the rotator is empty because every channel has been removed for reconnection. This throws IllegalArgumentException("nextChannel: No channels exist!"), which kills the worker.
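
      The interleaving can be illustrated with a minimal sketch. SimpleRotator below is a hypothetical, simplified stand-in for ChannelRotator (it is not the Giraph class), and channels are plain strings rather than Netty channels. The two threads' steps are replayed sequentially so the result is deterministic.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for ChannelRotator; not the real Giraph class. */
final class SimpleRotator {
  private final List<String> channels = new ArrayList<>();
  private int index = 0;

  synchronized void addChannel(String channel) {
    channels.add(channel);
  }

  synchronized boolean removeChannel(String channel) {
    return channels.remove(channel);
  }

  /** Round-robin selection; fails when no channel is registered. */
  synchronized String nextChannel() {
    if (channels.isEmpty()) {
      throw new IllegalArgumentException("nextChannel: No channels exist!");
    }
    index = index % channels.size();
    String channel = channels.get(index);
    index = (index + 1) % channels.size();
    return channel;
  }
}

final class RotatorRaceDemo {
  /** Returns the error message seen by the second thread, or null if none. */
  static String reproduce() {
    SimpleRotator rotator = new SimpleRotator();
    rotator.addChannel("channel-0");

    // Thread A: found channel-0 inactive, removed it, and is now reconnecting.
    rotator.removeChannel("channel-0");

    // Thread B: asks for a channel before thread A has re-added channel-0.
    try {
      rotator.nextChannel();
      return null;
    } catch (IllegalArgumentException e) {
      return e.getMessage();
    }
  }

  public static void main(String[] args) {
    System.out.println(reproduce());
  }
}
```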

      Instead, the thread should have some way of knowing that there's a channel currently reconnecting so that it can wait for it. If the reconnection fails after the specified number of retries, the thread that is trying to reconnect it will throw an exception and fail the worker, so there's no concern about hanging here.
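
      One possible shape for such a mechanism is sketched below. This is only an illustration under assumptions, not a patch against Giraph: WaitingRotator, beginReconnect, and finishReconnect are hypothetical names, and channels are plain strings. The rotator counts in-flight reconnects so that nextChannel waits while it is empty but a reconnect is pending, and still throws when it is empty with nothing pending.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical rotator that tracks in-flight reconnects; not Giraph's ChannelRotator. */
final class WaitingRotator {
  private final List<String> channels = new ArrayList<>();
  private int reconnecting = 0; // channels currently being reconnected
  private int index = 0;

  synchronized void addChannel(String channel) {
    channels.add(channel);
    notifyAll();
  }

  /** Called by the thread that detected an inactive channel, before reconnecting. */
  synchronized void beginReconnect(String channel) {
    channels.remove(channel);
    reconnecting++;
  }

  /** Called by the same thread once the reconnect has succeeded. */
  synchronized void finishReconnect(String channel) {
    reconnecting--;
    channels.add(channel);
    notifyAll(); // wake any thread blocked in nextChannel
  }

  synchronized String nextChannel() throws InterruptedException {
    // Wait only while a reconnect is pending; otherwise fail as before.
    while (channels.isEmpty() && reconnecting > 0) {
      wait();
    }
    if (channels.isEmpty()) {
      throw new IllegalArgumentException("nextChannel: No channels exist!");
    }
    index = index % channels.size();
    String channel = channels.get(index);
    index = (index + 1) % channels.size();
    return channel;
  }
}

final class WaitingRotatorDemo {
  /** Returns the channel obtained after waiting for a simulated reconnect. */
  static String demo() {
    WaitingRotator rotator = new WaitingRotator();
    rotator.addChannel("channel-0");
    rotator.beginReconnect("channel-0");

    Thread reconnector = new Thread(() -> {
      try {
        Thread.sleep(50); // simulated reconnect delay
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      rotator.finishReconnect("channel-0");
    });
    reconnector.start();

    try {
      String channel = rotator.nextChannel(); // blocks until finishReconnect runs
      reconnector.join();
      return channel;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

      The sketch has no failure path for waiters because, as described above, a reconnect that exhausts its retries fails the worker from the reconnecting thread.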

      A workaround is to ensure that giraph.channelsPerServer >= giraph.nettyClientThreads, but this is often not desirable in cases with many workers.
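
      The workaround condition, and a rough view of its cost, can be written down as follows. WorkaroundCheck and both methods are hypothetical helpers, not part of Giraph. The channel estimate assumes that each worker opens giraph.channelsPerServer channels to every other worker, which is an assumption made for illustration rather than something stated in this issue.

```java
/** Hypothetical helper illustrating the workaround; not part of Giraph. */
final class WorkaroundCheck {
  /** True when giraph.channelsPerServer >= giraph.nettyClientThreads. */
  static boolean satisfiesWorkaround(int channelsPerServer, int nettyClientThreads) {
    return channelsPerServer >= nettyClientThreads;
  }

  /**
   * Rough estimate of outgoing channels per worker, ASSUMING one set of
   * channelsPerServer channels to each of the other workers.
   */
  static long outgoingChannelsPerWorker(int channelsPerServer, int workers) {
    return (long) channelsPerServer * (workers - 1);
  }

  public static void main(String[] args) {
    System.out.println(satisfiesWorkaround(4, 4));
    System.out.println(outgoingChannelsPerWorker(4, 101));
  }
}
```

      Under that assumption, raising channelsPerServer to match the thread count multiplies the number of connections each worker holds, which is why the workaround scales poorly with many workers.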

          People

            Assignee: Unassigned
            Reporter: Nic Eggert (nseggert)