  1. Apache NiFi
  2. NIFI-5953

GetTwitter processor throws Enhance Your Calm exceptions then fails with Retries exhausted


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.7.0
    • Fix Version/s: 1.10.0
    • Component/s: Extensions
    • Labels: None

    Description

      Hi,

      I am using the GetTwitter processor with the Filter Endpoint.
      The issue is that I often get a series of `Received error HTTP_ERROR: HTTP/1.1 420 Enhance Your Calm. Will attempt to reconnect` exceptions.
      These are followed by one `Received error STOPPED_BY_ERROR: Retries exhausted due to null. Will not attempt to reconnect` exception, after which the processor no longer receives any tweets from the Twitter endpoint.

      I am being rate limited by the Twitter API. I am running a NiFi cluster, so I run the GetTwitter processor on the Primary Node only, to avoid using the same credentials several times in parallel.

      I tried to apply the configuration recommendation from this mailing list thread:
      <https://lists.apache.org/thread.html/ed397f42a26760280363e9cc1f64f6654c635110005e24ab9486bf19@%3Cdev.nifi.apache.org%3E>

      But raising the "run schedule" parameter to 60 seconds does not help in my case, since I aim to read between 100 and 200 tweets per minute. With a "run schedule" of 60 seconds, NiFi polls only 1 tweet per minute and cannot keep up with the queue of tweets coming from the Twitter API.

      Proposed solution

      I analyzed the `GetTwitter.java` implementation and noticed that the `onTrigger()` method reconnects (`client.reconnect();`) to the Twitter endpoint on `HTTP_ERROR`.
      The issue here is that `HTTP/1.1 420 Enhance Your Calm` messages are reported as `HTTP_ERROR`, but the Twitter HBC library client (com.twitter.hbc) already manages reconnection.
      The Twitter HBC library client performs retries on its own with an increasing wait delay, with 5 retries by default.
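To illustrate what "retries with an increasing wait delay" implies for the retry limit, here is a minimal sketch. It assumes the delay doubles on every attempt and starts from a caller-supplied base; both are assumptions for illustration, since this report does not state the schedule HBC actually uses. `BackoffSketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only: retries separated by a doubling wait.
// The doubling factor and the base delay are assumptions, not HBC values.
final class BackoffSketch {

    /** Returns the wait (in seconds) before each retry, doubling every time. */
    static List<Long> delays(long baseSeconds, int maxRetries) {
        List<Long> result = new ArrayList<>();
        long delay = baseSeconds;
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            result.add(delay);
            delay *= 2;
        }
        return result;
    }

    /** Total time spent waiting before the retries are exhausted. */
    static long totalWait(long baseSeconds, int maxRetries) {
        long total = 0;
        for (long d : delays(baseSeconds, maxRetries)) {
            total += d;
        }
        return total;
    }

    public static void main(String[] args) {
        // With an assumed 60 s base and 5 retries:
        System.out.println(delays(60, 5));    // [60, 120, 240, 480, 960]
        System.out.println(totalWait(60, 5)); // 1860
    }
}
```

Under these assumptions, a small retry limit bounds how long the client keeps waiting before it gives up, which is why the limit matters when the rate limiting lasts longer than expected.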

      Moreover, it seems that `client.reconnect();` does not work in my case, and because that method is called too often, I get kicked off the Twitter API even sooner.

      My proposed solution is the following (tested in my local development environment):

      1. Let the Twitter HBC library client handle the connection retries on `HTTP/1.1 420 Enhance Your Calm` messages.

      The `onTrigger()` method should be updated so that it does not try to reconnect on an `HTTP_ERROR` whose message equals `HTTP/1.1 420 Enhance Your Calm`:

       case HTTP_ERROR:
           if (!event.getMessage().equals("HTTP/1.1 420 Enhance Your Calm")) {
               // Any other HTTP error: reconnect from the processor as before
               getLogger().error("Received error {}: {}. Will attempt to reconnect", new Object[]{event.getEventType(), event.getMessage()});
               client.reconnect();
           } else {
               // Rate limited (420): leave the retries to the HBC client
               getLogger().error("Received error {}: {}. Will not attempt to reconnect", new Object[]{event.getEventType(), event.getMessage()});
           }
           break;
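The decision made in that `case` branch can also be sketched as a small standalone helper, which makes it easy to check in isolation. This is only a sketch: `ReconnectPolicy`, `isRateLimited`, and `shouldReconnect` are hypothetical names, the enum is a stand-in for the event types quoted in this report, and matching on the `HTTP/1.1 420` prefix (rather than the full message) is an assumption that tolerates a different reason phrase or a null message.

```java
// Hypothetical helper, not part of GetTwitter or HBC: decides whether the
// processor itself should call client.reconnect() for a given event.
final class ReconnectPolicy {

    // Stand-in for the two event types quoted in this report.
    enum EventType { HTTP_ERROR, STOPPED_BY_ERROR }

    /** True when the message looks like an HTTP 420 status line. */
    static boolean isRateLimited(String message) {
        return message != null && message.trim().startsWith("HTTP/1.1 420");
    }

    /** Only HTTP errors other than 420 lead to a reconnect by the processor. */
    static boolean shouldReconnect(EventType type, String message) {
        return type == EventType.HTTP_ERROR && !isRateLimited(message);
    }

    public static void main(String[] args) {
        System.out.println(shouldReconnect(EventType.HTTP_ERROR, "HTTP/1.1 420 Enhance Your Calm"));      // false
        System.out.println(shouldReconnect(EventType.HTTP_ERROR, "HTTP/1.1 503 Service Unavailable"));    // true
        System.out.println(shouldReconnect(EventType.STOPPED_BY_ERROR, "Retries exhausted due to null")); // false
    }
}
```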
      

      2. Parameterize the maximum number of connection retries

      I also noticed that the default number of retries in the Twitter HBC library (5) is sometimes too low.
      It would therefore be useful to add a GetTwitter processor property named `Max Connection Retries`. In my usage I found that `10` is a good value.

      Then update the `onSchedule()` method with this line (replacing `10` with the value of `Max Connection Retries`):

      clientBuilder.retries(10); // default value is 5
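As a rough sketch of how the value of `Max Connection Retries` could be checked before it is handed to `clientBuilder.retries(...)`, the following standalone helper parses the raw property text. `MaxRetriesSetting` and `parse` are hypothetical names; falling back to 5 when the property is unset is an assumption based on the default mentioned above. In the actual processor this validation would more likely be expressed through the property's own validator than by hand.

```java
// Hypothetical helper: turns the raw "Max Connection Retries" text into an int.
final class MaxRetriesSetting {

    // Default retry count of the HBC client, as stated in this report.
    static final int DEFAULT_RETRIES = 5;

    /** Parses the property value; unset means default, non-positive is rejected. */
    static int parse(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_RETRIES;
        }
        final int value;
        try {
            value = Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Max Connection Retries must be an integer: " + raw, e);
        }
        if (value <= 0) {
            throw new IllegalArgumentException("Max Connection Retries must be positive: " + raw);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(parse("10")); // 10
        System.out.println(parse(null)); // 5
    }
}
```

The parsed value would then replace the literal, for example `clientBuilder.retries(MaxRetriesSetting.parse(rawValue));`, where `rawValue` stands for however the processor reads the property.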
      

            People

              Assignee: Kourge (kourge-ch)
              Reporter: Kourge (kourge-ch)
              Votes: 0
              Watchers: 3

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 40m