[NIFI-6905] GetTwitter processor, configured to run on primary node only, initializes connection to Twitter API from every NiFi cluster node, even on non-primary nodes - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.0.0
Fix Version/s: 1.11.0
Component/s: Extensions
Labels:
- getTwitter

Description

I have a GetTwitter processor running on a 3-node NiFi cluster, configured to execute on the primary node only.
The symptom is a high frequency of HTTP 420 ("Enhance Your Calm") exceptions when the GetTwitter processor starts.

I ran the following tests:

- With only 1 NiFi node running, I was able to start/stop the GetTwitter processor 10 times in a row without any errors.
- With 2 NiFi nodes running, HTTP 420 errors occurred after a few start/stop cycles (sometimes even after a single start).

After analyzing the source code, and in light of https://issues.apache.org/jira/browse/NIFI-2592, I came to the conclusion that the GetTwitter processor initializes the connection to the Twitter API on each node of the cluster, including non-primary nodes.

The `onScheduled()` method runs on every node (see NIFI-2592) and connects to Twitter with `client.connect()`. The `onTrigger()` method then consumes the tweets normally, on the primary node only.
The issue is that having more than one node initializing connections makes the Twitter API raise HTTP 420 errors.

ERROR
org.apache.nifi.processors.twitter.GetTwitter
GetTwitter[id=XYZ] Received error HTTP_ERROR: HTTP/1.1 420 Enhance Your Calm. Will attempt to reconnect

Proposed solutions:

- Change the behavior of the `onScheduled()` method so that it runs only on the primary node (as proposed in NIFI-2592).
- Update the GetTwitter processor implementation so that it no longer calls `client.connect()` from the `onScheduled()` method, but only when PrimaryNodeState changes to ELECTED_PRIMARY_NODE (and calls `client.stop()` when PrimaryNodeState changes to PRIMARY_NODE_REVOKED).
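
To illustrate the second proposal, here is a minimal, self-contained sketch. It does not use the NiFi API: the `PrimaryNodeState` enum below is a hypothetical stand-in that only mirrors the two state names mentioned above, and `StreamClient` and `PrimaryOnlyConnector` are made-up names for illustration. It is not the code from the linked pull request.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the primary-node state; only the two values
// named in this issue are modeled.
enum PrimaryNodeState { ELECTED_PRIMARY_NODE, PRIMARY_NODE_REVOKED }

// Hypothetical minimal view of the streaming client: only connect() and
// stop() are needed for this sketch.
interface StreamClient {
    void connect();
    void stop();
}

// Opens the connection only while this node is the primary node.
class PrimaryOnlyConnector {
    private final StreamClient client;
    private final AtomicBoolean connected = new AtomicBoolean(false);

    PrimaryOnlyConnector(StreamClient client) {
        this.client = client;
    }

    // Runs on every node when the processor is scheduled.
    // Deliberately does NOT call client.connect().
    void onScheduled() {
        // Only node-independent setup would go here.
    }

    // Called when the node gains or loses the primary role.
    void onPrimaryNodeStateChange(PrimaryNodeState newState) {
        if (newState == PrimaryNodeState.ELECTED_PRIMARY_NODE) {
            // Connect at most once per election.
            if (connected.compareAndSet(false, true)) {
                client.connect();
            }
        } else if (newState == PrimaryNodeState.PRIMARY_NODE_REVOKED) {
            // Release the connection so the new primary can take over.
            if (connected.compareAndSet(true, false)) {
                client.stop();
            }
        }
    }

    // Called when the processor is stopped; closes any open connection.
    void onStopped() {
        if (connected.compareAndSet(true, false)) {
            client.stop();
        }
    }

    boolean isConnected() {
        return connected.get();
    }
}
```

With this shape, a non-primary node that only ever sees `onScheduled()` never opens a connection, so only one node talks to the Twitter API at a time. One case the sketch does not cover: a processor started on a node that is already primary would see no state change, so a real implementation would also need to check the node's current role at schedule time.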

Issue Links

is related to

NIFI-2592 Processor still runs @OnScheduled method when it's running on a non-primary node

Open

links to

GitHub Pull Request #3909

People

Assignee: Kourge

Reporter: Kourge

Votes: 0

Watchers: 2

Dates

Created: 26/Nov/19 13:02

Updated: 13/Dec/19 18:42

Resolved: 13/Dec/19 18:42

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 0.5h