Description
Flume logs often contain alarming-looking messages like these:
2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110801-235819417-0400.6477874242784836.seq after being stale for 60802ms
2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states
2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110805-005128611-0400.6740261462532911.seq after being stale for 60802ms
2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states
2011-08-14 00:00:56,177 INFO com.cloudera.flume.agent.WALAckManager: Retransmitting log.00000038.20110805-031622274-0400.6748955125414911.seq after being stale for 60802ms
2011-08-14 00:00:56,177 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: There was a race that happend with SENT vs SENDING states
This happens because we previously expected to deal with only three states:
LOGGED, SENDING, SENT.
We actually need to deal with all possible states. Importantly, SENDING is a valid state to transition from (not a race, as the warning reports). Here's the high-level idea:
Current state -> state to transition to:
IMPORT -> IMPORT // new: warn that this is an odd case.
WRITING -> WRITING // new: warn that this is an odd case.
LOGGED -> LOGGED // This is a change; it used to be considered a race. This is legal: if it is logged, it is slated for retry, so stay put.
SENDING -> SENDING // This is the change; it used to be considered a race. This is legal: if we are already sending the chunk, keep sending it; no need to retry.
SENT -> LOGGED // This was sent already but the acks didn't work out. Move to the LOGGED state to retry.
E2EACKED -> E2EACKED // new: already acked means it is good. No need to retry.
others -> others // other states are unexpected and remain in their state.
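The transition table above could be sketched in Java roughly as follows. This is only an illustration of the proposed rules, not the actual NaiveFileWALManager code: the class name RetryTransitions, the methods nextStateOnRetry and isOddCase, and the ERROR member (a stand-in for "others") are hypothetical names chosen for this example.

```java
// Hypothetical sketch of the proposed retry transitions; not Flume's real API.
final class RetryTransitions {

    // State names come from the table above. ERROR is an assumed
    // placeholder for the "others" row.
    enum State { IMPORT, WRITING, LOGGED, SENDING, SENT, E2EACKED, ERROR }

    private RetryTransitions() {}

    // Returns the state a chunk should be in after a retry is requested.
    static State nextStateOnRetry(State current) {
        switch (current) {
            case SENT:
                // Sent already, but the ack never arrived: go back to LOGGED to retry.
                return State.LOGGED;
            default:
                // IMPORT, WRITING, LOGGED, SENDING, E2EACKED and any other
                // state stay where they are.
                return current;
        }
    }

    // True for the states where a retry request is unusual and worth a warning.
    static boolean isOddCase(State current) {
        return current == State.IMPORT || current == State.WRITING;
    }
}
```

With rules like these, a retry request for a chunk in SENDING is a no-op rather than a reported race, and only SENT chunks are moved back to LOGGED.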