[HBASE-28682] ITBLL and other MR-based integration tests should heartbeat often - ASF JIRA

Details

Type: Brainstorming
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: Client, integration tests, mapreduce
Labels: None

Description

We have this little note in our ITBLL harness:

      // If we cause enough chaos, RPC requests might get into long backoffs. During this
      // time, it won't send keep alives to the map/reduce context. So increase the timeout
      // a bunch

Investigating further, I found that the ITBLL Generator's persist method updates the MR context's progress only once every 100 puts. You'd think that would be enough, but under chaos it really isn't. What if we update progress with every put? Digging through the MR source code, it seems that calling context.progress() only sets an AtomicBoolean indicating that a progress update needs to be sent. The actual sending of progress reports is gated by mapreduce.task.progress-report.interval, which defaults to 1% of mapreduce.task.timeout; with a timeout of 300_000ms, that works out to 3 seconds. So yeah, we should probably set this AtomicBoolean much more often in chaotic jobs: doing so is effectively free and will improve reliability.
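
To make that mechanism concrete, here is a simplified model of it in Java. It is a sketch under stated assumptions, not Hadoop's actual Task/reporter code: ProgressModel, progress(), reporterTick() and defaultReportIntervalMs() are hypothetical names, and the reporter thread is reduced to a method called once per report interval. It shows why calling progress() on every put costs only an atomic write: however many times the flag is set between two ticks, at most one report goes out per interval.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Simplified, hypothetical model of MR progress reporting; NOT Hadoop's code.
class ProgressModel {
  // Set by progress(), cleared by the reporter when it sends a report.
  private final AtomicBoolean progressFlag = new AtomicBoolean(false);
  // Only the (single) reporter calls reporterTick(), so a plain int suffices.
  private int reportsSent = 0;

  // Default report interval in this model: 1% of the task timeout, in ms.
  static long defaultReportIntervalMs(long taskTimeoutMs) {
    return taskTimeoutMs / 100;
  }

  // Models context.progress(): a cheap flag write, no RPC involved.
  void progress() {
    progressFlag.set(true);
  }

  // Models one wake-up of the reporter thread, once per report interval.
  // Returns true if a progress report would be sent on this tick.
  boolean reporterTick() {
    // getAndSet(false) clears the flag, so many progress() calls between
    // two ticks still collapse into at most one report.
    if (progressFlag.getAndSet(false)) {
      reportsSent++;
      return true;
    }
    return false;
  }

  int reportsSent() {
    return reportsSent;
  }
}
```

For example, 1,000 progress() calls between two ticks produce a single report, and defaultReportIntervalMs(300_000) gives 3,000 ms, matching the 3-second figure above.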

But still, every put is perhaps excessive. What if we add a pre-flush hook to (Async)BufferedMutator so that an MR job can set this progress flag right before the client disappears down into a retry loop? I bet other applications would find such a hook useful as well.
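
A minimal sketch of what such a pre-flush hook might look like, assuming a callback that the mutator invokes immediately before sending a buffered batch. PreFlushHook and ToyBufferedMutator are hypothetical illustrations only; neither exists in the HBase client API, and the toy mutator has no retries, async machinery, or byte-based write-buffer sizing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical callback fired right before a buffered batch is sent.
@FunctionalInterface
interface PreFlushHook {
  void beforeFlush(int batchSize);
}

// Toy stand-in for a BufferedMutator (hypothetical, not the HBase API):
// buffers mutations in memory and flushes when a count threshold is reached.
class ToyBufferedMutator {
  private final int flushThreshold;
  private final PreFlushHook hook;
  private final Consumer<List<String>> sender; // stands in for the RPC + retry loop
  private final List<String> buffer = new ArrayList<>();

  ToyBufferedMutator(int flushThreshold, PreFlushHook hook, Consumer<List<String>> sender) {
    this.flushThreshold = flushThreshold;
    this.hook = hook;
    this.sender = sender;
  }

  void mutate(String mutation) {
    buffer.add(mutation);
    if (buffer.size() >= flushThreshold) {
      flush();
    }
  }

  void flush() {
    if (buffer.isEmpty()) {
      return;
    }
    // Fire the hook first, so the caller (e.g. an MR task) can record
    // progress before the potentially long send begins.
    hook.beforeFlush(buffer.size());
    sender.accept(new ArrayList<>(buffer));
    buffer.clear();
  }
}
```

In an MR job, the hook could simply be n -> context.progress(), so the progress flag is set once per flush rather than once per put, right before the client may block in its retry loop.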

People

Assignee: Unassigned

Reporter: Nick Dimiduk

Votes: 0

Watchers: 2

Dates

Created: 20/Jun/24 08:52

Updated: 20/Jun/24 08:52