HBase / HBASE-28682

ITBLL and other MR-based integration tests should heartbeat often


Details

    • Type: Brainstorming
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      We have this little note in our ITBLL harness,

            // If we cause enough chaos, RPC requests might get into long backoffs. During this
            // time, it won't send keep alives to the map/reduce context. So increase the timeout
            // a bunch
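
      For reference, the workaround that note describes amounts to bumping mapreduce.task.timeout on the job configuration. A minimal sketch of that workaround; the 20-minute value here is illustrative, not necessarily what the harness actually uses:

            import org.apache.hadoop.conf.Configuration;
            import org.apache.hadoop.mapreduce.MRJobConfig;

            public class TimeoutBumpSketch {
              // Existing workaround: raise mapreduce.task.timeout so that long
              // client backoffs do not get the task killed for inactivity.
              static void bumpTaskTimeout(Configuration conf) {
                // 20 minutes, illustrative value only
                conf.setLong(MRJobConfig.TASK_TIMEOUT, 20 * 60 * 1000L);
              }
            }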
      

      Investigating, I found that the ITBLL Generator's persist method updates the MR context progress only every 100 puts. You'd think that would be enough, but under enough chaos it really isn't. What if we update progress with every put? Digging through the MR source, calling context.progress() only sets an AtomicBoolean flagging that a progress update needs to be sent; the actual sending of progress reports is gated by mapreduce.task.progress-report.interval, which defaults to 1% of mapreduce.task.timeout. With the default timeout of 300_000 ms, that's a report every 3 seconds. So yeah, we should probably set this AtomicBoolean much more often in chaotic jobs, as doing so is effectively free and will improve reliability.
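
      To make that concrete, here is a minimal sketch of a per-put progress poke, assuming a loop shaped like the Generator's persist method; the class, family, and qualifier names here are illustrative, not the actual ITBLL code:

            import java.io.IOException;
            import org.apache.hadoop.hbase.client.BufferedMutator;
            import org.apache.hadoop.hbase.client.Put;
            import org.apache.hadoop.hbase.util.Bytes;
            import org.apache.hadoop.util.Progressable;

            public class ProgressPerPutSketch {
              private static final byte[] FAMILY = Bytes.toBytes("meta");
              private static final byte[] PREV = Bytes.toBytes("prev");

              // Illustrative stand-in for the Generator's persist loop. A
              // Mapper.Context is a Progressable, so the real code can pass
              // the context straight through.
              static void persist(Progressable context, BufferedMutator mutator,
                  byte[][] current, byte[][] prev) throws IOException {
                for (int i = 0; i < current.length; i++) {
                  Put put = new Put(current[i]);
                  put.addColumn(FAMILY, PREV, prev == null ? new byte[0] : prev[i]);
                  mutator.mutate(put);
                  // Previously gated behind something like (i % 100 == 0).
                  // progress() just flips the dirty flag; actual reports are
                  // throttled to mapreduce.task.progress-report.interval, so a
                  // per-put call is effectively free.
                  context.progress();
                }
              }
            }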

      But still, every put is perhaps excessive. What if we add a pre-flush hook to (Async)BufferedMutator so that an MR job can set this progress flag right before the client disappears down into a retry loop? I bet other applications would find such a hook useful as well.
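
      No such API exists today; the following is just one possible shape for it, with both the listener interface and the BufferedMutatorParams setter being hypothetical:

            import java.io.IOException;
            import org.apache.hadoop.hbase.TableName;
            import org.apache.hadoop.hbase.client.BufferedMutator;
            import org.apache.hadoop.hbase.client.BufferedMutatorParams;
            import org.apache.hadoop.hbase.client.Connection;
            import org.apache.hadoop.util.Progressable;

            public class PreFlushHookSketch {

              // Hypothetical callback; nothing like this exists on
              // (Async)BufferedMutator today. It would fire once, just before
              // the mutator starts a flush and potentially disappears into the
              // client retry/backoff loop.
              @FunctionalInterface
              public interface PreFlushListener {
                void preFlush(BufferedMutator mutator) throws IOException;
              }

              // How an MR job might wire it up, assuming BufferedMutatorParams
              // grew a matching (hypothetical) setter.
              static BufferedMutator create(Connection conn, TableName table,
                  Progressable context) throws IOException {
                BufferedMutatorParams params = new BufferedMutatorParams(table);
                // params.preFlushListener(m -> context.progress()); // hypothetical
                return conn.getBufferedMutator(params);
              }
            }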


          People

            Assignee: Unassigned
            Reporter: Nick Dimiduk (ndimiduk)
            Votes: 0
            Watchers: 2
