• Sub-task
    • Status: Open
    • Major
    • Resolution: Unresolved
    • Backend


      Suggested during the review of IMPALA-9296:

      I'm not sure that this is the right way to do it, since it means that if a backend sees multiple RPC failures in a single query, only one will ever be reported to the coordinator.

      Of course, I've been advocating for being aggressive about blacklisting. Suppose there were two RPC failures. There are two cases here. Either both RPCs were to the same executor, in which case the two failures make us more confident that something is wrong with that executor, and we might actually want to blacklist it twice (which will just extend the amount of time that it stays blacklisted). Or the two RPCs were to different executors, in which case, if we blacklist only one of them and then retry the query, it may very well fail again.

      And even if we do want to stay more conservative about blacklisting, you've suggested before (and I agree) that it's generally preferable to report as much info about errors as we have, and then centralize the logic for deciding how to act on those errors in the coordinator.
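
      To make the two cases concrete, here is a minimal sketch of what "report everything, decide in the coordinator" could look like. It is not Impala code: `RpcFailure`, `Coordinator`, `handle_report`, and the 60-second base timeout are all hypothetical names and values chosen for illustration, and the rule that each additional failure extends the blacklist time linearly is an assumption.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RpcFailure:
    """One failed RPC as seen by the reporting backend (hypothetical type)."""
    dest_executor: str  # executor the RPC was sent to
    error: str          # short description of the failure


@dataclass
class Coordinator:
    """Hypothetical coordinator-side policy, not Impala's actual logic."""
    base_timeout_s: int = 60                       # assumed base blacklist time
    blacklist: dict = field(default_factory=dict)  # executor -> seconds blacklisted

    def handle_report(self, failures):
        # The backend sends every failure it saw; the policy lives only here.
        counts = Counter(f.dest_executor for f in failures)
        for executor, n in counts.items():
            # Assumption: each failure adds one base timeout, so blacklisting
            # the same executor twice extends how long it stays blacklisted.
            extra = n * self.base_timeout_s
            self.blacklist[executor] = self.blacklist.get(executor, 0) + extra
        return self.blacklist


# Case 1: both failures point at the same executor, so its timeout is extended.
same = Coordinator().handle_report(
    [RpcFailure("exec-1", "timeout"), RpcFailure("exec-1", "timeout")]
)

# Case 2: failures point at different executors, so both are blacklisted and
# a retry of the query avoids both of them.
different = Coordinator().handle_report(
    [RpcFailure("exec-1", "timeout"), RpcFailure("exec-2", "connection refused")]
)
```

      If the backend reported only the first failure, case 2 would blacklist `exec-1` alone, and a retry could be scheduled on `exec-2` again. A more conservative policy would change only `handle_report`, not what the backends send.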




            Unassigned
            stakiar Sahil Takiar