I like the idea of having interfaces/base classes like ErrorHeuristics and Error.
Some questions about these new interfaces:
1.1 The comment says it usually returns a single error, but the interface returns List<Error>. I think returning a single error is good enough, and the current code already assumes that (when excluding reported ErrorHeuristics). This simplifies the concept. We can always have multiple ErrorHeuristics, one for each type of error.
1.2 readLogLine(String) should be named processLogLine(String). "read" usually suggests a method that returns the line, as on an input stream.
1.3 Can we say that "getError()" implies "reset"? Maybe name it "getErrorAndReset()". That simplifies the interface.
2.1 Can we rename addTaskLogUrl to addTaskAttemptLogUrl?
2.2 Can we return only the error that is detected in the largest number of task attempts? We can output more than one error if they are tied on that count.
2.3 Let's add a comment to each level of the loop. There are 2 "all"s in the comment below, and they refer to 2 loops, but the reader cannot tell whether those loops are nested (and in which order) or parallel:
+ // Read read the lines from all the task logs and feed them to all the
+ // error heuristics
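To make 2.2 concrete, here is a rough sketch of the selection I have in mind. TopErrorSelector and mostFrequentErrors are hypothetical names that are not in the patch, and errors are represented as plain strings only to keep the example short. Each loop level carries its own comment, in the spirit of 2.3.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of the patch under review.
class TopErrorSelector {

  /**
   * @param errorsPerAttempt for each task attempt, the errors detected in its log
   * @return the error(s) detected in the largest number of task attempts;
   *         all tied errors are returned, in first-seen order
   */
  static List<String> mostFrequentErrors(List<List<String>> errorsPerAttempt) {
    Map<String, Integer> attemptCounts = new LinkedHashMap<>();

    // Outer loop: one iteration per task attempt.
    for (List<String> errors : errorsPerAttempt) {
      // Inner loop: one iteration per distinct error in that attempt, so an
      // error repeated within a single attempt is counted only once.
      for (String error : new LinkedHashSet<>(errors)) {
        attemptCounts.merge(error, 1, Integer::sum);
      }
    }

    // Find the highest attempt count (0 if nothing was detected).
    int max = 0;
    for (int count : attemptCounts.values()) {
      max = Math.max(max, count);
    }

    // Keep every error that reaches the highest count.
    List<String> result = new ArrayList<>();
    for (Map.Entry<String, Integer> entry : attemptCounts.entrySet()) {
      if (entry.getValue() == max) {
        result.add(entry.getKey());
      }
    }
    return result;
  }
}
```

Counting per attempt rather than per log line keeps one noisy attempt from dominating the result.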
3.1 Let's rename it to ErrorAndSolution? I think that's more appropriate. We can modify all the function names accordingly.
For the case that Namit mentions, I think we should just let the Operators output different kinds of error messages, so that ErrorHeuristics can capture them. I don't see how the added flag would help solve the problem (and it's actually never used in the code), so I would prefer doing it the old way.
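Putting 1.1 through 1.3 and 3.1 together, the simplified interface might look roughly like the sketch below. The exact signatures, the singular name ErrorHeuristic, the fields of ErrorAndSolution, and the RegexErrorHeuristic example are all my assumptions for illustration, not what the patch contains.

```java
import java.util.regex.Pattern;

// Hypothetical value class; the name follows the rename proposed in 3.1.
class ErrorAndSolution {
  private final String error;
  private final String solution;

  ErrorAndSolution(String error, String solution) {
    this.error = error;
    this.solution = solution;
  }

  String getError() {
    return error;
  }

  String getSolution() {
    return solution;
  }
}

// Sketch of the simplified interface: each heuristic detects one type of error.
interface ErrorHeuristic {
  // Feed one line of a task attempt log to the heuristic (see 1.2).
  void processLogLine(String line);

  // Return the detected error, or null if none was detected, and clear the
  // internal state so the instance can be reused for the next log (1.1, 1.3).
  ErrorAndSolution getErrorAndReset();
}

// Illustrative implementation: reports its error when any line matches a regex.
class RegexErrorHeuristic implements ErrorHeuristic {
  private final Pattern pattern;
  private final ErrorAndSolution result;
  private boolean matched = false;

  RegexErrorHeuristic(String regex, ErrorAndSolution result) {
    this.pattern = Pattern.compile(regex);
    this.result = result;
  }

  @Override
  public void processLogLine(String line) {
    if (pattern.matcher(line).find()) {
      matched = true;
    }
  }

  @Override
  public ErrorAndSolution getErrorAndReset() {
    ErrorAndSolution detected = matched ? result : null;
    matched = false; // reset for the next task attempt log
    return detected;
  }
}
```

With this shape, an Operator that emits a distinct error message only needs a matching heuristic instance; no extra flag is required on the error object.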