Should we check whether ClientTraceLog.isInfoEnabled() returns true before logging?
Excluding the string concatenation that produces the actual message, the cost of each log message is either low or infrequent (like the shuffle message). Excluding the new read log message, it's comparable to the logging that's already happening. I'm not certain whether the logging this replaces (for client writes) should still occur when ClientTraceLog.isInfoEnabled() is false, since nothing would be logged in that case...
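To illustrate the trade-off being discussed: the guard skips the string concatenation entirely when INFO is disabled, but it also means nothing happens at all in that case. This is a minimal sketch using java.util.logging as a stand-in for the commons-logging ClientTraceLog; the class and method names here are hypothetical, not the actual DataNode code.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class TraceGuard {
    static final Logger CLIENT_TRACE_LOG = Logger.getLogger("clienttrace");
    // Counts how often the message string is actually built, so the cost
    // of construction can be observed separately from the logging itself.
    static int buildCount = 0;

    static void setInfoEnabled(boolean on) {
        CLIENT_TRACE_LOG.setLevel(on ? Level.INFO : Level.OFF);
    }

    // The expensive part: concatenation to produce the actual message.
    static String buildMessage(String src, String dst, long bytes) {
        buildCount++;
        return "src: " + src + ", dest: " + dst + ", bytes: " + bytes;
    }

    static void logTransfer(String src, String dst, long bytes) {
        // Guard: pay for concatenation only when INFO is enabled;
        // when it is disabled, nothing is built and nothing is logged.
        if (CLIENT_TRACE_LOG.isLoggable(Level.INFO)) {
            CLIENT_TRACE_LOG.info(buildMessage(src, dst, bytes));
        }
    }
}
```

With the guard in place, a disabled logger costs only the isLoggable check, which is the behavior in question for the replaced client-write logging.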
Should we define an AUDIT_FORMAT for the log messages, like FSNamesystem.AUDIT_FORMAT?
Unlike the FSNamesystem audit format, these messages will require additional processing to be useful (e.g. the id param and the optional block id), so the key/value pairing doesn't offer the same syntactic guarantees. That said, you're probably right; but unless we adopt a packaging like the one you suggest in your following point, we'd introduce a dependency between hdfs and mapred. For now, with only these few messages, I don't think we gain much by pulling the format out.
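To show why the optional fields weaken the key/value guarantees: once a field like the block id is conditional, consumers can no longer assume every record has the same columns. The format string and field names below are hypothetical stand-ins, not the actual DataNode trace format.

```java
class DataNodeTrace {
    // Hypothetical key/value format in the spirit of
    // FSNamesystem.AUDIT_FORMAT; not the real DataNode format.
    static final String FORMAT = "src=%s, dest=%s, bytes=%d, op=%s, cliID=%s";

    static String format(String src, String dst, long bytes, String op,
                         String cliId, Long blockId) {
        String msg = String.format(FORMAT, src, dst, bytes, op, cliId);
        // Optional field: present for block transfers, absent otherwise.
        // A parser must handle both shapes, unlike a fixed-column format.
        if (blockId != null) {
            msg += String.format(", blockid=%d", blockId);
        }
        return msg;
    }
}
```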
I think it might be worth creating a utility class, say org.apache.hadoop.log.AuditLog, so that we could put AUDIT_FORMAT, isInfoEnabled(), etc. inside it. Then both DataNode and FSNamesystem could use it.
Agreed: it would be better if there were a more central location for Hadoop APIs exported through the logging interfaces, such as audit logs and these metrics. If nothing else, it would let us know which messages have consumers (hence the uncertainty about logging client writes). That's likely part of a different patch, though.
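For concreteness, here is one possible shape for the proposed shared utility. This is a hypothetical sketch only: the class does not exist, the format string is invented, and java.util.logging stands in for commons-logging; the real package, API, and format would be decided in that separate patch.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch of the proposed org.apache.hadoop.log.AuditLog
// utility, so DataNode and FSNamesystem could share one format and guard.
class AuditLog {
    // Invented tab-separated key/value format for illustration.
    static final String AUDIT_FORMAT = "ugi=%s\tip=%s\tcmd=%s\tsrc=%s\tdst=%s";

    private final Logger log;

    AuditLog(String name) {
        this.log = Logger.getLogger(name);
    }

    boolean isInfoEnabled() {
        return log.isLoggable(Level.INFO);
    }

    // Callers need not guard the formatting themselves; the guard and
    // the format live in one place.
    void audit(String ugi, String ip, String cmd, String src, String dst) {
        if (isInfoEnabled()) {
            log.info(format(ugi, ip, cmd, src, dst));
        }
    }

    static String format(String ugi, String ip, String cmd,
                         String src, String dst) {
        return String.format(AUDIT_FORMAT, ugi, ip, cmd, src, dst);
    }
}
```

Centralizing the format this way would also make it easier to see which messages have downstream consumers, since all audit output would flow through one class.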