Currently there are multiple code locations where query text is written to the logs. This is particularly bad when it happens before the query is parsed, as there is no reliable way to identify strings in the query text due to various quoting and escaping schemes.
Printing query text or text strings like this could leak sensitive information into the logs. A particularly bad example (collected from the wild):
Totally forbidding the presence of query text in the logs would make it too hard to debug or support Impala, so there should be a global switch governing this behavior.
When the switch is set to disabled, Impala should:
- not print unparsed query text to the logs; it should just print query IDs
- strip strings from the log output
When the switch is set to enabled, Impala should:
- print unparsed query text to the log
- let strings pass through to the logs, including parameter values, table names, column names etc.
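The two modes above could be sketched roughly as follows. This is only an illustration, not Impala code: `Token`, its `sensitive` field, `describe_unparsed` and `describe_parsed` are hypothetical names, and the sketch assumes the parser can mark which tokens carry user-supplied strings (literals, parameter values, table and column names).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    """Hypothetical parser token; `sensitive` marks user-supplied strings."""
    text: str
    sensitive: bool


def describe_unparsed(query_id: str, query_text: str, enabled: bool) -> str:
    """Log line for a query that has not been parsed yet."""
    if enabled:
        return f"query {query_id}: {query_text}"
    # Strings cannot be reliably located in unparsed text, so log only the ID.
    return f"query {query_id}"


def describe_parsed(tokens: List[Token], enabled: bool) -> str:
    """Log line for a parsed query, stripping sensitive tokens when disabled."""
    parts = [
        t.text if (enabled or not t.sensitive) else "<redacted>"
        for t in tokens
    ]
    return " ".join(parts)
```

With the switch disabled, a query such as `SELECT * FROM t WHERE ssn = '123'` would be logged with the table name, column name and literal replaced by a placeholder; with it enabled, the text passes through unchanged.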
The default (unconfigured) state of this switch should be disabled.
Impala should probably indicate when the switch is set to enabled, to warn the user that possibly sensitive information is being written to the logs.
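A minimal sketch of the default and the warning, assuming a startup flag named `--log_query_text` (a hypothetical name used only for illustration; Impala's real flag handling differs):

```python
import argparse
import logging


def parse_flags(argv):
    """Parse the hypothetical switch; it is disabled unless explicitly set."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--log_query_text", action="store_true", default=False)
    return parser.parse_args(argv)


def warn_if_enabled(flags, logger):
    """Emit a warning when query text logging is on; return whether it warned."""
    if flags.log_query_text:
        logger.warning(
            "Query text logging is enabled; logs may contain sensitive information."
        )
        return True
    return False
```

Calling `warn_if_enabled` once at startup would put the warning near the top of the log, where someone collecting logs for support is likely to see it.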