IMPALA-4244

Impala should strip all strings from log output unless explicitly configured not to



    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: Impala 2.5.0
    • Fix Version/s: None
    • Component/s: Frontend


      Currently there are multiple code locations where query text is written to the logs. This is particularly bad when it happens before the query is parsed, as there is no reliable way to identify strings in the query text due to various quoting and escaping schemes.

      Printing query text or string literals this way can leak sensitive information into the logs. A particularly bad example (collected from the wild):

      I0610 13:06:43.571676  2022 Frontend.java:818] analyze query SELECT user_id, username, group_id FROM db.table WHERE username='USER' AND password='BAD'"
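To illustrate why strings cannot be reliably identified in unparsed query text, here is a minimal sketch of a naive regex-based redactor. The class name `NaiveRedaction` and its pattern are hypothetical and not part of Impala; the sketch assumes a SQL dialect in which a backslash can escape a quote inside a string literal.

```java
import java.util.regex.Pattern;

final class NaiveRedaction {
    // Hypothetical naive rule: anything between two single quotes is a string.
    private static final Pattern SINGLE_QUOTED = Pattern.compile("'[^']*'");

    static String redact(String queryText) {
        return SINGLE_QUOTED.matcher(queryText).replaceAll("'***'");
    }

    public static void main(String[] args) {
        // Simple literal: redacted as intended.
        // prints: SELECT 1 FROM t WHERE password='***'
        System.out.println(redact("SELECT 1 FROM t WHERE password='BAD'"));

        // Backslash-escaped quote: the pattern stops at the escaped quote,
        // so part of the literal leaks.
        // prints: SELECT 1 FROM t WHERE password='***'s secret'
        System.out.println(redact("SELECT 1 FROM t WHERE password='it\\'s secret'"));
    }
}
```

Every additional quoting or escaping scheme adds another failure mode of this kind, which is why redacting text before parsing is unlikely to be trustworthy.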

      Totally forbidding the presence of query text in the logs would make it too hard to debug or support Impala, so there should be a global switch governing this behavior.

      When the switch is set to disable text printing, Impala should:

      • not print unparsed query text to the logs; it should just print query IDs
      • strip strings from the log output
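Stripping strings after parsing could look roughly like the sketch below. It assumes the parser exposes a token stream in which string literals are already classified; `ParsedQueryRedactor`, `Kind`, and `Token` are hypothetical names for illustration and do not correspond to Impala's actual frontend classes.

```java
import java.util.List;
import java.util.stream.Collectors;

final class ParsedQueryRedactor {
    // Hypothetical token categories; a real parser would have many more.
    enum Kind { KEYWORD, IDENTIFIER, STRING_LITERAL, OTHER }

    static final class Token {
        final Kind kind;
        final String text;

        Token(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    // Renders the tokens for logging, replacing every string literal
    // with a fixed placeholder so its content never reaches the log.
    static String render(List<Token> tokens) {
        return tokens.stream()
            .map(t -> t.kind == Kind.STRING_LITERAL ? "'<redacted>'" : t.text)
            .collect(Collectors.joining(" "));
    }
}
```

Because the decision is made per token rather than per character, quoting and escaping rules no longer matter at the point where the log line is built. Whether identifiers such as table and column names should also be replaced is a policy choice this sketch does not make.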

      When the switch is set to enabled, Impala should:

      • print unparsed query text to the log
      • let strings pass through to the logs, including parameter values, table names, column names etc.

      The default (unconfigured) state of this switch should be disabled, so that query text and strings stay out of the logs unless someone explicitly opts in.

      Impala should probably indicate when the switch is set to enabled, to warn the user that possibly sensitive information is being written to the logs.
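The proposed switch, its default, and the warning could be sketched as follows. This is only an illustration under stated assumptions: `QueryLogPolicy`, the `logQueryText` field, the warning wording, and the in-memory list standing in for the logger are all hypothetical, and no existing Impala flag is implied.

```java
import java.util.ArrayList;
import java.util.List;

final class QueryLogPolicy {
    // Hypothetical switch: false (the default) keeps query text out of the logs.
    private final boolean logQueryText;
    // Stands in for the real logger so the sketch stays self-contained.
    private final List<String> sink = new ArrayList<>();

    // Unconfigured state: text printing disabled.
    QueryLogPolicy() {
        this(false);
    }

    QueryLogPolicy(boolean logQueryText) {
        this.logQueryText = logQueryText;
        if (logQueryText) {
            // Warn once so operators know the logs may now hold sensitive data.
            sink.add("WARNING: query text logging is enabled; "
                + "logs may contain sensitive information");
        }
    }

    // Returns the query text only when the switch is enabled,
    // otherwise just the query ID.
    String describe(String queryId, String queryText) {
        return logQueryText ? queryText : "query_id=" + queryId;
    }

    void logAnalyze(String queryId, String queryText) {
        sink.add("analyze query " + describe(queryId, queryText));
    }

    List<String> lines() {
        return sink;
    }
}
```

Routing every log call through a single method such as `describe` keeps the decision in one place, rather than relying on each of the multiple code locations to check the switch itself.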




            Assignee: Unassigned
            Reporter: Laszlo Gaal (laszlog)