Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-8156

Add format options to the EXPLAIN statement

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Open
    • Minor
    • Resolution: Unresolved
    • Impala 3.1.0
    • None
    • Frontend
    • None
    • ghx-label-9

    Description

      The EXPLAIN statement is very basic:

      EXPLAIN <stmt>;
      

      Example:

      EXPLAIN SELECT * FROM alltypes;
      

      Explain does provide some options set as session options:

      SET set explain_level=extended;
      EXPLAIN <stmt>;
      

      We have often found the need for additional information. For example, it would be very useful to obtain the SELECT statement after view substitution.

      We wish to extend EXPLAIN to allow additional options, while retaining full backward compatibility. The extended syntax is:

      EXPLAIN [FORMAT([opt(, opt)*])] <stmt>;
      

      This syntax reuses the existing FORMAT keyword, and allow an unlimited set of options to be added in the future without the need to define new keywords.

      Options are in the name=value form with name as an identifier and value as a string literal. Both are case-insensitive. Example to set the explain level:

      EXPLAIN FORMAT(level=extended) SELECT * FROM alltypes;
      

      The two options supported at present are:

      • level - Sets the explain level.
      • rewritten - Shows the fully rewritten SQL statement with views expanded.

      The level option overrides the existing session options. If level is not present, then the session option is used instead. Values are identical to that for the SET explain_level='<value' statement.

      Rewritten SQL

      The rewritten option takes two values: true or false. If set, EXPLAIN returns the text of the rewritten SQL instead of the query plan. Example:

      functional> explain format(rewritten) SELECT * FROM view_view;
      
      +------------------------------------------------+
      | Explain String                                 |
      +------------------------------------------------+
      | SELECT * FROM /* functional.view_view */ (     |
      | SELECT * FROM /* functional.alltypes_view */ ( |
      | SELECT * FROM functional.alltypes)             |
      | )                                              |
      +------------------------------------------------+
      

      Here, the names in comments are the view names. Views are then expanded inline to show the full extend of the statement. This is very helpful to resolve user issues.

      Views are expanded only if the user has permissions on the underlying tables. Else, to avoid revealing details, views are unexpended if the proper privileges are lacking.

      Comparison with Other SQL Dialects

      The ISO SQL standard does not define the EXPLAIN statement, it is a vendor extension. MySQL defines EXPLAIN as:

      {EXPLAIN | DESCRIBE | DESC}
          [explain_type]
          {explainable_stmt | FOR CONNECTION connection_id}
      
      explain_type: {
          FORMAT = format_name
      }
      
      format_name: {
          TRADITIONAL
        | JSON
      }
      

      That is, MySQL also uses the FORMAT keyword with only two choices.

      SqlServer uses a form much like Impala's present form with no options.

      Postgres uses options and keywords:

      EXPLAIN [ ( option [, ...] ) ] statement
      EXPLAIN [ ANALYZE ] [ VERBOSE ] statement
      
      where option can be one of:
      
          ANALYZE [ boolean ]
          VERBOSE [ boolean ]
          COSTS [ boolean ]
          BUFFERS [ boolean ]
          FORMAT { TEXT | XML | JSON | YAML }
      

      Apache Drill uses a series of keywords to express options:

      explain plan [ including all attributes ]
                   [ with implementation | without implementation ]
                   for <query> ;
      

      We claim that, given the wide variety of vendor implementations, the proposed Impala syntax is reasonable.

      Futures

      IMPALA-5973 proposes to add a JSON format for EXPLAIN output. We propose to select JSON output using the "format" option:

      EXPLAIN FORMAT(format='json') <stmt>
      

      The format can be combined other options such as level:

      EXPLAIN FORMAT(format='json', level='extended') <stmt>
      

      Details

      The key/value syntax is very general, but cumbersome for simple tasks. The FORMAT option allows a number of simplifications.

      First, for the explain level, each level can be used as a Boolean option:

      EXPLAIN FORMAT(extended='true') <stmt>
      

      Second, for Boolean options, the value is optional and "true" is assumed:

      EXPLAIN FORMAT(EXTENDED) <stmt>
      

      Third, if only a value is given, the value is assumed to be for the "format" key (which is not yet supported):

      EXPLAIN FORMAT('json') <stmt>
      

      Would, when JSON format is available, emit the plan as JSON.

      The astute reader will see opportunities for odd combinations of options. Rather than enforcing a strict set of rules, when given an odd set of rules, the FORMAT option simply does something reasonable. Example:

      EXPLAIN FORMAT(level='standard', extended, verbose='false') <stmt>
      

      The short answer here is that options are ambiguous, behavior is undefined, but reasonable.

      Attachments

        Activity

          People

            Unassigned Unassigned
            Paul.Rogers Paul Rogers
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

              Created:
              Updated: