The plan I'm looking at has hundreds of nodes. It takes a long time to scroll back up the page to get to the top. Pin the tab bar (the one with Physical Plan, Visualization, etc.) to the top of the page to simplify navigation.
On the Physical Plan page: The top of the page displays a histogram of minor fragment execution. However, it is hard to infer what it displays.
- Label the x-axis. The units seem to be seconds, but a label such as "Runtime (sec.)" would help.
- Label the y-axis. It seems to be colored by major fragment, with one line per minor fragment, but it took some sleuthing to figure this out.
- Add a tooltip on each color band to identify the major fragment. (Probably too fiddly to label minor fragment lines.)
- Choose a wider palette of colors. On my chart, the top two groups are shades of orange, the third is blue. Seems we could rotate among the standard set of colors for better contrast. (Need to identify a good small palette.)
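The palette rotation suggested above could look roughly like the following sketch. The palette values and the function name `color_for_major_fragment` are illustrative assumptions, not anything taken from the existing UI code; the point is only that indexing by major fragment number, modulo the palette size, keeps neighboring fragments on clearly different hues.

```python
# Hypothetical palette: six well-separated hues, chosen for illustration only.
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd", "#8c564b"]


def color_for_major_fragment(major_id, palette=PALETTE):
    """Rotate through the palette so adjacent major fragments differ in hue."""
    return palette[major_id % len(palette)]


# Eight major fragments: the seventh and eighth wrap around to the start.
colors = [color_for_major_fragment(i) for i in range(8)]
print(colors)
```

Using the same function on the Visualized Plan page would also keep the fragment colors consistent between the two pages.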
In the tables:
- For each operator, list the number of rows processed. (Available in the details already.) (DRILL-5195)
- In the table that summarizes major fragments, have as a tool-tip the names of the minor fragments to give the numbers some meaning. That is, hovering over 00-xx-xx should say "Project, Merging Receiver".
- In the table that shows minor fragments for major fragments, either add a list of minor fragment names to the title, or as a pop-up. That is, in the heading that says, "Major Fragment: 02-xx-xx", add "(PARQUET_ROW_GROUP_SCAN, PROJECT, ...)".
- For each minor fragment, label the host on which it runs. (DRILL-5803)
- For larger queries, show groups by host, expanded to show fragments. (It is hard to read, say, 400 minor fragments in one big table. Showing 20 nodes (with summaries) is easier, each expanding to show 20 minor fragments.)
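As a rough sketch of the group-by-host idea, the minor fragment rows could be bucketed by host with a small summary per bucket. The record fields used here (`id`, `host`, `rows`, `runtime_sec`) and the host names are hypothetical and stand in for whatever the profile actually exposes.

```python
from collections import defaultdict


def group_by_host(fragments):
    """Bucket minor-fragment records by host and attach a per-host summary."""
    groups = defaultdict(list)
    for frag in fragments:
        groups[frag["host"]].append(frag)

    summary = {}
    for host, frags in groups.items():
        summary[host] = {
            "fragment_count": len(frags),
            "total_rows": sum(f["rows"] for f in frags),
            "max_runtime_sec": max(f["runtime_sec"] for f in frags),
            "fragments": frags,  # shown only when the host row is expanded
        }
    return summary


# Made-up sample data for illustration.
sample = [
    {"id": "02-00-xx", "host": "node-a", "rows": 100, "runtime_sec": 1.5},
    {"id": "02-01-xx", "host": "node-b", "rows": 250, "runtime_sec": 2.0},
    {"id": "02-02-xx", "host": "node-a", "rows": 300, "runtime_sec": 0.5},
]
by_host = group_by_host(sample)
```

The collapsed view would then render one row per key of `by_host`, with the `fragments` list revealed on expansion.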
In the Operator Profiles overview, add a tool-tip with details about each operator such as:
- Number of vector allocations
- Number of vector extensions (increasing the size of vectors)
- Average vector utilization (ratio of selected to unselected rows)
- Average batch size: number of rows, bytes per row, bytes per batch
- Number of files scanned
- Number of schemas found
- Number of bytes read (or file length if a table scan)
- Name of the file scanned (or first several if a group)
- Rows in, rows out and selectivity (as a ratio)
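Several of the tool-tip values listed above are simple ratios. A minimal sketch follows, assuming the operator exposes row counts, a batch count, and an output byte count; the parameter and key names are hypothetical.

```python
from typing import Optional


def ratio(numerator: float, denominator: float) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def tooltip_stats(rows_in: int, rows_out: int, batches: int, bytes_out: int) -> dict:
    """Derive the ratio-style tool-tip values from raw operator counters."""
    return {
        "selectivity": ratio(rows_out, rows_in),         # rows out per row in
        "avg_rows_per_batch": ratio(rows_out, batches),
        "bytes_per_row": ratio(bytes_out, rows_out),
        "bytes_per_batch": ratio(bytes_out, batches),
    }


stats = tooltip_stats(rows_in=1000, rows_out=250, batches=5, bytes_out=10000)
```

Returning `None` for a zero denominator lets the UI show a placeholder instead of failing on operators that processed no rows.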
In the operator detail table:
- Add a line for totals (records, batches). (DRILL-5195)
- Add a line for averages (most fields)
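The totals and averages lines could be computed along these lines. This is only a sketch: the field names `records` and `batches` mirror the ones mentioned above, but the row structure is an assumption.

```python
def summary_rows(rows, fields):
    """Return (totals, averages) dicts over the given numeric fields."""
    totals = {f: sum(r[f] for r in rows) for f in fields}
    count = len(rows)
    averages = {f: (totals[f] / count if count else None) for f in fields}
    return totals, averages


# Made-up per-minor-fragment rows for one operator.
detail = [
    {"minor": "02-00-xx", "records": 10, "batches": 1},
    {"minor": "02-01-xx", "records": 30, "batches": 3},
]
totals, averages = summary_rows(detail, ["records", "batches"])
```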
Under the "Full JSON Profile", part of the JSON is formatted, but the plan part is not. Display the plan in a formatted version (with proper indentation). It is not very useful in the current, streamed, non-indented form.
Better, move the JSON Profile to a new tab since it causes the Physical Plan page to get too large for large queries.
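For the formatting request above, re-serializing the compact JSON with indentation is usually enough. The sketch below uses Python's standard `json` module; the sample document is made up and is not meant to reflect the real profile schema.

```python
import json


def pretty_json(raw: str, indent: int = 2) -> str:
    """Parse a compact JSON string and re-emit it with indentation."""
    return json.dumps(json.loads(raw), indent=indent)


# Made-up compact input standing in for the unformatted part of the profile.
raw = '{"queryId":"example","plan":{"graph":[{"pop":"project","id":1}]}}'
formatted = pretty_json(raw)
print(formatted)
```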
On the Visualized Plan page:
- The coloring of the fragments does not match the coloring used in the chart on the Physical Plan page. Please use the same to make them easier to correlate.
- Perhaps enclose each fragment in a box (with the border passing through the middle of each exchange operator).
- For each aspect of the plan, provide basic stats such as number of minor fragments, number of records, average time.
- For each node, provide a link to the Physical Plan page to see more detail.
- The visualized plan page shows the same info as the Physical Plan page. Better, keep each page focused and make it easy to navigate between them.
- Naming is inconsistent between this page and the Physical Plan page. "HASH_AGGREGATE" on the Physical Plan page, "HashAgg" on the visualization page.
On the Edit Query page, the actual field to edit the query is two lines long (but my query is over a dozen lines). Then, the rest of the page repeats the chart, details, etc. Seems that, if I want to edit the query, I should have a field large enough to do so. And, I don't need the other info (I'll go to the corresponding tab if I want to see it).
Then, regarding the Query page and Edit Query, can these be combined? On the Query page, I see the query. If the query is running, I should see a Cancel button. If completed, I should see an Edit button followed by a Rerun button. Or, better, simply provide a Cancel button (if running) and an Edit button (always). Edit shifts back to the main Query tab with the text of the query filled in. This behavior more clearly demonstrates that the new query is independent of the original one.
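The simpler button rule proposed above (Cancel only while running, Edit always) can be stated as a tiny sketch. The state name `"RUNNING"` and the function name are assumptions for illustration.

```python
def buttons_for(query_state: str) -> list:
    """Return the buttons to show on the Query page for a given query state."""
    buttons = []
    if query_state == "RUNNING":
        buttons.append("Cancel")  # only meaningful while the query executes
    buttons.append("Edit")        # always available
    return buttons


running_buttons = buttons_for("RUNNING")
finished_buttons = buttons_for("COMPLETED")
```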