[CALCITE-4496] Measure columns ("SELECT ... AS MEASURE") - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: In Progress
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels:
None

Description

In multi-dimensional languages such as MDX, DAX, Tableau, you can define calculations in your models that can be re-evaluated in other dimensional contexts. (The models are often called cubes, and the calculations are often called measures.)

In SQL, the model is a view (or a sub-query in the FROM clause) but the columns are just values. Suppose you have a private Employees table, a Departments view that rolls Employees up to department level and has an averageSalary column. Now suppose you wish to roll up averageSalary to the region level. The values that went into averageSalary are not available to you, either directly or indirectly, so the best you can do is to average-the-averages.

In this proposed (and experimental) feature, you can define a special kind of column - a measure - in the SELECT list of a view (or sub-query in a FROM clause), and it remains a calculation. When a query uses a measure column, the calculation is re-evaluated in the context of that query.

To some extent, this breaches the "black box" property of SQL views. Hitherto, a SQL view can be replaced with a table that has the same contents, and all queries that use that view will return the same results. That property no longer holds. But the view remains a useful "hiding" abstraction, and the rows that compose that view cannot be viewed directly.

Like dimensional models, measures in SQL would allow high-level abstractions such as key-performance indicators (KPIs) to be shared and composed. Unlike dimensional models, the models remain relational, namely, it is still possible to enumerate and count the rows in a model.

Consider the following view and query that uses it:

CREATE VIEW EmpSummary AS
SELECT deptno,
    job,
    AVG(sal) AS avg_sal,
    AVG(sal) AS MEASURE avg_sal_measure,
    COUNT(*) + 1 AS MEASURE count_plus_one_measure
FROM Emp
GROUP BY deptno, job;

SELECT deptno,
    AVG(avg_sal) AS a1,
    AGGREGATE(avg_sal_measure) AS a2,
    AGGREGATE(count_plus_one_measure) AS c1
FROM EmpSummary
GROUP BY deptno;

Note that there is a special aggregate function, AGGREGATE, that rolls up measures. Columns a1 and a2 will contain different values; the first averages the averages, and the second computes the average from the raw data. Column c1 will return the number of employees in each department plus one, not rolling up the "plus one" for each distinct job in the department.

This is just a brief sketch illustrating the purpose of measures. This feature is experimental, the syntax will no doubt change, and much of the semantics (for example, what expressions are valid as measures, whether measures remain measures they appear in the SELECT clause of an enclosing query, and what is the "context" in which a measure is evaluated) need to be ironed out.

Attachments

Issue Links

depends upon

CALCITE-5105 Add MEASURE type and AGGREGATE aggregate function

Closed

is related to

CALCITE-4483 WITHIN DISTINCT clause for aggregate functions (experimental)

Closed

relates to

CALCITE-5692 Add AT operator, for context-sensitive expressions

Open

Activity

People

Assignee:: Julian Hyde

Reporter:: Julian Hyde

Votes:: 2 Vote for this issue

Watchers:: 8 Start watching this issue

Dates

Created:: 11/Feb/21 23:31

Updated:: 02/Apr/24 07:31