Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-4496

Measure columns ("SELECT ... AS MEASURE")

    XMLWordPrintableJSON

Details

    • Bug
    • Status: In Progress
    • Major
    • Resolution: Unresolved
    • None
    • None
    • None
    • None

    Description

      In multi-dimensional languages such as MDX, DAX, Tableau, you can define calculations in your models that can be re-evaluated in other dimensional contexts. (The models are often called cubes, and the calculations are often called measures.)

      In SQL, the model is a view (or a sub-query in the FROM clause) but the columns are just values. Suppose you have a private Employees table, a Departments view that rolls Employees up to department level and has an averageSalary column. Now suppose you wish to roll up averageSalary to the region level. The values that went into averageSalary are not available to you, either directly or indirectly, so the best you can do is to average-the-averages.

      In this proposed (and experimental) feature, you can define a special kind of column - a measure - in the SELECT list of a view (or sub-query in a FROM clause), and it remains a calculation. When a query uses a measure column, the calculation is re-evaluated in the context of that query.

      To some extent, this breaches the "black box" property of SQL views. Hitherto, a SQL view can be replaced with a table that has the same contents, and all queries that use that view will return the same results. That property no longer holds. But the view remains a useful "hiding" abstraction, and the rows that compose that view cannot be viewed directly.

      Like dimensional models, measures in SQL would allow high-level abstractions such as key-performance indicators (KPIs) to be shared and composed. Unlike dimensional models, the models remain relational, namely, it is still possible to enumerate and count the rows in a model.

      Consider the following view and query that uses it:

      CREATE VIEW EmpSummary AS
      SELECT deptno,
          job,
          AVG(sal) AS avg_sal,
          AVG(sal) AS MEASURE avg_sal_measure,
          COUNT(*) + 1 AS MEASURE count_plus_one_measure
      FROM Emp
      GROUP BY deptno, job;
      
      SELECT deptno,
          AVG(avg_sal) AS a1,
          AGGREGATE(avg_sal_measure) AS a2,
          AGGREGATE(count_plus_one_measure) AS c1
      FROM EmpSummary
      GROUP BY deptno;

      Note that there is a special aggregate function, AGGREGATE, that rolls up measures. Columns a1 and a2 will contain different values; the first averages the averages, and the second computes the average from the raw data. Column c1 will return the number of employees in each department plus one, not rolling up the "plus one" for each distinct job in the department.

      This is just a brief sketch illustrating the purpose of measures. This feature is experimental, the syntax will no doubt change, and much of the semantics (for example, what expressions are valid as measures, whether measures remain measures they appear in the SELECT clause of an enclosing query, and what is the "context" in which a measure is evaluated) need to be ironed out.

      Attachments

        Issue Links

          Activity

            People

              julianhyde Julian Hyde
              julianhyde Julian Hyde
              Votes:
              2 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated: