Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-4069

Elasticsearch the result of count(*) will be added to the result list twice when use multiple aggregations without group by

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • elasticsearch-adapter
    • None

    Description

      when you write a sql:

      select count(*) from mytable
      

      in:

      org.apache.calcite.adapter.elasticsearch.ElasticsearchTable#aggregate
      

      As an optimization, totalHit will be the result of the query for count *:

      // elastic exposes total number of documents matching a query in "/hits/total" path
      // this can be used for simple "select count(*) from table"
      final long total = res.searchHits().total().value();
      

      It is then added to the result set:

      if (groupBy.isEmpty()) {
        // put totals automatically for count(*) expression(s), unless they contain group by
        for (String expr : countAll) {
          result.forEach(m -> m.put(expr, total));
        }
      }
      

      This is fine if there is only one count * aggregate function in SQL.

      But when you write multiple aggregate functions in your SQL:

      select count(*),sum(field1) from mytable

      You'll notice that the result of count * is repeatedly added to the result set (it doesn't affect the final result, but I think it can be optimized).

      The result of count * is first added to the result set along with other aggregate function result values in the following code:

      final List<Map<String, Object>> result = new ArrayList<>();
      if (res.aggregations() != null) {
        // collect values
        ElasticsearchJson.visitValueNodes(res.aggregations(), m -> {
          // using 'Collectors.toMap' will trigger Java 8 bug here
          Map<String, Object> newMap = new LinkedHashMap<>();
          for (String key: m.keySet()) {
            newMap.put(fieldMap.getOrDefault(key, key), m.get(key));
          }
          result.add(newMap);
        });
      } else {
        // probably no group by. add single result
        result.add(new LinkedHashMap<>());
      }
      

      The second time is added again in the following code:

      // elastic exposes total number of documents matching a query in "/hits/total" path
      // this can be used for simple "select count(*) from table"
      final long total = res.searchHits().total().value();
      
      if (groupBy.isEmpty()) {
        // put totals automatically for count(*) expression(s), unless they contain group by
        for (String expr : countAll) {
          result.forEach(m -> m.put(expr, total));
        }
      }
      

      Although such operations have no effect on the final result when there are multiple aggregate functions, it is obvious that they can be optimized.TotalHit is added to the result set only when groupby. isEmpty() and aggregation. size() == 1 simultaneously satisfy:

      if (groupBy.isEmpty() && aggregations.size() == 1) {
         ...
      }
      

      This avoids repeatedly adding count * to the result set.

       

       

       

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              xpleaf yeyonghao
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 20m
                  20m