[CASSANDRA-12417] Built-in AVG aggregate is much less useful than it should be - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Normal
Resolution: Fixed
Fix Version/s: 3.0.10, 3.10
Component/s: Legacy/CQL
Labels: None
Severity: Normal

Description

For fixed-size integer types, overflow is all but guaranteed to happen, yielding an incorrect result. For sum this is somewhat acceptable, since the true result cannot fit the type, but that is not the case for average.

As the result of an average always lies within the range of the source type, failing to produce it only signifies a bad implementation. Yes, one can work around this by type-casting, but do we really want to keep telling people that the correct spelling of the average function is cast(avg(cast(value as bigint)) as int), especially when this is so trivial to fix?
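To make the overflow concrete, here is a minimal Java sketch of the idea, not the code that was committed for this ticket. The class and method names (IntAvgSketch, naiveAvg, wideAvg) are hypothetical, and returning 0 for an empty input is an assumption made only to keep the example short. It contrasts an accumulator of the source type with a wider one that is narrowed back only at the end.

```java
import java.math.BigInteger;

// Hypothetical illustration; names and empty-input behavior are assumptions.
final class IntAvgSketch {
    // Accumulates in int, so the running sum silently wraps around on overflow.
    static int naiveAvg(int[] values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return values.length == 0 ? 0 : sum / values.length;
    }

    // Accumulates in an arbitrary-precision integer and narrows only the final
    // quotient, which always lies within the range of the inputs.
    static int wideAvg(int[] values) {
        if (values.length == 0) {
            return 0;
        }
        BigInteger sum = BigInteger.ZERO;
        for (int v : values) {
            sum = sum.add(BigInteger.valueOf(v));
        }
        // divide() truncates toward zero, matching Java's int division.
        return sum.divide(BigInteger.valueOf(values.length)).intValueExact();
    }
}
```

With the two-element input {Integer.MAX_VALUE, Integer.MAX_VALUE}, naiveAvg returns -1 because the sum wraps to -2, while wideAvg returns Integer.MAX_VALUE.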

Additionally, the straightforward summation we use for the floating-point versions is not a good choice numerically for large numbers of values. We should switch to a more stable version, e.g. an iterative mean using avg = avg + (value - avg) / count.
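The iterative update can be sketched in Java as follows. This is an illustration of the formula only, under the assumption that count is incremented before each update; the names (IterativeMeanSketch, naiveMean, iterativeMean) are hypothetical and do not come from the Cassandra code base.

```java
// Hypothetical illustration of avg = avg + (value - avg) / count.
final class IterativeMeanSketch {
    // Sum-then-divide: the intermediate sum can grow far beyond the inputs.
    static double naiveMean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return values.length == 0 ? 0.0 : sum / values.length;
    }

    // Running mean: the accumulator stays within the range of the inputs.
    static double iterativeMean(double[] values) {
        double avg = 0.0;
        long count = 0;
        for (double v : values) {
            count++;                  // count includes the current value
            avg += (v - avg) / count; // move avg toward v by 1/count of the gap
        }
        return avg;
    }
}
```

One visible difference is range: for four values of 1e308, naiveMean overflows to infinity while iterativeMean returns 1e308. The sketch does not by itself show how rounding error compares for long inputs.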

Issue Links

relates to

CASSANDRA-4914 Aggregation functions in CQL

Resolved

CASSANDRA-9674 Reevaluate size of result/accumulator types of built in sum()+avg() functions

Resolved

People

Assignee: Alex Petrov

Reporter: Branimir Lambov

Authors: Alex Petrov

Reviewers: Branimir Lambov

Votes: 0

Watchers: 5

Dates

Created: 09/Aug/16 09:24

Updated: 16/Apr/19 09:30

Resolved: 17/Oct/16 18:01