[SPARK-49018] Fix approx_count_distinct not working correctly with collation - ASF JIRA

Details

Type: Task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 4.0.0
Fix Version/s: 4.0.0
Component/s: SQL
Labels:
- pull-request-available

External issue URL:
https://github.com/apache/spark/pull/47503

Description

When running in spark-shell:

create table t(col string collate utf8_lcase)
insert into t values 'a', 'a', 'A'
select approx_count_distinct(col) from t

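the query returns 2, but it should return 1: utf8_lcase is a case-insensitive collation, so 'a' and 'A' compare as equal and the column holds only one distinct value.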
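
To make the expected answer concrete, here is a minimal sketch in plain Python (not Spark code) that counts distinct values first by raw string and then by a case-folded key. The helper `count_distinct` is hypothetical, and `str.lower` is only an assumed stand-in for utf8_lcase equality; the sketch does not claim to show how approx_count_distinct is implemented or how the fix works.

```python
def count_distinct(values, key=lambda v: v):
    """Exact distinct count after mapping each value through `key`."""
    return len({key(v) for v in values})


# The three rows inserted into table t in the reproduction above.
rows = ["a", "a", "A"]

# Comparing raw strings treats 'a' and 'A' as different values.
binary_count = count_distinct(rows)

# Assumption: lowercasing approximates utf8_lcase equality for these inputs.
lcase_count = count_distinct(rows, key=str.lower)

print(binary_count, lcase_count)  # prints: 2 1
```

The first count matches the reported wrong result, and the second matches the result expected for a utf8_lcase column.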

Issue Links

links to

GitHub Pull Request #47503

People

Assignee: Viktor Lučić

Reporter: Viktor Lučić

Votes: 0

Watchers: 2

Dates

Created: 26/Jul/24 13:12

Updated: 05/Aug/24 12:56

Resolved: 05/Aug/24 12:56