Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
Reproducer:
Create a table with a string column:
Repro:
CREATE EXTERNAL TABLE repro(a INT, b VARCHAR)
STORED AS CSV
WITH HEADER ROW
LOCATION 'repro.csv';
The contents of repro.csv are as follows (also attached):
a,b
1,One
1,Two
2,One
2,Two
2,Two
Now, run a query that tries to aggregate that column:
select a, count(b) from repro group by a;
Actual behavior:
> select a, count(b) from repro group by a;
ArrowError(ExternalError(ExecutionError("Unsupported data type Utf8 for result of aggregate expression")))
Expected Behavior:
The query runs and produces results:
a, count(b)
1,2
2,3
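As a cross-check of the expected result, the same rows and query can be run against another SQL engine. The sketch below uses Python's built-in sqlite3 module purely as a reference; SQLite is an assumption made for illustration and is not part of the reproducer or of DataFusion. The `ORDER BY a` clause and the `expected_counts` helper name are also additions, included so the output order is deterministic.

```python
import sqlite3

# Same rows as repro.csv (header row omitted).
ROWS = [(1, "One"), (1, "Two"), (2, "One"), (2, "Two"), (2, "Two")]


def expected_counts():
    """Run the failing query on SQLite and return the (a, count(b)) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE repro(a INT, b VARCHAR)")
    conn.executemany("INSERT INTO repro VALUES (?, ?)", ROWS)
    # ORDER BY is not in the original query; it only fixes the row order.
    result = conn.execute(
        "SELECT a, count(b) FROM repro GROUP BY a ORDER BY a"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for a, n in expected_counts():
        print(f"{a},{n}")
```

The counts match the expected behavior described in this report: two rows for a = 1 and three rows for a = 2.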
Discussion
Using Min/Max aggregates on varchar also doesn't work (but should):
> select a, min(b) from repro group by a;
ArrowError(ExternalError(ExecutionError("Unsupported data type Utf8 for result of aggregate expression")))
> select a, max(b) from repro group by a;
ArrowError(ExternalError(ExecutionError("Unsupported data type Utf8 for result of aggregate expression")))
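For comparison, here is a sketch of what these two queries would presumably return if string aggregates were supported, again using Python's built-in sqlite3 module as a stand-in engine (an assumption for illustration, not DataFusion itself). It relies on SQLite's default byte-wise string comparison; the `expected_min_max` helper name and the `ORDER BY a` clause are not from the report.

```python
import sqlite3

# Same rows as repro.csv (header row omitted).
ROWS = [(1, "One"), (1, "Two"), (2, "One"), (2, "Two"), (2, "Two")]


def expected_min_max():
    """Return (a, min(b), max(b)) per group, as computed by SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE repro(a INT, b VARCHAR)")
    conn.executemany("INSERT INTO repro VALUES (?, ?)", ROWS)
    # Both aggregates are combined into one query for brevity;
    # ORDER BY only makes the row order deterministic.
    result = conn.execute(
        "SELECT a, min(b), max(b) FROM repro GROUP BY a ORDER BY a"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for a, lo, hi in expected_min_max():
        print(f"{a},{lo},{hi}")
```

Under that comparison, "One" sorts before "Two", so both groups have a minimum of One and a maximum of Two.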
Fascinatingly, these formulations work fine:
> select a, count(a) from repro group by a;
+---+----------+
| a | count(a) |
+---+----------+
| 2 | 3        |
| 1 | 2        |
+---+----------+
2 row in set. Query took 0 seconds.
> select a, count(1) from repro group by a;
+---+-----------------+
| a | count(UInt8(1)) |
+---+-----------------+
| 2 | 3               |
| 1 | 2               |
+---+-----------------+
2 row in set. Query took 0 seconds.