Details
-
Bug
-
Status: Resolved
-
Critical
-
Resolution: Fixed
-
1.1.0
Description
I have a parquet file that contains an INT field with large enough values so that the sum of the values overflows the INT limits, and a VARCHAR column with one single value (to force all rows to be part of the same partition).
Computing the sums without casting will give similar results:
SELECT SUM(col_int) OVER(PARTITION BY col_char) FROM `3278.parquet` LIMIT 1; +--------------+ | EXPR$0 | +--------------+ | -3216087191 | +--------------+
SELECT SUM(col_int) FROM `3278.parquet`; +--------------+ | EXPR$0 | +--------------+ | -3216087191 | +--------------+
But if we cast the column before doing the sum, the results are now different:
SELECT SUM(CAST(col_int AS INT)) OVER(PARTITION BY col_char) FROM `3278.parquet` LIMIT 1; +-------------+ | EXPR$0 | +-------------+ | 1078880105 | +-------------+
SELECT SUM(CAST(col_int AS INT)) FROM `3278.parquet`; +--------------+ | EXPR$0 | +--------------+ | -3216087191 | +--------------+