Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-3318

SUM(CAST(col as INT)) shows different results when used in window functions

    XMLWordPrintableJSON

    Details

      Description

      I have a parquet file that contains an INT field with large enough values so that the sum of the values overflows the INT limits, and a VARCHAR column with one single value (to force all rows to be part of the same partition).

      Computing the sums without casting will give similar results:

      SELECT SUM(col_int) OVER(PARTITION BY col_char) FROM `3278.parquet` LIMIT 1;
      +--------------+
      |    EXPR$0    |
      +--------------+
      | -3216087191  |
      +--------------+
      
      SELECT SUM(col_int) FROM `3278.parquet`;
      +--------------+
      |    EXPR$0    |
      +--------------+
      | -3216087191  |
      +--------------+
      

      But if we cast the column before doing the sum, the results are now different:

      SELECT SUM(CAST(col_int AS INT)) OVER(PARTITION BY col_char) FROM `3278.parquet` LIMIT 1;
      +-------------+
      |   EXPR$0    |
      +-------------+
      | 1078880105  |
      +-------------+
      
      SELECT SUM(CAST(col_int AS INT)) FROM `3278.parquet`;
      +--------------+
      |    EXPR$0    |
      +--------------+
      | -3216087191  |
      +--------------+
      

        Attachments

        1. DRILL-3318.patch
          2 kB
          Mehant Baid
        2. 3278.parquet
          0.4 kB
          Abdel Hakim Deneche

          Activity

            People

            • Assignee:
              mehant Mehant Baid
              Reporter:
              adeneche Abdel Hakim Deneche
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: