Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-3318

SUM(CAST(col as INT)) shows different results when used in window functions

    XMLWordPrintableJSON

Details

    Description

      I have a parquet file that contains an INT field with large enough values so that the sum of the values overflows the INT limits, and a VARCHAR column with one single value (to force all rows to be part of the same partition).

      Computing the sums without casting will give similar results:

      SELECT SUM(col_int) OVER(PARTITION BY col_char) FROM `3278.parquet` LIMIT 1;
      +--------------+
      |    EXPR$0    |
      +--------------+
      | -3216087191  |
      +--------------+
      
      SELECT SUM(col_int) FROM `3278.parquet`;
      +--------------+
      |    EXPR$0    |
      +--------------+
      | -3216087191  |
      +--------------+
      

      But if we cast the column before doing the sum, the results are now different:

      SELECT SUM(CAST(col_int AS INT)) OVER(PARTITION BY col_char) FROM `3278.parquet` LIMIT 1;
      +-------------+
      |   EXPR$0    |
      +-------------+
      | 1078880105  |
      +-------------+
      
      SELECT SUM(CAST(col_int AS INT)) FROM `3278.parquet`;
      +--------------+
      |    EXPR$0    |
      +--------------+
      | -3216087191  |
      +--------------+
      

      Attachments

        1. DRILL-3318.patch
          2 kB
          Mehant Baid
        2. 3278.parquet
          0.4 kB
          Abdel Hakim Deneche

        Activity

          People

            mehant Mehant Baid
            adeneche Abdel Hakim Deneche
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: