Apache Arrow / ARROW-9733

[Rust][DataFusion] Aggregates COUNT/MIN/MAX don't work on VARCHAR columns


Details

    Description

      Reproducer: create a table with a string (VARCHAR) column:

      CREATE EXTERNAL TABLE repro(a INT, b VARCHAR)
      STORED AS CSV
      WITH HEADER ROW
      LOCATION 'repro.csv';
      

      The contents of repro.csv are as follows (also attached):

      a,b
      1,One
      1,Two
      2,One
      2,Two
      2,Two
      

      Now, run a query that tries to aggregate that column:

      select a, count(b) from repro group by a;
      

      Actual behavior:

      > select a, count(b) from repro group by a;
      ArrowError(ExternalError(ExecutionError("Unsupported data type Utf8 for result of aggregate expression")))
      

      Expected behavior:
      The query runs and produces these results:

      a,count(b)
      1,2
      2,3
      
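As a reference point for the expected output, here is a minimal std-only Rust sketch of what `select a, count(b) from repro group by a` should compute. It is not DataFusion's implementation: `repro_rows` and `count_by_group` are hypothetical helpers, and NULLs are modeled as `None`. The point it illustrates is that COUNT only inspects whether each value is NULL, so the type of `b` (here a string) is irrelevant to the result.

```rust
use std::collections::BTreeMap;

/// Rows of repro.csv as (a, b); a `None` in `b` would model a NULL.
fn repro_rows() -> Vec<(i32, Option<&'static str>)> {
    vec![
        (1, Some("One")),
        (1, Some("Two")),
        (2, Some("One")),
        (2, Some("Two")),
        (2, Some("Two")),
    ]
}

/// Reference semantics for COUNT(b) ... GROUP BY a: count the non-NULL
/// values of `b` in each group, regardless of what type `b` has.
fn count_by_group(rows: &[(i32, Option<&str>)]) -> BTreeMap<i32, u64> {
    let mut counts = BTreeMap::new();
    for &(a, b) in rows {
        // Every group gets an entry, even if all of its `b` values are NULL (count 0).
        let entry = counts.entry(a).or_insert(0u64);
        if b.is_some() {
            *entry += 1;
        }
    }
    counts
}

fn main() {
    let counts = count_by_group(&repro_rows());
    println!("a,count(b)");
    for (a, n) in &counts {
        println!("{},{}", a, n);
    }
}
```

Running it prints the same two rows as the expected result above (BTreeMap iterates groups in key order, whereas the SQL engine may return them in any order).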

      Discussion

      The MIN and MAX aggregates on the VARCHAR column also fail (but should work):

      
      > select a, min(b) from repro group by a;
      ArrowError(ExternalError(ExecutionError("Unsupported data type Utf8 for result of aggregate expression")))
      > select a, max(b) from repro group by a;
      ArrowError(ExternalError(ExecutionError("Unsupported data type Utf8 for result of aggregate expression")))
      
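MIN and MAX are well defined on strings because strings have a total order. The following is a minimal std-only sketch, assuming byte-wise lexicographic ordering (which is how Rust's `String` implements `Ord`; a SQL engine could in principle use a different collation). `MinMax`, `repro_rows`, and `min_max_by_group` are hypothetical names for illustration, not DataFusion APIs. The accumulator is generic over any `Ord` type, so the same code serves integers and strings alike.

```rust
use std::collections::BTreeMap;

/// Hypothetical MIN/MAX accumulator for any totally ordered type
/// (e.g. i32 or String). NULL inputs (None) are skipped.
#[derive(Debug)]
struct MinMax<T> {
    min: Option<T>,
    max: Option<T>,
}

impl<T: Ord + Clone> MinMax<T> {
    fn new() -> Self {
        MinMax { min: None, max: None }
    }

    fn update(&mut self, value: Option<&T>) {
        if let Some(v) = value {
            if self.min.as_ref().map_or(true, |m| v < m) {
                self.min = Some(v.clone());
            }
            if self.max.as_ref().map_or(true, |m| v > m) {
                self.max = Some(v.clone());
            }
        }
    }
}

/// Rows of repro.csv as (a, b).
fn repro_rows() -> Vec<(i32, Option<String>)> {
    [(1, "One"), (1, "Two"), (2, "One"), (2, "Two"), (2, "Two")]
        .iter()
        .map(|&(a, b)| (a, Some(b.to_string())))
        .collect()
}

/// MIN(b), MAX(b) ... GROUP BY a over a string column.
fn min_max_by_group(rows: &[(i32, Option<String>)]) -> BTreeMap<i32, MinMax<String>> {
    let mut groups = BTreeMap::new();
    for (a, b) in rows {
        groups.entry(*a).or_insert_with(MinMax::new).update(b.as_ref());
    }
    groups
}

fn main() {
    for (a, mm) in min_max_by_group(&repro_rows()) {
        println!("{} min={:?} max={:?}", a, mm.min, mm.max);
    }
}
```

With byte-wise ordering "One" sorts before "Two", so both groups yield min "One" and max "Two".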

      Interestingly, these formulations work fine:

      > select a, count(a) from repro group by a;
      +---+----------+
      | a | count(a) |
      +---+----------+
      | 2 | 3        |
      | 1 | 2        |
      +---+----------+
      2 row in set. Query took 0 seconds.
      > select a, count(1) from repro group by a;
      +---+-----------------+
      | a | count(UInt8(1)) |
      +---+-----------------+
      | 2 | 3               |
      | 1 | 2               |
      +---+-----------------+
      2 row in set. Query took 0 seconds.
      
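The working queries differ from the failing ones only in their input type (an integer column, or the literal shown as `UInt8(1)`). A way to see why `count(b)`, `min(b)`, and `max(b)` should also work is to write down the result type each aggregate ought to have. The sketch below is a simplified model under stated assumptions: `DataType` here is a hypothetical stand-in enum (not Arrow's `DataType`), and the COUNT result type is assumed to be `UInt64`. It makes no claim about where in DataFusion the error is raised.

```rust
/// Hypothetical, simplified stand-in for a column data type (not Arrow's `DataType`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DataType {
    UInt8,
    Int32,
    UInt64,
    Utf8,
}

#[derive(Debug, Clone, Copy)]
enum Aggregate {
    Count,
    Min,
    Max,
}

/// Result type of `agg(input)`: COUNT always yields an integer count (assumed
/// UInt64 here), while MIN/MAX return a value of the input type, so a Utf8
/// input should produce a Utf8 result.
fn result_type(agg: Aggregate, input: DataType) -> DataType {
    match agg {
        Aggregate::Count => DataType::UInt64,
        Aggregate::Min | Aggregate::Max => input,
    }
}

fn main() {
    for &(agg, input) in &[
        (Aggregate::Count, DataType::Int32), // count(a)
        (Aggregate::Count, DataType::UInt8), // count(1)
        (Aggregate::Count, DataType::Utf8),  // count(b)
        (Aggregate::Min, DataType::Utf8),    // min(b)
        (Aggregate::Max, DataType::Utf8),    // max(b)
    ] {
        println!("{:?}({:?}) -> {:?}", agg, input, result_type(agg, input));
    }
}
```

Under this model `count(b)` has exactly the same result type as `count(a)` and `count(1)`, and only MIN/MAX need a code path that can emit Utf8 results.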

      Attachments

        1. repro.csv (0.0 kB, Andrew Lamb)


            People

              jorgecarleitao Jorge Leitão
              alamb Andrew Lamb
              Votes: 0
              Watchers: 5


                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m