Apache Arrow / ARROW-12312

[Rust][DataFusion] COUNT DISTINCT does not support `Float64`

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Rust - DataFusion
    • Labels: None

    Description

      If you try to run a `COUNT (DISTINCT ..)` query on a float column you get the following error:

      thread 'tokio-runtime-worker' panicked at 'Unexpected DataType for list', datafusion/src/scalar.rs:342:22

      Reproducer:

       echo "foo,1.23" > /tmp/foo.csv
       ./target/debug/datafusion-cli
      
      > CREATE EXTERNAL TABLE t (a varchar, b float) STORED AS CSV LOCATION '/tmp/foo.csv';
      0 rows in set. Query took 0 seconds.
      > select count(distinct a) from t;
      +-------------------+
      | COUNT(DISTINCT a) |
      +-------------------+
      | 1                 |
      +-------------------+
      1 rows in set. Query took 0 seconds.
      > select count(distinct b) from t;
      thread 'tokio-runtime-worker' panicked at 'Unexpected DataType for list', datafusion/src/scalar.rs:342:22
      note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
      ArrowError(ExternalError(Canceled))
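
For context on why float columns tend to need their own code path in a distinct count: Rust's `f64` implements neither `Hash` nor `Eq`, so it cannot be stored in a `HashSet` directly. The sketch below is a minimal standalone illustration that keys values by their IEEE 754 bit pattern. It is not DataFusion's implementation and makes no claim about the cause of the panic above; the function name `count_distinct_f64` and the choice to treat `-0.0` and `0.0` as one value are assumptions made for this example.

```rust
use std::collections::HashSet;

/// Counts the distinct non-null values in a float column.
/// Hypothetical helper for illustration only; not part of DataFusion.
fn count_distinct_f64(values: &[Option<f64>]) -> usize {
    // `f64` is neither `Hash` nor `Eq`, so store the raw bit pattern instead.
    let mut seen: HashSet<u64> = HashSet::new();
    // `flatten` skips `None` entries, mirroring how COUNT ignores NULLs.
    for v in values.iter().flatten() {
        // Assumption of this sketch: fold -0.0 into 0.0 so they count once.
        let normalized = if *v == 0.0 { 0.0 } else { *v };
        seen.insert(normalized.to_bits());
    }
    seen.len()
}

fn main() {
    let column = [Some(1.23), Some(1.23), None, Some(4.5)];
    println!("{}", count_distinct_f64(&column)); // prints 2
}
```

Note that NaN values with different bit patterns would be counted separately under this approach; a real implementation would have to decide how to canonicalize them.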
      

            People

              Assignee: Unassigned
              Reporter: Andrew Lamb (alamb)