  1. Apache Arrow
  2. ARROW-12339

[Rust][DataFusion] COUNT DISTINCT does not support `Boolean`


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Rust - DataFusion
    • Labels: None

    Description

      If you try to run a `COUNT (DISTINCT ..)` query on a boolean column you get the following panic:

      thread 'tokio-runtime-worker' panicked at 'Unexpected DataType for list', datafusion/src/scalar.rs:342:22

      While there is unlikely to be a big use case for this, it would be nice to support it for completeness' sake. At the very least, we should return a proper error message rather than panic.
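
      As a rough illustration of "a proper error message rather than a panic", the sketch below returns a `Result` for an unsupported type instead of hitting a panicking match arm. `ColumnType`, `PlanError`, and `check_distinct_type` are hypothetical names made up for this example; they are not DataFusion's actual types or the code in `datafusion/src/scalar.rs`.

```rust
use std::fmt;

/// Hypothetical stand-in for a column data type (not DataFusion's enum).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ColumnType {
    Int64,
    Utf8,
    Boolean,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum PlanError {
    NotImplemented(String),
}

impl fmt::Display for PlanError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PlanError::NotImplemented(msg) => write!(f, "Not implemented: {}", msg),
        }
    }
}

/// Returns the type if it is supported, or a descriptive error otherwise.
/// The catch-all arm yields `Err(..)` where a panic would otherwise occur.
pub fn check_distinct_type(data_type: ColumnType) -> Result<ColumnType, PlanError> {
    match data_type {
        // Assumption for this sketch: only these two types are supported.
        ColumnType::Int64 | ColumnType::Utf8 => Ok(data_type),
        other => Err(PlanError::NotImplemented(format!(
            "COUNT(DISTINCT) is not supported for data type {:?}",
            other
        ))),
    }
}
```

      With this shape, the caller can surface the message to the user and the worker thread keeps running.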

      Reproducer:

      echo "true" > /tmp/foo.csv
      ./target/debug/datafusion-cli
      
      > CREATE EXTERNAL TABLE t (a boolean) STORED AS CSV LOCATION '/tmp/foo.csv';
      
      0 rows in set. Query took 0 seconds.
      > select count(distinct a) from t;
      
      thread 'tokio-runtime-worker' panicked at 'Unexpected DataType for list', datafusion/src/scalar.rs:342:22
      note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
      ArrowError(ExternalError(Canceled))
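
      For reference, a minimal sketch of what the query above would be expected to compute, assuming standard SQL semantics where `COUNT(DISTINCT ...)` ignores NULLs. The function name `count_distinct_bool` and the `Option<bool>` column representation are assumptions for illustration only, not DataFusion's implementation.

```rust
use std::collections::HashSet;

/// Counts the distinct non-null values in a nullable boolean column.
/// `None` models SQL NULL and is skipped.
pub fn count_distinct_bool(column: &[Option<bool>]) -> usize {
    column
        .iter()
        .copied()
        .flatten() // drop NULLs, keep the inner bool values
        .collect::<HashSet<bool>>()
        .len()
}
```

      For the single-row file in the reproducer, this corresponds to `count_distinct_bool(&[Some(true)])`, which evaluates to 1.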
      

            People

              Assignee: Unassigned
              Reporter: Andrew Lamb (alamb)