  1. Apache Arrow
  2. ARROW-5946

[Rust] [DataFusion] Projection push down with aggregate producing incorrect results

Details

    Description

      I was testing some queries with the 0.14 release and noticed that the projected schema for a table scan is completely wrong (although the query results are not necessarily wrong).

       

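      use arrow::datatypes::{DataType, Field, Schema};
      use datafusion::execution::context::ExecutionContext;

      // schema for the NYC taxi CSV files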
      let schema = Schema::new(vec![
          Field::new("VendorID", DataType::Utf8, true),
          Field::new("tpep_pickup_datetime", DataType::Utf8, true),
          Field::new("tpep_dropoff_datetime", DataType::Utf8, true),
          Field::new("passenger_count", DataType::Utf8, true),
          Field::new("trip_distance", DataType::Float64, true),
          Field::new("RatecodeID", DataType::Utf8, true),
          Field::new("store_and_fwd_flag", DataType::Utf8, true),
          Field::new("PULocationID", DataType::Utf8, true),
          Field::new("DOLocationID", DataType::Utf8, true),
          Field::new("payment_type", DataType::Utf8, true),
          Field::new("fare_amount", DataType::Float64, true),
          Field::new("extra", DataType::Float64, true),
          Field::new("mta_tax", DataType::Float64, true),
          Field::new("tip_amount", DataType::Float64, true),
          Field::new("tolls_amount", DataType::Float64, true),
          Field::new("improvement_surcharge", DataType::Float64, true),
          Field::new("total_amount", DataType::Float64, true),
      ]);
      
      let mut ctx = ExecutionContext::new();
      ctx.register_csv("tripdata", "file.csv", &schema, true);
      
      let optimized_plan = ctx.create_logical_plan(
          "SELECT passenger_count, MIN(fare_amount), MAX(fare_amount) \
              FROM tripdata GROUP BY passenger_count").unwrap();

      The projected schema in the table scan has the first two columns from the schema (VendorID and tpep_pickup_datetime) rather than passenger_count and fare_amount.
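      To make the expected behaviour concrete, here is a minimal standalone sketch of the index mapping a projection push down should produce for this query. It does not use DataFusion, and `projection_indices` is a hypothetical helper written only for illustration. It is not the optimizer's actual code. For this schema the referenced columns sit at indices 3 and 10, whereas the table scan described above behaves as if the projection were indices 0 and 1.

```rust
/// Hypothetical helper (not part of DataFusion): map the column names that a
/// query references to their positions in the table schema. Names that are
/// not found in the schema are skipped in this sketch.
fn projection_indices(schema: &[&str], referenced: &[&str]) -> Vec<usize> {
    let mut indices: Vec<usize> = referenced
        .iter()
        .filter_map(|name| schema.iter().position(|field| field == name))
        .collect();
    // A column may be referenced more than once (MIN and MAX both use
    // fare_amount), so sort and de-duplicate the indices.
    indices.sort_unstable();
    indices.dedup();
    indices
}

fn main() {
    // Column names in the same order as the schema in this issue.
    let schema = [
        "VendorID",
        "tpep_pickup_datetime",
        "tpep_dropoff_datetime",
        "passenger_count",
        "trip_distance",
        "RatecodeID",
        "store_and_fwd_flag",
        "PULocationID",
        "DOLocationID",
        "payment_type",
        "fare_amount",
        "extra",
        "mta_tax",
        "tip_amount",
        "tolls_amount",
        "improvement_surcharge",
        "total_amount",
    ];

    // Columns referenced by the SELECT list and GROUP BY of the query.
    let referenced = ["passenger_count", "fare_amount", "fare_amount"];

    let projection = projection_indices(&schema, &referenced);
    let projected: Vec<&str> = projection.iter().map(|&i| schema[i]).collect();
    println!("projection = {:?}, columns = {:?}", projection, projected);
}
```

      Looking columns up by name rather than by their position in the query is the point of the sketch: a projection of [0, 1] would select VendorID and tpep_pickup_datetime, which matches the symptom reported here.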

            People

              andygrove Andy Grove

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1.5h