Apache Drill / DRILL-3830

Query with aggregate window functions returns possibly wrong results on large scale data

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Invalid
    • 1.2.0
    • None
    • None
    • Environment: 10 Performance Nodes
      DRILL_MAX_DIRECT_MEMORY=100g
      DRILL_INIT_HEAP="8g"
      DRILL_MAX_HEAP="8g"
      planner.memory.query_max_memory_per_node bumped up to 20 GB
      TPC-DS SF 1000 dataset (Parquet)

    Description

      The results Drill returns for the following two queries differ slightly from those returned by Greenplum DB.

      SELECT SUM(ss.ss_net_paid_inc_tax) OVER (PARTITION BY ss.ss_store_sk) FROM store_sales ss LIMIT 1;
      
      SELECT SUM(ss.ss_net_paid_inc_tax) OVER (PARTITION BY ss.ss_store_sk ORDER BY ss.ss_store_sk) FROM store_sales ss LIMIT 2;
      
      Drill:
      9.653697131700665E9
      
      Greenplum DB:
      9.628946925860903E9
      
      P.S. Both queries return the same results.
      

      I was unable to reproduce this on a smaller scale (tried SF 1). I'll attach plans from both systems.
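
      One thing worth ruling out when comparing these numbers across engines: neither query has a top-level ORDER BY, so LIMIT may return a row from an arbitrary ss_store_sk partition, and the two systems could be reporting sums for different stores. This is only a possible explanation, not a confirmed diagnosis of this issue. The sketch below uses Python's sqlite3 module and a made-up toy table (not TPC-DS data) to show a variant of the query that exposes the partition key and orders the output, so the same partitions can be compared side by side.

```python
import sqlite3

# Toy stand-in for store_sales; these rows are invented for illustration only.
rows = [(1, 10.0), (1, 5.0), (2, 7.5), (2, 2.5), (2, 1.0)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE store_sales (ss_store_sk INTEGER, ss_net_paid_inc_tax REAL)"
)
conn.executemany("INSERT INTO store_sales VALUES (?, ?)", rows)

# Return the partition key next to the windowed sum, collapse to one row per
# partition, and order the output so the result is deterministic.
# Window functions need SQLite 3.25 or newer.
query = """
SELECT DISTINCT
       ss.ss_store_sk,
       SUM(ss.ss_net_paid_inc_tax) OVER (PARTITION BY ss.ss_store_sk) AS total
FROM store_sales ss
ORDER BY ss.ss_store_sk
"""
per_store = conn.execute(query).fetchall()
for store_sk, total in per_store:
    print(store_sk, total)
```

      Running the same SELECT (with the ORDER BY and the ss_store_sk column) on both Drill and Greenplum DB would show whether the per-store totals actually differ, or whether LIMIT simply picked different stores.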

      Attachments

        1. gpdb_sf1000_plan.txt
          3 kB
          Abhishek Girish
        2. gpdb_sf1_plan.txt
          3 kB
          Abhishek Girish
        3. drill_sf1_plan.txt
          6 kB
          Abhishek Girish

          People

            adeneche Abdel Hakim Deneche
            agirish Abhishek Girish
