Apache AsterixDB / ASTERIXDB-2483

Out of Memory error doing aggregation - need a rewrite

Details

    Description

      This is the schema:

      CREATE TYPE Test AS open { unique2: int64 };
      
      CREATE DATASET wisconsin_5gb(Test) PRIMARY KEY unique2;
      

      This is the query:

      SELECT
          min(t.oddOnePercent) as min, 
          max(t.oddOnePercent) as max, 
          count(distinct t.oddOnePercent) as cnt
      FROM wisconsin_5gb t;
      

      The plan for this query:

      distribute result [$$46]
      -- DISTRIBUTE_RESULT  |UNPARTITIONED|
        exchange
        -- ONE_TO_ONE_EXCHANGE  |UNPARTITIONED|
          project ([$$46])
          -- STREAM_PROJECT  |UNPARTITIONED|
            assign [$$46] <- [{"min": $$48, "max": $$49, "cnt": $$50}]
            -- ASSIGN  |UNPARTITIONED|
              project ([$$48, $$49, $$50])
              -- STREAM_PROJECT  |UNPARTITIONED|
                subplan {
                          aggregate [$$50] <- [agg-sql-sum($$53)]
                          -- AGGREGATE  |LOCAL|
                            aggregate [$$53] <- [agg-sql-count($$43)]
                            -- AGGREGATE  |LOCAL|
                              distinct ([$$43])
                              -- MICRO_PRE_SORTED_DISTINCT_BY  |LOCAL|
                                order (ASC, $$43) 
                                -- IN_MEMORY_STABLE_SORT [$$43(ASC)]  |LOCAL|
                                  assign [$$43] <- [$$52.getField("oddOnePercent")]
                                  -- ASSIGN  |UNPARTITIONED|
                                    assign [$$52] <- [$#4.getField(0)]
                                    -- ASSIGN  |UNPARTITIONED|
                                      unnest $#4 <- scan-collection($$28)
                                      -- UNNEST  |UNPARTITIONED|
                                        nested tuple source
                                        -- NESTED_TUPLE_SOURCE  |UNPARTITIONED|
                       }
                -- SUBPLAN  |UNPARTITIONED|
                  aggregate [$$28, $$48, $$49] <- [listify($$27), agg-sql-min($$33), agg-sql-max($$33)]
                  -- AGGREGATE  |UNPARTITIONED|
                    exchange
                    -- RANDOM_MERGE_EXCHANGE  |PARTITIONED|
                      project ([$$27, $$33])
                      -- STREAM_PROJECT  |PARTITIONED|
                        assign [$$33, $$27] <- [$$t.getField("oddOnePercent"), {"t": $$t}]
                        -- ASSIGN  |PARTITIONED|
                          project ([$$t])
                          -- STREAM_PROJECT  |PARTITIONED|
                            exchange
                            -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                              data-scan []<-[$$47, $$t] <- Default.wisconsin_5gb
                              -- DATASOURCE_SCAN  |PARTITIONED|
                                exchange
                                -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                                  empty-tuple-source
                                  -- EMPTY_TUPLE_SOURCE  |PARTITIONED|
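
      One reading of the plan above, offered as a sketch and not as a description of AsterixDB internals: the unpartitioned AGGREGATE collects every record with listify($$27), and the subplan then unnests that list and sorts it in memory to count distinct values, so memory use grows with the number of input records. The Python below imitates that shape and compares it with a hypothetical streaming alternative that keeps only the distinct values. The function names are invented for illustration, and the sketch assumes every record carries a non-null oddOnePercent field.

```python
from typing import Any, Dict, Iterable, Optional, Tuple

Row = Dict[str, Any]
Result = Tuple[Optional[int], Optional[int], int]


def aggregate_via_listify(rows: Iterable[Row]) -> Result:
    """Imitate the plan: collect every record, then sort to count distinct values."""
    collected = []   # stands in for listify($$27): one entry per input record
    lo = hi = None   # stand in for agg-sql-min / agg-sql-max, computed during the scan
    for t in rows:
        v = t["oddOnePercent"]
        collected.append({"t": t})
        lo = v if lo is None or v < lo else lo
        hi = v if hi is None or v > hi else hi

    # Subplan: unnest the list, extract the field, sort in memory,
    # then count how many times the sorted value changes.
    values = sorted(rec["t"]["oddOnePercent"] for rec in collected)
    cnt = 0
    for i, v in enumerate(values):
        if i == 0 or v != values[i - 1]:
            cnt += 1
    return lo, hi, cnt


def aggregate_streaming(rows: Iterable[Row]) -> Result:
    """Hypothetical alternative: one pass, remembering only the distinct values."""
    seen = set()
    lo = hi = None
    for t in rows:
        v = t["oddOnePercent"]
        seen.add(v)
        lo = v if lo is None or v < lo else lo
        hi = v if hi is None or v > hi else hi
    return lo, hi, len(seen)


# Small synthetic input (not the wisconsin_5gb data): 1000 records, 100 distinct odd values.
sample = [{"unique2": i, "oddOnePercent": (i % 100) * 2 + 1} for i in range(1000)]
assert aggregate_via_listify(sample) == aggregate_streaming(sample)
```

      Both functions return the same (min, max, cnt) triple. The difference is what they hold in memory: the first keeps a list as long as the input, the second keeps a set as large as the number of distinct values.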
      

            People

              dlychagin-cb Dmitry Lychagin
              dtabass Michael J. Carey