  Apache AsterixDB / ASTERIXDB-2326

Cannot run aggregation functions when the external dataset size grows too large


Details

    Description

      I was testing aggregation functions on an external dataset and found that they fail at 100 million tuples, while the same queries succeed at 10 million tuples. Neither the existing aggregates nor the ones I am adding work at 100 million tuples.

      DDL:

      DROP DATAVERSE AGG_TEST IF EXISTS;
      CREATE DATAVERSE AGG_TEST;
      USE AGG_TEST;

      CREATE TYPE Data AS { id: int, val: double };

      create external dataset dataval(Data) using localfs((`path`=`127.0.0.1://Users/name/Documents/100000000.txt`),(`format`=`adm`));
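
To reproduce at different sizes, an input file for the external dataset is needed. The following is a minimal sketch of a generator, assuming one record per line with an integer `id` and a decimal `val` matching the `Data` type above. The helper names (`write_adm_records`, `generate_file`), the value range, and the fixed-point formatting are illustrative assumptions; the file used in the original report is not available.

```python
import random


def write_adm_records(out, count, seed=0):
    """Write `count` records, one per line, shaped like the Data type
    (id: int, val: double). The exact layout is an assumption."""
    rng = random.Random(seed)  # fixed seed so the file is reproducible
    for i in range(count):
        # Fixed-point formatting keeps a decimal point in every value,
        # so `val` is never written as a bare integer or in exponent form.
        out.write('{"id": %d, "val": %.6f}\n' % (i, rng.random() * 100.0))


def generate_file(path, count):
    """Stream records to `path` without holding them in memory."""
    with open(path, "w", encoding="ascii") as f:
        write_adm_records(f, count)
```

Calling `generate_file("100000000.txt", 100_000_000)` would produce a file of the size that fails in this report, and `generate_file("10000000.txt", 10_000_000)` one of the size that succeeds.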

       

      Query:

      USE AGG_TEST;

      {"average": coll_avg((select element x.val from dataval as x))};
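
The task descriptions in the log below name `LocalAvgAggregateDescriptor` in the scan task and `GlobalAvgAggregateDescriptor` in the result-writer task, which suggests the average is computed in two phases. The sketch below illustrates that general technique only. It is not the AsterixDB implementation; the function names and the empty-input behaviour are assumptions.

```python
from typing import Iterable, Optional, Tuple

Partial = Tuple[float, int]  # (running sum, count) for one partition


def local_avg(values: Iterable[float]) -> Partial:
    """Local phase: reduce one partition to a (sum, count) pair."""
    total, count = 0.0, 0
    for v in values:
        total += v
        count += 1
    return total, count


def global_avg(partials: Iterable[Partial]) -> Optional[float]:
    """Global phase: merge the per-partition pairs into one average."""
    total, count = 0.0, 0
    for s, c in partials:
        total += s
        count += c
    # Assumption: an empty input yields None rather than raising.
    return total / count if count else None
```

For example, `global_avg(local_avg(p) for p in [[1.0, 2.0], [3.0]])` merges the pairs `(3.0, 2)` and `(3.0, 1)` into a single average. Only a small fixed-size pair crosses between the phases, regardless of how many tuples each partition scans.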

       

      Error:
      11:55:25.603 [Executor-3:ClusterController] INFO  org.apache.asterix.runtime.utils.ClusterStateManager - Cluster State is now ACTIVE
      11:55:30.447 [Worker:ClusterController] INFO  org.apache.hyracks.control.common.work.WorkQueue - Executing: GetDatasetDirectoryServiceInfo
      11:55:30.917 [Worker:ClusterController] INFO  org.apache.hyracks.control.common.work.WorkQueue - Executing: GetNodeControllersInfo
      11:55:31.345 [Worker:ClusterController] INFO  org.apache.hyracks.control.common.work.WorkQueue - Executing: JobStart
      11:55:31.379 [Worker:ClusterController] INFO  org.apache.hyracks.control.cc.dataset.DatasetDirectoryService - DatasetDirectoryService notified of new job JID:0.1
      11:55:31.382 [Worker:ClusterController] INFO  org.apache.asterix.app.active.ActiveNotificationHandler - notifyJobCreation(JobId jobId, JobSpecification jobSpecification) was called with jobId = JID:0.1
      11:55:31.382 [Worker:ClusterController] INFO  org.apache.asterix.app.active.ActiveNotificationHandler - Job is not of type active job. property found to be: null
      11:55:31.393 [Worker:ClusterController] INFO  org.apache.hyracks.control.cc.executor.ActivityClusterPlanner - Plan for org.apache.hyracks.api.job.ActivityCluster@1264c6ff
      11:55:31.393 [Worker:ClusterController] INFO  org.apache.hyracks.control.cc.executor.ActivityClusterPlanner - Built 1 Task Clusters
      11:55:31.393 [Worker:ClusterController] INFO  org.apache.hyracks.control.cc.executor.ActivityClusterPlanner - Tasks: [TID:ANID:ODID:0:0:0, TID:ANID:ODID:2:0:0]
      11:55:31.394 [Worker:ClusterController] INFO  org.apache.hyracks.control.cc.executor.JobExecutor - Runnable TC roots: [TC:[TID:ANID:ODID:0:0:0, TID:ANID:ODID:2:0:0]], inProgressTaskClusters: []
      11:55:31.412 [Worker:ClusterController] INFO  org.apache.hyracks.control.common.work.WorkQueue - Executing: WaitForJobCompletion
      11:55:31.412 [Worker:asterix_nc1] INFO  org.apache.hyracks.control.common.work.WorkQueue - Executing: StartTasks
      11:55:31.423 [Worker:asterix_nc1] INFO  org.apache.hyracks.control.nc.work.StartTasksWork - Initializing TAID:TID:ANID:ODID:0:0:0:0 -> [org.apache.asterix.external.operators.ExternalScanOperatorDescriptor@74fb82e0, AlgebricksMeta [assign [1] := [org.apache.asterix.runtime.evaluators.functions.records.FieldAccessByIndexEvalFactory$_EvaluatorFactoryGen@30d487a5], stream-project [1], assign [org.apache.asterix.runtime.aggregates.std.LocalAvgAggregateDescriptor$2@6594e4ce]]] for JID:0.1
      11:55:31.450 [Worker:asterix_nc1] INFO  org.apache.hyracks.control.nc.work.StartTasksWork - input: 0: CDID:1
      11:55:31.453 [Worker:asterix_nc1] INFO  org.apache.hyracks.control.nc.work.StartTasksWork - Initializing TAID:TID:ANID:ODID:2:0:0:0 -> [org.apache.hyracks.dataflow.std.result.ResultWriterOperatorDescriptor@71b17102, AlgebricksMeta [assign [org.apache.asterix.runtime.aggregates.std.GlobalAvgAggregateDescriptor$2@11121dfc], assign [1] := [org.apache.asterix.runtime.evaluators.common.ClosedRecordConstructorEvalFactory@443a919b], stream-project [1]]] for JID:0.1
      11:55:31.480 [Worker:asterix_nc1] INFO  org.apache.hyracks.control.nc.work.StartTasksWork - input: 0: CDID:1
      11:55:31.517 [org.apache.hyracks.api.rewriter.runtime.SuperActivity:TAID:TID:ANID:ODID:2:0:0:0:0] INFO  org.apache.hyracks.control.nc.dataset.DatasetPartitionWriter - open(0)
      12:00:57.342 [Worker:asterix_nc1] INFO  org.apache.hyracks.control.common.work.WorkQueue - Executing: NotifyTaskCompleteWork:TAID:TID:ANID:ODID:0:0:0:0
      12:00:57.351 [Worker:ClusterController] INFO  org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: [asterix_nc1[JID:0.1:TAID:TID:ANID:ODID:0:0:0:0]
      12:00:57.365 [Worker:ClusterController] INFO  org.apache.hyracks.control.common.work.WorkQueue - Executing: RegisterResultPartitionLocation: JobId@JID:0.1 ResultSetId@RSID:0 Partition@0 NPartitions@1 ResultPartitionLocation@127.0.0.1:49695 OrderedResult@true EmptyResult@false
      12:00:57.368 [org.apache.hyracks.api.rewriter.runtime.SuperActivity:TAID:TID:ANID:ODID:2:0:0:0:0] INFO  org.apache.hyracks.control.nc.dataset.DatasetPartitionWriter - close(0)
      12:00:57.373 [Worker:asterix_nc1] INFO  org.apache.hyracks.control.common.work.WorkQueue - Executing: NotifyTaskCompleteWork:TAID:TID:ANID:ODID:2:0:0:0
      12:00:57.377 [Worker:ClusterController] WARN  org.apache.hyracks.control.cc.work.RegisterResultPartitionLocationWork - Failed to register partition location
      org.apache.hyracks.api.exceptions.HyracksDataException: HYR0024: No result set for job JID:0.1
      at org.apache.hyracks.api.exceptions.HyracksDataException.create(HyracksDataException.java:55) ~[classes/:?]
      at org.apache.hyracks.control.cc.dataset.DatasetDirectoryService.getNonNullDatasetJobRecord(DatasetDirectoryService.java:105) ~[classes/:?]
      at org.apache.hyracks.control.cc.dataset.DatasetDirectoryService.registerResultPartitionLocation(DatasetDirectoryService.java:114) ~[classes/:?]
      at org.apache.hyracks.control.cc.work.RegisterResultPartitionLocationWork.run(RegisterResultPartitionLocationWork.java:71) [classes/:?]
      at org.apache.hyracks.control.common.work.WorkQueue$WorkerThread.run(WorkQueue.java:127) [classes/:?]
      12:00:57.393 [Worker:ClusterController] INFO  org.apache.hyracks.control.cc.executor.JobExecutor - Abort map for job: JID:0.1: {asterix_nc1=[TAID:TID:ANID:ODID:2:0:0:0]}
      12:00:57.394 [Worker:ClusterController] INFO  org.apache.hyracks.control.cc.executor.JobExecutor - Aborting: [TAID:TID:ANID:ODID:2:0:0:0] at asterix_nc1
      12:00:57.400 [Worker:ClusterController] INFO  org.apache.hyracks.control.cc.partitions.PartitionMatchMaker - Removing uncommitted partitions: []
      12:00:57.405 [Worker:ClusterController] INFO  org.apache.hyracks.control.cc.partitions.PartitionMatchMaker - Removing partition requests: []
      12:00:57.407 [Worker:ClusterController] INFO  org.apache.hyracks.control.common.work.WorkQueue - Executing: ReportResultPartitionWriteCompletion: JobId@JID:0.1 ResultSetId@RSID:0 Partition@0
      12:00:57.407 [Worker:asterix_nc1] INFO  org.apache.hyracks.control.common.work.WorkQueue - Executing: AbortTasks
      12:00:57.407 [Worker:asterix_nc1] INFO  org.apache.hyracks.control.nc.work.AbortTasksWork - Aborting Tasks: JID:0.1:[TAID:TID:ANID:ODID:2:0:0:0]
      12:00:57.407 [Worker:ClusterController] WARN  org.apache.hyracks.control.common.work.WorkQueue - Exception while executing ReportResultPartitionWriteCompletion: JobId@JID:0.1 ResultSetId@RSID:0 Partition@0
      java.lang.RuntimeException: org.apache.hyracks.api.exceptions.HyracksDataException: HYR0024: No result set for job JID:0.1
      at org.apache.hyracks.control.cc.work.ReportResultPartitionWriteCompletionWork.run(ReportResultPartitionWriteCompletionWork.java:49) ~[classes/:?]
      at org.apache.hyracks.control.common.work.WorkQueue$WorkerThread.run(WorkQueue.java:127) [classes/:?]
      Caused by: org.apache.hyracks.api.exceptions.HyracksDataException: HYR0024: No result set for job JID:0.1
      at org.apache.hyracks.api.exceptions.HyracksDataException.create(HyracksDataException.java:55) ~[classes/:?]
      at org.apache.hyracks.control.cc.dataset.DatasetDirectoryService.getNonNullDatasetJobRecord(DatasetDirectoryService.java:105) ~[classes/:?]
      at org.apache.hyracks.control.cc.dataset.DatasetDirectoryService.reportResultPartitionWriteCompletion(DatasetDirectoryService.java:141) ~[classes/:?]
      at org.apache.hyracks.control.cc.work.ReportResultPartitionWriteCompletionWork.run(ReportResultPartitionWriteCompletionWork.java:47) ~[classes/:?]
      ... 1 more
      12:00:57.408 [Worker:ClusterController] INFO  org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: [asterix_nc1[JID:0.1:TAID:TID:ANID:ODID:2:0:0:0]
      12:00:57.409 [Worker:ClusterController] WARN  org.apache.hyracks.control.cc.executor.JobExecutor - Spurious task complete notification: TAID:TID:ANID:ODID:2:0:0:0 Current state = ABORTED
      12:00:57.409 [Worker:ClusterController] INFO  org.apache.hyracks.control.common.work.WorkQueue - Executing: JobCleanup: JobId@JID:0.1 Status@FAILURE Exceptions@[org.apache.hyracks.api.exceptions.HyracksDataException: HYR0024: No result set for job JID:0.1]
      12:00:57.409 [Worker:ClusterController] INFO  org.apache.hyracks.control.cc.work.JobCleanupWork - Cleanup for JobRun with id: JID:0.1
      12:00:57.412 [Worker:asterix_nc1] INFO  org.apache.hyracks.control.common.work.WorkQueue - Executing: CleanupJoblet
      12:00:57.413 [Worker:asterix_nc1] INFO  org.apache.hyracks.control.nc.work.CleanupJobletWork - Cleaning up after job: JID:0.1
      12:00:57.416 [Worker:asterix_nc1] INFO  org.apache.hyracks.control.nc.Joblet - Freeing leaked 294912 bytes
      12:00:57.421 [Worker:ClusterController] INFO  org.apache.hyracks.control.common.work.WorkQueue - Executing: JobletCleanupNotification
      12:00:57.421 [Worker:ClusterController] INFO  org.apache.asterix.app.active.ActiveNotificationHandler - Getting notified of job finish for JobId: JID:0.1
      12:00:57.421 [Worker:ClusterController] INFO  org.apache.asterix.app.active.ActiveNotificationHandler - NO NEED TO NOTIFY JOB FINISH!
      12:00:57.430 [IPC Network Listener Thread [/0:0:0:0:0:0:0:0:49684]] INFO  org.apache.hyracks.ipc.impl.IPCSystem - Exception in message
      org.apache.hyracks.api.exceptions.HyracksDataException: HYR0024: No result set for job JID:0.1
      at org.apache.hyracks.api.exceptions.HyracksDataException.create(HyracksDataException.java:55) ~[classes/:?]
      at org.apache.hyracks.control.cc.dataset.DatasetDirectoryService.getNonNullDatasetJobRecord(DatasetDirectoryService.java:105) ~[classes/:?]
      at org.apache.hyracks.control.cc.dataset.DatasetDirectoryService.registerResultPartitionLocation(DatasetDirectoryService.java:114) ~[classes/:?]
      at org.apache.hyracks.control.cc.work.RegisterResultPartitionLocationWork.run(RegisterResultPartitionLocationWork.java:71) ~[classes/:?]
      at org.apache.hyracks.control.common.work.WorkQueue$WorkerThread.run(WorkQueue.java:127) ~[classes/:?]
      12:00:57.436 [HttpExecutor(port:19001)-0] ERROR org.apache.asterix - HYR0024: No result set for job JID:0.1
      org.apache.hyracks.api.exceptions.HyracksDataException: HYR0024: No result set for job JID:0.1
      at org.apache.hyracks.api.exceptions.HyracksDataException.create(HyracksDataException.java:55) ~[classes/:?]
      at org.apache.hyracks.control.cc.dataset.DatasetDirectoryService.getNonNullDatasetJobRecord(DatasetDirectoryService.java:105) ~[classes/:?]
      at org.apache.hyracks.control.cc.dataset.DatasetDirectoryService.registerResultPartitionLocation(DatasetDirectoryService.java:114) ~[classes/:?]
      at org.apache.hyracks.control.cc.work.RegisterResultPartitionLocationWork.run(RegisterResultPartitionLocationWork.java:71) ~[classes/:?]
      at org.apache.hyracks.control.common.work.WorkQueue$WorkerThread.run(WorkQueue.java:127) ~[classes/:?]
      12:00:57.442 [Worker:ClusterController] WARN  org.apache.hyracks.control.common.work.WorkQueue - Work JobletCleanupNotification waited 0 times (~0ms), blocked 1 times (~0ms)
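
The log contains no entries between `open(0)` at 11:55:31.517 and the first task completion at 12:00:57.342, so it helps to quantify how long the job ran before the first failure was logged. The helper below is a small sketch for that arithmetic; `elapsed_seconds` is a hypothetical name, and it assumes both timestamps fall on the same day.

```python
from datetime import datetime


def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two HH:MM:SS.mmm log timestamps.
    Assumes both timestamps are from the same day."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


job_start = "11:55:31.345"      # "Executing: JobStart"
first_failure = "12:00:57.377"  # "Failed to register partition location"
print(elapsed_seconds(job_start, first_failure))
```

With the timestamps above, the job ran for roughly 326 seconds before the `HYR0024` warning appeared. The log alone does not establish whether that duration is related to the failure.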

          People

            Murtadha Makki Al Hubail (mhubail)
            James Fang