Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
While testing aggregation functions on external data, I found that they fail entirely at 100 million tuples, although they work at 10 million tuples. Neither the existing aggregates nor the aggregates I am adding work at 100 million tuples.
DDL:
DROP DATAVERSE AGG_TEST IF EXISTS;
CREATE DATAVERSE AGG_TEST;
USE AGG_TEST;
CREATE TYPE Data AS
{ id: int, val: double };
create external dataset dataval(Data) using localfs((`path`=`127.0.0.1://Users/name/Documents/100000000.txt`),(`format`=`adm`));
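The contents of the input file are not included in this report. To reproduce with a file of a similar shape, something like the following sketch could generate ADM records matching the `Data` type. The function name `write_adm_records` and the synthetic `val` values are assumptions made for illustration; the original data may have looked different, and the sketch assumes a plain decimal literal is read as a double by the ADM parser.

```python
import io


def write_adm_records(out, count):
    """Write `count` ADM records shaped like the Data type (id: int, val: double).

    The values are synthetic (hypothetical); the original file is not shown
    in the report.
    """
    for i in range(count):
        # Cycle val through 0.0 .. 99.9 so the average stays easy to check.
        val = float(i % 1000) / 10
        out.write('{"id": %d, "val": %r}\n' % (i, val))


# Small in-memory demonstration; for a real run, pass an open file instead,
# e.g. open("100000000.txt", "w"), with count=100_000_000.
sample = io.StringIO()
write_adm_records(sample, 3)
print(sample.getvalue(), end="")
```

Writing one record per line keeps the file streamable, so the generator itself does not need to hold the 100 million tuples in memory.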
Query:
USE AGG_TEST;
{"average":coll_avg((select element x.val from dataval as x))};
Error:
11:55:25.603 [Executor-3:ClusterController] INFO org.apache.asterix.runtime.utils.ClusterStateManager - Cluster State is now ACTIVE
11:55:30.447 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: GetDatasetDirectoryServiceInfo
11:55:30.917 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: GetNodeControllersInfo
11:55:31.345 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: JobStart
11:55:31.379 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.dataset.DatasetDirectoryService - DatasetDirectoryService notified of new job JID:0.1
11:55:31.382 [Worker:ClusterController] INFO org.apache.asterix.app.active.ActiveNotificationHandler - notifyJobCreation(JobId jobId, JobSpecification jobSpecification) was called with jobId = JID:0.1
11:55:31.382 [Worker:ClusterController] INFO org.apache.asterix.app.active.ActiveNotificationHandler - Job is not of type active job. property found to be: null
11:55:31.393 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.executor.ActivityClusterPlanner - Plan for org.apache.hyracks.api.job.ActivityCluster@1264c6ff
11:55:31.393 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.executor.ActivityClusterPlanner - Built 1 Task Clusters
11:55:31.393 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.executor.ActivityClusterPlanner - Tasks: [TID:ANID:ODID:0:0:0, TID:ANID:ODID:2:0:0]
11:55:31.394 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.executor.JobExecutor - Runnable TC roots: [TC:[TID:ANID:ODID:0:0:0, TID:ANID:ODID:2:0:0]], inProgressTaskClusters: []
11:55:31.412 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: WaitForJobCompletion
11:55:31.412 [Worker:asterix_nc1] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: StartTasks
11:55:31.423 [Worker:asterix_nc1] INFO org.apache.hyracks.control.nc.work.StartTasksWork - Initializing TAID:TID:ANID:ODID:0:0:0:0 -> [org.apache.asterix.external.operators.ExternalScanOperatorDescriptor@74fb82e0, AlgebricksMeta [assign [1] := [org.apache.asterix.runtime.evaluators.functions.records.FieldAccessByIndexEvalFactory$_EvaluatorFactoryGen@30d487a5], stream-project [1], assign [org.apache.asterix.runtime.aggregates.std.LocalAvgAggregateDescriptor$2@6594e4ce]]] for JID:0.1
11:55:31.450 [Worker:asterix_nc1] INFO org.apache.hyracks.control.nc.work.StartTasksWork - input: 0: CDID:1
11:55:31.453 [Worker:asterix_nc1] INFO org.apache.hyracks.control.nc.work.StartTasksWork - Initializing TAID:TID:ANID:ODID:2:0:0:0 -> [org.apache.hyracks.dataflow.std.result.ResultWriterOperatorDescriptor@71b17102, AlgebricksMeta [assign [org.apache.asterix.runtime.aggregates.std.GlobalAvgAggregateDescriptor$2@11121dfc], assign [1] := [org.apache.asterix.runtime.evaluators.common.ClosedRecordConstructorEvalFactory@443a919b], stream-project [1]]] for JID:0.1
11:55:31.480 [Worker:asterix_nc1] INFO org.apache.hyracks.control.nc.work.StartTasksWork - input: 0: CDID:1
11:55:31.517 [org.apache.hyracks.api.rewriter.runtime.SuperActivity:TAID:TID:ANID:ODID:2:0:0:0:0] INFO org.apache.hyracks.control.nc.dataset.DatasetPartitionWriter - open(0)
12:00:57.342 [Worker:asterix_nc1] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: NotifyTaskCompleteWork:TAID:TID:ANID:ODID:0:0:0:0
12:00:57.351 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: [asterix_nc1[JID:0.1:TAID:TID:ANID:ODID:0:0:0:0]
12:00:57.365 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: RegisterResultPartitionLocation: JobId@JID:0.1 ResultSetId@RSID:0 Partition@0 NPartitions@1 ResultPartitionLocation@127.0.0.1:49695 OrderedResult@true EmptyResult@false
12:00:57.368 [org.apache.hyracks.api.rewriter.runtime.SuperActivity:TAID:TID:ANID:ODID:2:0:0:0:0] INFO org.apache.hyracks.control.nc.dataset.DatasetPartitionWriter - close(0)
12:00:57.373 [Worker:asterix_nc1] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: NotifyTaskCompleteWork:TAID:TID:ANID:ODID:2:0:0:0
12:00:57.377 [Worker:ClusterController] WARN org.apache.hyracks.control.cc.work.RegisterResultPartitionLocationWork - Failed to register partition location
org.apache.hyracks.api.exceptions.HyracksDataException: HYR0024: No result set for job JID:0.1
at org.apache.hyracks.api.exceptions.HyracksDataException.create(HyracksDataException.java:55) ~[classes/:?]
at org.apache.hyracks.control.cc.dataset.DatasetDirectoryService.getNonNullDatasetJobRecord(DatasetDirectoryService.java:105) ~[classes/:?]
at org.apache.hyracks.control.cc.dataset.DatasetDirectoryService.registerResultPartitionLocation(DatasetDirectoryService.java:114) ~[classes/:?]
at org.apache.hyracks.control.cc.work.RegisterResultPartitionLocationWork.run(RegisterResultPartitionLocationWork.java:71) [classes/:?]
at org.apache.hyracks.control.common.work.WorkQueue$WorkerThread.run(WorkQueue.java:127) [classes/:?]
12:00:57.393 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.executor.JobExecutor - Abort map for job: JID:0.1: {asterix_nc1=[TAID:TID:ANID:ODID:2:0:0:0]}
12:00:57.394 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.executor.JobExecutor - Aborting: [TAID:TID:ANID:ODID:2:0:0:0] at asterix_nc1
12:00:57.400 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.partitions.PartitionMatchMaker - Removing uncommitted partitions: []
12:00:57.405 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.partitions.PartitionMatchMaker - Removing partition requests: []
12:00:57.407 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: ReportResultPartitionWriteCompletion: JobId@JID:0.1 ResultSetId@RSID:0 Partition@0
12:00:57.407 [Worker:asterix_nc1] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: AbortTasks
12:00:57.407 [Worker:asterix_nc1] INFO org.apache.hyracks.control.nc.work.AbortTasksWork - Aborting Tasks: JID:0.1:[TAID:TID:ANID:ODID:2:0:0:0]
12:00:57.407 [Worker:ClusterController] WARN org.apache.hyracks.control.common.work.WorkQueue - Exception while executing ReportResultPartitionWriteCompletion: JobId@JID:0.1 ResultSetId@RSID:0 Partition@0
java.lang.RuntimeException: org.apache.hyracks.api.exceptions.HyracksDataException: HYR0024: No result set for job JID:0.1
at org.apache.hyracks.control.cc.work.ReportResultPartitionWriteCompletionWork.run(ReportResultPartitionWriteCompletionWork.java:49) ~[classes/:?]
at org.apache.hyracks.control.common.work.WorkQueue$WorkerThread.run(WorkQueue.java:127) [classes/:?]
Caused by: org.apache.hyracks.api.exceptions.HyracksDataException: HYR0024: No result set for job JID:0.1
at org.apache.hyracks.api.exceptions.HyracksDataException.create(HyracksDataException.java:55) ~[classes/:?]
at org.apache.hyracks.control.cc.dataset.DatasetDirectoryService.getNonNullDatasetJobRecord(DatasetDirectoryService.java:105) ~[classes/:?]
at org.apache.hyracks.control.cc.dataset.DatasetDirectoryService.reportResultPartitionWriteCompletion(DatasetDirectoryService.java:141) ~[classes/:?]
at org.apache.hyracks.control.cc.work.ReportResultPartitionWriteCompletionWork.run(ReportResultPartitionWriteCompletionWork.java:47) ~[classes/:?]
... 1 more
12:00:57.408 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: TaskComplete: [asterix_nc1[JID:0.1:TAID:TID:ANID:ODID:2:0:0:0]
12:00:57.409 [Worker:ClusterController] WARN org.apache.hyracks.control.cc.executor.JobExecutor - Spurious task complete notification: TAID:TID:ANID:ODID:2:0:0:0 Current state = ABORTED
12:00:57.409 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: JobCleanup: JobId@JID:0.1 Status@FAILURE Exceptions@[org.apache.hyracks.api.exceptions.HyracksDataException: HYR0024: No result set for job JID:0.1]
12:00:57.409 [Worker:ClusterController] INFO org.apache.hyracks.control.cc.work.JobCleanupWork - Cleanup for JobRun with id: JID:0.1
12:00:57.412 [Worker:asterix_nc1] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: CleanupJoblet
12:00:57.413 [Worker:asterix_nc1] INFO org.apache.hyracks.control.nc.work.CleanupJobletWork - Cleaning up after job: JID:0.1
12:00:57.416 [Worker:asterix_nc1] INFO org.apache.hyracks.control.nc.Joblet - Freeing leaked 294912 bytes
12:00:57.421 [Worker:ClusterController] INFO org.apache.hyracks.control.common.work.WorkQueue - Executing: JobletCleanupNotification
12:00:57.421 [Worker:ClusterController] INFO org.apache.asterix.app.active.ActiveNotificationHandler - Getting notified of job finish for JobId: JID:0.1
12:00:57.421 [Worker:ClusterController] INFO org.apache.asterix.app.active.ActiveNotificationHandler - NO NEED TO NOTIFY JOB FINISH!
12:00:57.430 [IPC Network Listener Thread [/0:0:0:0:0:0:0:0:49684]] INFO org.apache.hyracks.ipc.impl.IPCSystem - Exception in message
org.apache.hyracks.api.exceptions.HyracksDataException: HYR0024: No result set for job JID:0.1
at org.apache.hyracks.api.exceptions.HyracksDataException.create(HyracksDataException.java:55) ~[classes/:?]
at org.apache.hyracks.control.cc.dataset.DatasetDirectoryService.getNonNullDatasetJobRecord(DatasetDirectoryService.java:105) ~[classes/:?]
at org.apache.hyracks.control.cc.dataset.DatasetDirectoryService.registerResultPartitionLocation(DatasetDirectoryService.java:114) ~[classes/:?]
at org.apache.hyracks.control.cc.work.RegisterResultPartitionLocationWork.run(RegisterResultPartitionLocationWork.java:71) ~[classes/:?]
at org.apache.hyracks.control.common.work.WorkQueue$WorkerThread.run(WorkQueue.java:127) ~[classes/:?]
12:00:57.436 [HttpExecutor(port:19001)-0] ERROR org.apache.asterix - HYR0024: No result set for job JID:0.1
org.apache.hyracks.api.exceptions.HyracksDataException: HYR0024: No result set for job JID:0.1
at org.apache.hyracks.api.exceptions.HyracksDataException.create(HyracksDataException.java:55) ~[classes/:?]
at org.apache.hyracks.control.cc.dataset.DatasetDirectoryService.getNonNullDatasetJobRecord(DatasetDirectoryService.java:105) ~[classes/:?]
at org.apache.hyracks.control.cc.dataset.DatasetDirectoryService.registerResultPartitionLocation(DatasetDirectoryService.java:114) ~[classes/:?]
at org.apache.hyracks.control.cc.work.RegisterResultPartitionLocationWork.run(RegisterResultPartitionLocationWork.java:71) ~[classes/:?]
at org.apache.hyracks.control.common.work.WorkQueue$WorkerThread.run(WorkQueue.java:127) ~[classes/:?]
12:00:57.442 [Worker:ClusterController] WARN org.apache.hyracks.control.common.work.WorkQueue - Work JobletCleanupNotification waited 0 times (~0ms), blocked 1 times (~0ms)
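For reference, the elapsed time between the JobStart entry and the first "Failed to register partition location" warning can be read off the log timestamps. The sketch below does that arithmetic with the two timestamps copied from the log above; the helper name `elapsed_seconds` is illustrative, and it assumes both timestamps fall on the same day.

```python
from datetime import datetime


def elapsed_seconds(start, end):
    """Return seconds between two HH:MM:SS.mmm log timestamps on the same day."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


# Timestamps copied from the log: JobStart and the first HYR0024 warning.
job_start = "11:55:31.345"
first_failure = "12:00:57.377"
print(round(elapsed_seconds(job_start, first_failure), 3))  # 326.032
```

So the job ran for roughly 5 minutes 26 seconds before the result partition registration failed.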