Apache Drill / DRILL-3898

No space error during external sort does not cancel the query


Details

    Description

      While verifying DRILL-3732, I ran into a new problem.
      I think Drill somehow loses track of the out-of-disk-space exception and does not cancel the rest of the query, which results in an NPE.

      Reproduction is the same as in DRILL-3732:

      0: jdbc:drill:schema=dfs> create table store_sales_20(ss_item_sk, ss_customer_sk, ss_cdemo_sk, ss_hdemo_sk, s_sold_date_sk, ss_promo_sk) partition by (ss_promo_sk) as
      . . . . . . . . . . . . >  select 
      . . . . . . . . . . . . >      case when columns[2] = '' then cast(null as varchar(100)) else cast(columns[2] as varchar(100)) end,
      . . . . . . . . . . . . >      case when columns[3] = '' then cast(null as varchar(100)) else cast(columns[3] as varchar(100)) end,
      . . . . . . . . . . . . >      case when columns[4] = '' then cast(null as varchar(100)) else cast(columns[4] as varchar(100)) end, 
      . . . . . . . . . . . . >      case when columns[5] = '' then cast(null as varchar(100)) else cast(columns[5] as varchar(100)) end, 
      . . . . . . . . . . . . >      case when columns[0] = '' then cast(null as varchar(100)) else cast(columns[0] as varchar(100)) end, 
      . . . . . . . . . . . . >      case when columns[8] = '' then cast(null as varchar(100)) else cast(columns[8] as varchar(100)) end
      . . . . . . . . . . . . >  from 
      . . . . . . . . . . . . >           `store_sales.dat` ss     
      . . . . . . . . . . . . > ;
      Error: SYSTEM ERROR: NullPointerException
      Fragment 1:16
      [Error Id: 0ae9338d-d04f-4b4a-93aa-a80d13cedb29 on atsqa4-133.qa.lab:31010] (state=,code=0)
      

      This exception in drillbit.log should have triggered query cancellation:

      2015-10-06 17:01:34,463 [WorkManager-2] ERROR o.apache.drill.exec.work.WorkManager - org.apache.drill.exec.work.WorkManager$WorkerBee$1.run() leaked an exception.
      org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
              at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:226) ~[hadoop-common-2.5.1-mapr-1503.jar:na]
              at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) ~[na:1.7.0_71]
              at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) ~[na:1.7.0_71]
              at java.io.FilterOutputStream.close(FilterOutputStream.java:157) ~[na:1.7.0_71]
              at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72) ~[hadoop-common-2.5.1-mapr-1503.jar:na]
              at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) ~[hadoop-common-2.5.1-mapr-1503.jar:na]
              at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.close(ChecksumFileSystem.java:400) ~[hadoop-common-2.5.1-mapr-1503.jar:na]
              at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72) ~[hadoop-common-2.5.1-mapr-1503.jar:na]
              at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) ~[hadoop-common-2.5.1-mapr-1503.jar:na]
              at org.apache.drill.exec.physical.impl.xsort.BatchGroup.close(BatchGroup.java:152) ~[drill-java-exec-1.2.0.jar:1.2.0]
              at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:44) ~[drill-common-1.2.0.jar:1.2.0]
              at org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.mergeAndSpill(ExternalSortBatch.java:553) ~[drill-java-exec-1.2.0.jar:1.2.0]
              at org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext(ExternalSortBatch.java:362) ~[drill-java-exec-1.2.0.jar:1.2.0]
              at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147) ~[drill-java-exec-1.2.0.jar:1.2.0]
              at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:104) ~[drill-java-exec-1.2.0.jar:1.2.0]
              at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:94) ~[drill-java-exec-1.2.0.jar:1.2.0]
              at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-1.2.0.jar:1.2.0]
              at org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:94) ~[drill-java-exec-1.2.0.jar:1.2.0]
              at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147) ~[drill-java-exec-1.2.0.jar:1.2.0]
              at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:104) ~[drill-java-exec-1.2.0.jar:1.2.0]
              at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:94) ~[drill-java-exec-1.2.0.jar:1.2.0]
              at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-1.2.0.jar:1.2.0]
              at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:129) ~[drill-java-exec-1.2.0.jar:1.2.0]
              at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147) ~[drill-java-exec-1.2.0.jar:1.2.0]
              at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:104) ~[drill-java-exec-1.2.0.jar:1.2.0]
              at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:94) ~[drill-java-exec-1.2.0.jar:1.2.0]
              at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-1.2.0.jar:1.2.0]
              at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:129) ~[drill-java-exec-1.2.0.jar:1.2.0]
              at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147) ~[drill-java-exec-1.2.0.jar:1.2.0]
              at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:104) ~[drill-java-exec-1.2.0.jar:1.2.0]
              at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:94) ~[drill-java-exec-1.2.0.jar:1.2.0]
              at org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext(WriterRecordBatch.java:91) ~[drill-java-exec-1.2.0.jar:1.2.0]
              at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147) ~[drill-java-exec-1.2.0.jar:1.2.0]
              at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:83) ~[drill-java-exec-1.2.0.jar:1.2.0]
              at org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:93) ~[drill-java-exec-1.2.0.jar:1.2.0]
              at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:73) ~[drill-java-exec-1.2.0.jar:1.2.0]
              at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:258) ~[drill-java-exec-1.2.0.jar:1.2.0]
              at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:252) ~[drill-java-exec-1.2.0.jar:1.2.0]
              at java.security.AccessController.doPrivileged(Native Method) ~[na:1.7.0_71]
              at javax.security.auth.Subject.doAs(Subject.java:415) ~[na:1.7.0_71]
              at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566) ~[hadoop-common-2.5.1-mapr-1503.jar:na]
              at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:252) ~[drill-java-exec-1.2.0.jar:1.2.0]
              at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) ~[drill-common-1.2.0.jar:1.2.0]
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_71]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_71]
              at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
      Caused by: java.io.IOException: No space left on device
              at java.io.FileOutputStream.writeBytes(Native Method) ~[na:1.7.0_71]
              at java.io.FileOutputStream.write(FileOutputStream.java:345) ~[na:1.7.0_71]
              at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:224) ~[hadoop-common-2.5.1-mapr-1503.jar:na]
              ... 45 common frames omitted
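
One possible reading of the trace above, offered as a sketch rather than a confirmed root cause: org.apache.hadoop.fs.FSError extends java.lang.Error, not Exception, so a handler that catches only Exception would never see a disk-full failure raised while closing a spill file. The example below uses hypothetical stand-ins (DiskFullError, SpillFile, and both run* methods are illustrative names, not Drill or Hadoop classes) to show the difference between the two catch clauses.

```java
import java.io.IOException;

public class SpillCloseFailureSketch {

    /** Hypothetical stand-in for FSError: an Error that wraps the underlying IOException. */
    static class DiskFullError extends Error {
        DiskFullError(Throwable cause) {
            super(cause);
        }
    }

    /** Hypothetical spill file whose buffered data is flushed on close(), as in the trace. */
    static class SpillFile implements AutoCloseable {
        @Override
        public void close() {
            // Simulates the flush-on-close failing because the disk is full.
            throw new DiskFullError(new IOException("No space left on device"));
        }
    }

    /** Catches only Exception, so the Error thrown by close() escapes this method. */
    static String runCatchingException() {
        try {
            new SpillFile().close();
            return "COMPLETED";
        } catch (Exception e) {
            return "FAILED: " + rootMessage(e);
        }
    }

    /** Catches Throwable, so the failure is observed and can be reported as a query failure. */
    static String runCatchingThrowable() {
        try {
            new SpillFile().close();
            return "COMPLETED";
        } catch (Throwable t) {
            return "FAILED: " + rootMessage(t);
        }
    }

    /** Walks the cause chain and returns the message of the innermost cause. */
    static String rootMessage(Throwable t) {
        while (t.getCause() != null) {
            t = t.getCause();
        }
        return t.getMessage();
    }

    public static void main(String[] args) {
        try {
            System.out.println(runCatchingException());
        } catch (Error leaked) {
            // The Error bypassed the Exception handler and reached the caller.
            System.out.println("leaked past catch (Exception): " + leaked);
        }
        System.out.println(runCatchingThrowable());
    }
}
```

Whether catching Throwable at this point is the right fix for Drill is a separate design question; the sketch only illustrates why an Error could bypass exception-based failure handling and surface later as an unrelated symptom.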
      

      I'm attaching the full drillbit.log.


          People

            ben-zvi Boaz Ben-Zvi
            vicky Victoria Markman
            Khurram Faraaz Khurram Faraaz
            Votes: 0
            Watchers: 5
