Phoenix / PHOENIX-1465

Provide a configuration option to disable spooling query results to disk


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 4.2.0

    Description

      For compliance and disk space reasons, there are use cases where users need a strong guarantee that Phoenix will not spool data to disk, across a heterogeneous set of query patterns.

      Currently all scans run through the SpoolingResultIterator, whose constructor does the following as part of delegating to the underlying iterators that perform the scan:

      DeferredFileOutputStream spoolTo = new DeferredFileOutputStream(size, tempFile) {
          @Override
          protected void thresholdReached() throws IOException {
              super.thresholdReached();
              chunk.close();
          }
      };
      DataOutputStream out = new DataOutputStream(spoolTo);
      final long maxBytesAllowed = maxSpoolToDisk == -1
              ? Long.MAX_VALUE : thresholdBytes + maxSpoolToDisk;
      long bytesWritten = 0L;
      int maxSize = 0;
      for (Tuple result = scanner.next(); result != null; result = scanner.next()) {
          int length = TupleUtil.write(result, out);
          bytesWritten += length;
          if (bytesWritten > maxBytesAllowed) {
              throw new SpoolTooBigToDiskException("result too big, max allowed(bytes): " + maxBytesAllowed);
          }
          maxSize = Math.max(length, maxSize);
      }

      We always go through the spooling iterator. Looking at the code, it appears that even if we configure the spool size to 0, the limit is only checked after the data has been written to the DataOutputStream, which could result in a spool file being written.
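      The ordering concern can be illustrated with a small, self-contained sketch. It does not use Phoenix or commons-io: SpillTrackingStream is a hypothetical stand-in that only records when a threshold-based stream would have switched to a file, and both loop methods are illustrative names, not Phoenix code. It assumes the threshold stream spills as soon as a write would push it past its threshold.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a threshold stream: records when a real one would move to a file. */
class SpillTrackingStream extends OutputStream {
    private final ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private final long thresholdBytes;
    private long written = 0L;
    private boolean spilled = false;

    SpillTrackingStream(long thresholdBytes) {
        this.thresholdBytes = thresholdBytes;
    }

    @Override
    public void write(int b) {
        write(new byte[] {(byte) b}, 0, 1);
    }

    @Override
    public void write(byte[] buf, int off, int len) {
        // Assumption: the spill happens when this write would exceed the threshold.
        if (written + len > thresholdBytes) {
            spilled = true;
        }
        memory.write(buf, off, len);
        written += len;
    }

    boolean hasSpilled() {
        return spilled;
    }
}

class SpoolCheckOrderSketch {
    private static long maxBytesAllowed(long thresholdBytes, long maxSpoolToDisk) {
        return maxSpoolToDisk == -1 ? Long.MAX_VALUE : thresholdBytes + maxSpoolToDisk;
    }

    /** Same order as the excerpt: write first, compare afterwards. Returns whether a spill occurred. */
    static boolean writeThenCheck(List<byte[]> rows, long thresholdBytes, long maxSpoolToDisk) {
        SpillTrackingStream out = new SpillTrackingStream(thresholdBytes);
        long limit = maxBytesAllowed(thresholdBytes, maxSpoolToDisk);
        long bytesWritten = 0L;
        for (byte[] row : rows) {
            out.write(row, 0, row.length);
            bytesWritten += row.length;
            if (bytesWritten > limit) {
                break; // the excerpt throws here; the write has already happened
            }
        }
        return out.hasSpilled();
    }

    /** Alternative order: compare first, write only if the row still fits. */
    static boolean checkThenWrite(List<byte[]> rows, long thresholdBytes, long maxSpoolToDisk) {
        SpillTrackingStream out = new SpillTrackingStream(thresholdBytes);
        long limit = maxBytesAllowed(thresholdBytes, maxSpoolToDisk);
        long bytesWritten = 0L;
        for (byte[] row : rows) {
            if (bytesWritten + row.length > limit) {
                break; // reject before anything reaches the stream
            }
            out.write(row, 0, row.length);
            bytesWritten += row.length;
        }
        return out.hasSpilled();
    }

    public static void main(String[] args) {
        List<byte[]> rows = Arrays.asList(new byte[8], new byte[8]);
        System.out.println("write then check spilled: " + writeThenCheck(rows, 0L, 0L));
        System.out.println("check then write spilled: " + checkThenWrite(rows, 0L, 0L));
    }
}
```

      With both the threshold and the disk allowance set to 0, the write-then-check loop has already pushed the first row past the threshold by the time the limit is compared, which is the behaviour described above.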

      I think it would be much more straightforward if we:
      a) Had a simple boolean configuration option that allows us to disable spooling
      b) Bypassed the spooling iterator, and the above logic, whenever that option disables spooling
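      A minimal sketch of proposals (a) and (b), under stated assumptions: the property name example.query.spool.enabled, the SpoolingWrapper class and the wrap method are all hypothetical and are not existing Phoenix settings or APIs. The point is only that a single boolean decides whether the scan iterator is wrapped at all.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.Properties;

class SpoolToggleSketch {
    /** Hypothetical property name used for illustration; not an actual Phoenix setting. */
    static final String SPOOL_ENABLED_KEY = "example.query.spool.enabled";

    /** Hypothetical stand-in for a spooling iterator; it only delegates to the scan iterator. */
    static final class SpoolingWrapper implements Iterator<String> {
        private final Iterator<String> delegate;

        SpoolingWrapper(Iterator<String> delegate) {
            this.delegate = delegate;
        }

        @Override
        public boolean hasNext() {
            return delegate.hasNext();
        }

        @Override
        public String next() {
            return delegate.next();
        }
    }

    /** Returns the scan iterator untouched when spooling is disabled, wrapped otherwise. */
    static Iterator<String> wrap(Iterator<String> scanner, Properties config) {
        // Defaulting to "true" keeps the existing behaviour unless spooling is explicitly turned off.
        boolean spoolEnabled = Boolean.parseBoolean(config.getProperty(SPOOL_ENABLED_KEY, "true"));
        return spoolEnabled ? new SpoolingWrapper(scanner) : scanner;
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty(SPOOL_ENABLED_KEY, "false");
        Iterator<String> scanner = Arrays.asList("row1", "row2").iterator();
        Iterator<String> results = wrap(scanner, config);
        System.out.println("bypassed spooling: " + (results == scanner));
    }
}
```

      Because the disabled path never constructs the wrapper, no threshold stream or temporary file can be created on that path, which is the kind of guarantee the compliance use case asks for.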

          People

            Assignee: Unassigned
            Reporter: Jan Fernando (jfernando_sfdc)