Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
4.2.0
None
None
Description
For compliance and disk space reasons, there are use cases where we as users need to provide a strong guarantee that Phoenix will not spool data to disk across a heterogeneous set of query patterns.
Currently all scans run through the SpoolingResultIterator and in the constructor we do the following as part of delegating to the underlying iterators that do the scan:
DeferredFileOutputStream spoolTo = new DeferredFileOutputStream(size, tempFile) {
    @Override
    protected void thresholdReached() throws IOException {
        super.thresholdReached();
        chunk.close();
    }
};
DataOutputStream out = new DataOutputStream(spoolTo);
final long maxBytesAllowed = maxSpoolToDisk == -1 ? Long.MAX_VALUE : thresholdBytes + maxSpoolToDisk;
long bytesWritten = 0L;
int maxSize = 0;
for (Tuple result = scanner.next(); result != null; result = scanner.next()) {
    int length = TupleUtil.write(result, out);
    bytesWritten += length;
    if (bytesWritten > maxBytesAllowed) {
        throw new SpoolTooBigToDiskException("result too big, max allowed(bytes): " + maxBytesAllowed);
    }
    maxSize = Math.max(length, maxSize);
}
Every scan goes through the spooling iterator, and from the code it looks like the size check only happens after the data has been written to the DataOutputStream. So even if we configure the spool size to 0, a spool file could still be written.
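To make the ordering concern concrete, here is a minimal standalone sketch in plain Java. It does not use Phoenix or Commons IO; `ThresholdBuffer` is a hypothetical stand-in that only records whether a deferred stream would have switched to its temp file, on the assumption that the switch happens as soon as a write pushes the total past the threshold. The method mirrors the write-then-check order of the loop above.

```java
class SpoolCheckOrder {
    /**
     * Hypothetical stand-in for a deferred output stream. It only tracks
     * whether the in-memory threshold has been exceeded, which is the point
     * where a real implementation would start writing to the temp file.
     */
    static final class ThresholdBuffer {
        private final long threshold;
        private long written = 0L;
        boolean spilled = false;

        ThresholdBuffer(long threshold) {
            this.threshold = threshold;
        }

        void write(byte[] data) {
            // Assumption: the spill decision is made as part of the write itself.
            if (!spilled && written + data.length > threshold) {
                spilled = true;
            }
            written += data.length;
        }
    }

    /**
     * Same order as the loop in the description: write first, then compare
     * the running total with the limit. Returns true when the row was both
     * spilled and rejected, i.e. the limit check came too late.
     */
    static boolean spillsBeforeLimitCheck(long thresholdBytes, long maxSpoolToDisk, byte[] row) {
        ThresholdBuffer out = new ThresholdBuffer(thresholdBytes);
        long maxBytesAllowed = maxSpoolToDisk == -1 ? Long.MAX_VALUE : thresholdBytes + maxSpoolToDisk;
        long bytesWritten = 0L;

        out.write(row);               // the write happens first
        bytesWritten += row.length;
        boolean rejected = bytesWritten > maxBytesAllowed; // the check happens only now

        return out.spilled && rejected;
    }
}
```

Under these assumptions, `spillsBeforeLimitCheck(0, 0, row)` returns true for any non-empty row: the exception would be raised, but only after the stand-in has already crossed into the spill path.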
I think it would be much more straightforward if we:
a) Had a simple boolean configuration that would allow us to disable spooling
b) Bypassed the spooling iterator and the above logic whenever this config disables spooling
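The proposal in (a) and (b) could look roughly like the sketch below. This is only an illustration in plain Java, not Phoenix code: the property name `example.query.spool.enabled`, the `SpoolingIterator` wrapper, and the `wrap` method are all hypothetical names chosen for the example, and the default of `true` is an assumption meant to preserve the current behavior.

```java
import java.util.Iterator;
import java.util.Properties;

class SpoolBypassSketch {
    // Hypothetical property name; not an existing Phoenix setting.
    static final String SPOOL_ENABLED_KEY = "example.query.spool.enabled";

    /**
     * Hypothetical placeholder for the spooling path. A real implementation
     * would buffer and possibly write to disk; this one only delegates.
     */
    static final class SpoolingIterator<T> implements Iterator<T> {
        private final Iterator<T> delegate;

        SpoolingIterator(Iterator<T> delegate) {
            this.delegate = delegate;
        }

        @Override
        public boolean hasNext() {
            return delegate.hasNext();
        }

        @Override
        public T next() {
            return delegate.next();
        }
    }

    /**
     * Returns the scanner untouched when spooling is disabled, so the
     * spooling logic is never constructed; otherwise wraps it as before.
     */
    static <T> Iterator<T> wrap(Iterator<T> scanner, Properties config) {
        // Assumed default: spooling stays enabled unless explicitly turned off.
        boolean spoolEnabled = Boolean.parseBoolean(config.getProperty(SPOOL_ENABLED_KEY, "true"));
        return spoolEnabled ? new SpoolingIterator<>(scanner) : scanner;
    }
}
```

The point of deciding at wrap time is that, with the flag off, no deferred stream or temp file is ever created, which gives the guarantee asked for above without relying on size thresholds.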