AbfsInputStream.close() can trigger the return of buffers used for active prefetch GET requests into the ReadBufferManager free buffer pool.
A subsequent prefetch by a different stream in the same process may then acquire the same buffer while the original GET request can still write into it. This risks corrupting that stream's prefetched data, which may then be returned to the thread reading from that stream.
The full analysis is in the document attached to this JIRA.
The issue is fixed in Hadoop 3.3.5.
Emergency fix through site configuration
On releases without the fix (3.3.2-3.3.4), the bug can be avoided by disabling all prefetching:
fs.azure.readaheadqueue.depth = 0
Automated probes for risk of exposure
The cloudstore diagnostics JAR has a command, safeprefetch, which probes an abfs client to determine whether it is vulnerable. It does this through PathCapabilities.hasPathCapability() probes. It can be invoked on the command line to validate the version/configuration.
Consult the source to see how to do this programmatically.
Note also that the tool's mkcsv command can be used to generate the multi-GB CSV files needed to trigger the condition and so verify that the issue exists.
From: Sneha Vijayarajan
Subject: RE: Alert ! ABFS Driver - Possible data corruption on read path
One of the contributions made to the ABFS Driver has the potential to cause data corruption on the read path.
Please check if the below change is part of any of your releases:
HADOOP-17156. Purging the buffers associated with input streams during close() by mukund-thakur · Pull Request #3285 · apache/hadoop (github.com)
RCA: Scenario that can lead to data corruption:
The driver allocates a set of prefetch buffers at init, and these are shared by the different
InputStream instances created within that process. A prefetch buffer can be in one of 3 stages:
* In ReadAheadQueue: the request for a prefetch has been logged.
* In InProgressList: work has begun to talk to the backend store to get the requested data.
* In CompletedList: the prefetched data is now available for consumption.
When multiple InputStreams have prefetch buffers across these states and close() is triggered on
any of them, the commit above removes the buffers allotted to the closing stream from all
3 lists and declares them available for new prefetches. However, nothing is done to cancel the
ongoing network request or to prevent it from updating the buffer.
Data corruption can happen if one such freed-up buffer from InProgressList is allotted to a new
prefetch request and is then filled with data from the previous stream's network request.
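The sequence above can be illustrated with a small, deterministic model. This is only a sketch under stated assumptions: BufferPool, InFlightGet and simulate() are hypothetical names invented for the illustration, the steps are run in a fixed order on one thread rather than racing, and none of it is the actual ReadBufferManager code.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;

final class PrefetchRaceSketch {

    /** Hypothetical stand-in for the shared free buffer pool. */
    static final class BufferPool {
        private final Deque<byte[]> free = new ArrayDeque<>();

        BufferPool(int count, int size) {
            for (int i = 0; i < count; i++) {
                free.push(new byte[size]);
            }
        }

        byte[] acquire() {
            return free.pop();
        }

        void release(byte[] buffer) {
            free.push(buffer);
        }
    }

    /** Hypothetical in-flight GET: copies its payload into the target buffer on completion. */
    static final class InFlightGet {
        private final byte[] target;
        private final byte[] payload;

        InFlightGet(byte[] target, String payload) {
            this.target = target;
            this.payload = payload.getBytes(StandardCharsets.US_ASCII);
        }

        void complete() {
            System.arraycopy(payload, 0, target, 0, payload.length);
        }
    }

    /** Replays the problematic ordering and returns what stream B would read. */
    static String simulate() {
        BufferPool pool = new BufferPool(1, 8);

        // Stream A starts a prefetch: its buffer is now "in progress".
        byte[] bufferA = pool.acquire();
        InFlightGet getA = new InFlightGet(bufferA, "AAAAAAAA");

        // Stream A is closed: the buffer goes back to the pool,
        // but the GET for stream A is not cancelled.
        pool.release(bufferA);

        // Stream B starts a prefetch and is handed the same buffer.
        byte[] bufferB = pool.acquire();
        InFlightGet getB = new InFlightGet(bufferB, "BBBBBBBB");
        getB.complete();

        // Stream A's stale GET finishes late and overwrites B's data.
        getA.complete();

        return new String(bufferB, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println("Stream B reads: " + simulate());
    }
}
```

In this model stream B ends up holding stream A's bytes, which is the corruption described above. Setting the queue depth to 0 avoids it simply because no prefetch buffers are handed out at all.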
Mitigation: If this change is present in any release, please ask your customers to
immediately set the config below to 0 in their clusters. This disables prefetching, which can
have an impact on perf but prevents the possibility of data corruption.
fs.azure.readaheadqueue.depth: Sets the readahead queue depth in AbfsInputStream. In case the
set value is negative the read ahead queue depth will be set as
Runtime.getRuntime().availableProcessors(). By default the value will be 2. To disable
readaheads, set this value to 0. If your workload is doing only random reads (non-sequential)
or you are seeing throttling, you may try setting this value to 0.
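The rules in this description can be summarised as a small function. This is a sketch of the documented behaviour only: effectiveDepth() and prefetchEnabled() are hypothetical helpers written for illustration, with a null argument standing for "option not set", and they are not taken from the driver source.

```java
final class ReadAheadDepthSketch {

    static final String KEY = "fs.azure.readaheadqueue.depth";

    /** Default stated in the option's description. */
    static final int DEFAULT_DEPTH = 2;

    /**
     * Hypothetical helper: maps the configured value (null = unset)
     * to the queue depth the description says will be used.
     */
    static int effectiveDepth(Integer configured) {
        if (configured == null) {
            return DEFAULT_DEPTH;
        }
        if (configured < 0) {
            // Negative values fall back to the number of available processors.
            return Runtime.getRuntime().availableProcessors();
        }
        return configured;
    }

    /** Readahead is disabled only when the resulting depth is 0. */
    static boolean prefetchEnabled(Integer configured) {
        return effectiveDepth(configured) > 0;
    }

    public static void main(String[] args) {
        for (Integer value : new Integer[] {null, -1, 0, 4}) {
            System.out.println(KEY + "=" + value
                + " -> depth " + effectiveDepth(value)
                + ", prefetch enabled: " + prefetchEnabled(value));
        }
    }
}
```

Note that only an explicit 0 disables readahead; leaving the option unset or setting it to a negative value keeps prefetching on, so neither is a mitigation.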
Next steps: We are getting help to post notifications about this to the Apache groups. Work on a
hotfix is also ongoing. We will update this thread once the change is checked in.
Please reach out for any queries or clarifications.