[HADOOP-18521] ABFS ReadBufferManager buffer sharing across concurrent HTTP requests - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: 3.3.2, 3.3.3, 3.3.4
Fix Version/s: 3.3.5
Component/s: fs/azure
Labels:
- pull-request-available

Target Version/s: 3.3.9

Description

AbfsInputStream.close() can return buffers that are still in use by active prefetch GET requests to the ReadBufferManager free buffer pool.

A subsequent prefetch by a different stream in the same process may then acquire the same buffer. The earlier GET request can still write into that buffer, corrupting the second stream's prefetched data, which may then be returned to the thread reading that stream.

The full analysis is in the document attached to this JIRA.

The issue is fixed in Hadoop 3.3.5.

Emergency fix through site configuration

On releases without the fix (3.3.2 to 3.3.4), the bug can be avoided by disabling all prefetching:

fs.azure.readaheadqueue.depth = 0
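
As a sketch of how this setting might appear in a site configuration file, the fragment below uses the standard Hadoop property syntax. Placing it in core-site.xml is an assumption about the deployment; the option name and value are the ones given above.

```xml
<!-- Assumed location: core-site.xml. Disables all ABFS prefetching. -->
<property>
  <name>fs.azure.readaheadqueue.depth</name>
  <value>0</value>
</property>
```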

Automated probes for risk of exposure

The cloudstore diagnostics JAR has a command, safeprefetch, which probes an abfs client to determine whether it is vulnerable. It does this through PathCapabilities.hasPathCapability() probes. It can be invoked on the command line to validate the version and configuration.

Consult the source to see how to do this programmatically.

Note also that the tool's mkcsv command can be used to generate the multi-GB CSV files needed to trigger the condition and so verify that the issue exists.

Microsoft Announcement

From: Sneha Vijayarajan
Subject: RE: Alert ! ABFS Driver - Possible data corruption on read path

Hi,

One of the contributions made to the ABFS driver has the potential to cause data corruption on
the read path.

Please check if the below change is part of any of your releases:

HADOOP-17156. Purging the buffers associated with input streams during close() by mukund-thakur
· Pull Request #3285 · apache/hadoop (github.com)

RCA: Scenario that can lead to data corruption:

The driver allocates a set of prefetch buffers at init, and these are shared by the different
InputStream instances created within that process. A prefetch buffer can be in one of 3 stages:

* In ReadAheadQueue: request for prefetch logged
* In InProgressList: work has begun to talk to the backend store to get the requested data
* In CompletedList: prefetch data is now available for consumption

When multiple InputStreams have prefetch buffers across these states and close is triggered on
any of them, the commit above removes the buffers allotted to the closed stream from all 3
lists and declares them available for new prefetches, but it does nothing to cancel the
ongoing network request or to prevent that request from updating the buffer.
Data corruption can happen if one such freed buffer from InProgressList is allotted to a new
prefetch request and is then filled by the previous stream's network request.
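
The scenario above can be illustrated with a small, single-threaded simulation. This is a minimal sketch, not the ABFS driver code: the class `BufferReuseSketch`, its `Pool` type and the `fill` helper are hypothetical stand-ins, and the late-arriving GET response is modelled as a plain method call so that the ordering is deterministic.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

class BufferReuseSketch {

    /** Hypothetical stand-in for the shared pool of prefetch buffers. */
    static final class Pool {
        private final Deque<byte[]> free = new ArrayDeque<>();

        Pool(int count, int size) {
            for (int i = 0; i < count; i++) {
                free.push(new byte[size]);
            }
        }

        byte[] acquire() {
            return free.pop();
        }

        void release(byte[] buffer) {
            free.push(buffer);
        }
    }

    /** Models a network response being written into a buffer. */
    static void fill(byte[] buffer, char marker) {
        Arrays.fill(buffer, (byte) marker);
    }

    /** Replays the problematic ordering and returns what stream B sees. */
    static String run() {
        Pool pool = new Pool(1, 8);

        // Stream A starts a prefetch: its buffer is "in progress", GET pending.
        byte[] inFlightA = pool.acquire();

        // Stream A is closed: the buggy purge frees the in-progress buffer
        // without cancelling the request that still holds a reference to it.
        pool.release(inFlightA);

        // Stream B starts a prefetch, gets the same buffer, and its read completes.
        byte[] bufferB = pool.acquire();
        fill(bufferB, 'B');

        // Stream A's GET finally completes and overwrites the shared buffer.
        fill(inFlightA, 'A');

        // Stream B now consumes its "prefetched" data.
        return new String(bufferB, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints AAAAAAAA: stream B sees stream A's bytes
    }
}
```

In this sketch the corruption follows only from returning a buffer to the pool while a writer still references it; not freeing in-progress buffers on close removes that overlap.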

Mitigation: If this change is present in any release, kindly help communicate to your customers
to immediately set below config to 0 in their clusters. This will disable prefetches which can
have an impact on perf but will prevent the possibility of data corruption.

fs.azure.readaheadqueue.depth: Sets the readahead queue depth in AbfsInputStream. In case the
set value is negative the read ahead queue depth will be set as
Runtime.getRuntime().availableProcessors(). By default the value will be 2. To disable
readaheads, set this value to 0. If your workload is doing only random reads (non-sequential)
or you are seeing throttling, you may try setting this value to 0.
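
The rules described for this option can be summarised in a few lines of Java. This is a sketch of the semantics as stated in this message only, not the AbfsInputStream implementation; the class and method names are hypothetical, and `null` is used here to mean "option not set".

```java
final class ReadAheadDepthSketch {

    /** Default queue depth as stated in the announcement. */
    static final int DEFAULT_DEPTH = 2;

    /**
     * Resolves the effective readahead queue depth.
     * null (unset) gives the default, a negative value gives the number of
     * available processors, and any other value is used as-is.
     */
    static int effectiveDepth(Integer configured) {
        if (configured == null) {
            return DEFAULT_DEPTH;
        }
        if (configured < 0) {
            return Runtime.getRuntime().availableProcessors();
        }
        return configured;
    }

    /** Prefetching is disabled exactly when the effective depth is 0. */
    static boolean prefetchEnabled(Integer configured) {
        return effectiveDepth(configured) > 0;
    }

    public static void main(String[] args) {
        System.out.println(effectiveDepth(null));
        System.out.println(effectiveDepth(-1));
        System.out.println(prefetchEnabled(0));
    }
}
```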

Next steps: We are getting help to post the notifications for this in Apache groups. Work on
HotFix is also ongoing. Will update this thread once the change is checked in.

Please reach out for any queries or clarifications.

Thanks,
Sneha Vijayarajan

Attachments

- validating-csv-record-io.sc (16 kB), 13/Feb/23 14:02, Steve Loughran
- HADOOP-18521 ABFS ReadBufferManager buffer sharing across concurrent HTTP requests.pdf (123 kB), 19/Dec/22 11:18, Steve Loughran

Issue Links

is caused by

- HADOOP-17156 Clear abfs readahead requests on stream close (Resolved)

is related to

- HADOOP-18528 Disable abfs prefetching by default (Resolved)
- HADOOP-18517 ABFS: Add fs.azure.enable.readahead option to disable readahead (Resolved)

links to

- GitHub Pull Request #5117
- GitHub Pull Request #5133

Sub-Tasks

1.	disable purging list of in progress reads in abfs stream closed		Resolved	Pranav Saxena
2.	ABFS: add probes of readahead fix		Resolved	Steve Loughran

People

Assignee: Steve Loughran

Reporter: Steve Loughran

Votes: 0

Watchers: 2

Dates

Due: 06/Nov/22

Created: 05/Nov/22 15:00

Updated: 26/Jun/23 10:17

Resolved: 19/Dec/22 11:11