[HADOOP-13403] AzureNativeFileSystem rename/delete performance improvements - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.7.2
Fix Version/s: 2.9.0, 3.0.0-alpha1
Component/s: fs/azure
Labels: None

Hadoop Flags: Reviewed
Release Note:

WASB has added an optional capability to execute certain FileSystem operations in parallel on multiple threads for improved performance. Please refer to the Azure Blob Storage documentation page for more information on how to enable and control the feature.
Flags: Patch

Description

WASB Performance Improvements

Problem
-----------
Azure Native File System operations such as rename and delete perform poorly when the source directory contains a large number of directories and/or files. Here are the possible reasons:
a) We first list all files under the source directory hierarchically. This is a serial operation.
b) After collecting the entire list of files under a folder, we delete or rename the files one by one, serially.
c) No logging information is available for these costly operations, even in DEBUG mode, which makes WASB performance issues difficult to understand.

Proposal
-------------
Step 1: Rename and delete operations generate a list of all files under the source folder. We need to use the Azure flat listing option to get the list with a single request to the Azure store. We have introduced the config fs.azure.flatlist.enable to enable this option. The default value is 'false', which means flat listing is disabled.

Step 2: Create the thread pool and threads dynamically, based on user configuration. These thread pools are deleted once the operation is over. We are introducing two new configs:
a) fs.azure.rename.threads: Config to set the number of rename threads. The default value is 0, which means no threading.
b) fs.azure.delete.threads: Config to set the number of delete threads. The default value is 0, which means no threading.
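
The three settings and their defaults can be sketched as follows. This is only an illustration: java.util.Properties stands in for the Hadoop configuration object, the class WasbParallelSettings is hypothetical, and the handling of negative or unparsable values is an assumption that the description does not specify.

```java
import java.util.Properties;

// Hypothetical holder for the three settings named in this issue.
// java.util.Properties is a stand-in for the real configuration object.
final class WasbParallelSettings {
    final boolean flatListing;
    final int renameThreads;
    final int deleteThreads;

    WasbParallelSettings(Properties conf) {
        // Defaults mirror the description: flat listing off, 0 threads = no threading.
        this.flatListing =
            Boolean.parseBoolean(conf.getProperty("fs.azure.flatlist.enable", "false"));
        this.renameThreads = readThreadCount(conf, "fs.azure.rename.threads");
        this.deleteThreads = readThreadCount(conf, "fs.azure.delete.threads");
    }

    private static int readThreadCount(Properties conf, String key) {
        try {
            int n = Integer.parseInt(conf.getProperty(key, "0").trim());
            return Math.max(n, 0); // assumption: negative values are treated as 0
        } catch (NumberFormatException e) {
            return 0;              // assumption: unparsable values mean no threading
        }
    }
}
```

With only fs.azure.rename.threads set to 8, renameThreads would be 8 while deleteThreads stays 0 and flatListing stays false.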

We log, at debug level, the number of threads that were not used for the operation, which can be useful.

Failure Scenarios:
If we fail to create the thread pool for ANY reason (for example, trying to create it with a very large thread count such as 1000000), we fall back to serial operation.
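
The fallback rule can be sketched as below. This is a minimal illustration, not the code from the patch: the class PoolFactory and the method tryCreatePool are hypothetical names, and returning null is just one way to tell the caller to run serially. Note that ThreadPoolExecutor starts its threads lazily, so some failures may only show up later, when tasks are submitted.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper illustrating "fall back to serial if the pool cannot be created".
final class PoolFactory {
    /** Returns a pool with threadCount threads, or null to signal "run serially". */
    static ThreadPoolExecutor tryCreatePool(int threadCount) {
        if (threadCount < 1) {
            return null; // 0 is the default and means no threading
        }
        try {
            return new ThreadPoolExecutor(
                threadCount, threadCount,          // fixed-size pool
                0L, TimeUnit.SECONDS,
                new LinkedBlockingQueue<Runnable>());
        } catch (Exception e) {
            // Any failure to build the pool means the caller takes the serial path.
            return null;
        }
    }
}
```

A caller would check the result for null and, in that case, loop over the files on its own thread.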

Step 3: Blob operations can be done in parallel using multiple threads, each executing the following snippet:

while ((currentIndex = fileIndex.getAndIncrement()) < files.length) {
  FileMetadata file = files[currentIndex];
  Rename/delete(file);  // rename or delete, depending on the operation
}

The above strategy relies on all files being stored in a final array, with each thread atomically claiming the next index to work on. The advantage of this strategy is that even if the user configures a large number of unusable threads, we always ensure that work doesn't get serialized due to lagging threads.
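
A self-contained sketch of this shared-index strategy follows. It assumes the shared index is a java.util.concurrent.atomic.AtomicInteger; the class SharedIndexRunner, the method runAll and the Consumer callback (standing in for the rename or delete call) are hypothetical names used only for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical illustration of workers claiming array slots through one atomic counter.
final class SharedIndexRunner {
    /** Applies op to every element of files using threadCount workers; returns items processed. */
    static <T> int runAll(final T[] files, int threadCount, final Consumer<T> op) {
        final AtomicInteger fileIndex = new AtomicInteger(0);
        final AtomicInteger processed = new AtomicInteger(0);
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        try {
            for (int t = 0; t < threadCount; t++) {
                pool.execute(() -> {
                    int currentIndex;
                    // Each worker claims the next unclaimed index until the array is exhausted.
                    while ((currentIndex = fileIndex.getAndIncrement()) < files.length) {
                        op.accept(files[currentIndex]);
                        processed.incrementAndGet();
                    }
                });
            }
        } finally {
            pool.shutdown(); // no new tasks; queued workers still run to completion
        }
        try {
            pool.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return processed.get();
    }
}
```

Because no file is assigned to a thread in advance, a slow or late-starting worker simply claims fewer indices; the other workers pick up the rest.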

We are logging the following information, which can be useful for tuning the number of threads:

a) Number of unusable threads
b) Time taken by each thread
c) Number of files processed by each thread
d) Total time taken for the operation
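
One way to gather these four figures is sketched below. It is an illustration under assumptions: "unusable" is taken to mean a thread that ended up processing no files, the per-file work is a placeholder, and the names WorkerStatsSketch, WorkerStats, run and summarize are hypothetical, not the ones used in the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

final class WorkerStatsSketch {
    /** Per-thread counters; the class and field names are hypothetical. */
    static final class WorkerStats {
        final int filesProcessed;
        final long elapsedMillis;
        WorkerStats(int filesProcessed, long elapsedMillis) {
            this.filesProcessed = filesProcessed;
            this.elapsedMillis = elapsedMillis;
        }
    }

    /** Runs threadCount workers over fileCount items and returns one stats entry per worker. */
    static List<WorkerStats> run(final int fileCount, int threadCount) {
        final AtomicInteger fileIndex = new AtomicInteger(0);
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        List<Future<WorkerStats>> futures = new ArrayList<>();
        for (int t = 0; t < threadCount; t++) {
            futures.add(pool.submit(() -> {
                long start = System.nanoTime();
                int count = 0;
                while (fileIndex.getAndIncrement() < fileCount) {
                    count++; // placeholder for the per-file rename or delete
                }
                return new WorkerStats(count, (System.nanoTime() - start) / 1_000_000);
            }));
        }
        pool.shutdown();
        List<WorkerStats> stats = new ArrayList<>();
        for (Future<WorkerStats> f : futures) {
            try {
                stats.add(f.get());
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException(e);
            }
        }
        return stats;
    }

    /** Formats the four items listed above as a multi-line debug message. */
    static String summarize(List<WorkerStats> stats, long totalMillis) {
        int idle = 0;
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < stats.size(); i++) {
            WorkerStats s = stats.get(i);
            if (s.filesProcessed == 0) {
                idle++;
            }
            sb.append("thread ").append(i)
              .append(": files=").append(s.filesProcessed)
              .append(", ms=").append(s.elapsedMillis).append('\n');
        }
        sb.append("threads that processed no files: ").append(idle).append('\n');
        sb.append("total ms: ").append(totalMillis);
        return sb.toString();
    }
}
```

If many threads report zero files, the configured thread count is probably higher than the workload can use.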

Failure Scenarios:

Failure to queue a thread execution request shouldn't be an issue as long as we can ensure that at least one thread has completed execution successfully. If we couldn't schedule even one thread, we should take the serial path. Exceptions raised while executing threads are still considered regular exceptions and are returned to the client as an operation failure. Exceptions raised while stopping threads and deleting the thread pool can be ignored if the operation on all files completed without any issue.
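
These rules can be sketched as follows. This is an illustration under assumptions rather than the patch's code: the class FailureHandlingSketch and the method runParallel are hypothetical, "at least one thread" is approximated as "at least one worker was accepted by the pool", and an unchecked IllegalStateException stands in for whatever exception a real file system would return to the client.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;

// Hypothetical illustration of the three failure rules described above.
final class FailureHandlingSketch {
    /**
     * Submits worker up to threadCount times.
     * Returns false if nothing could be scheduled, so the caller must run serially.
     * Throws if any scheduled worker failed.
     */
    static boolean runParallel(ExecutorService pool, int threadCount, Runnable worker) {
        List<Future<?>> scheduled = new ArrayList<>();
        for (int i = 0; i < threadCount; i++) {
            try {
                scheduled.add(pool.submit(worker));
            } catch (RejectedExecutionException e) {
                // Tolerated: the workers that were accepted still drain the shared index.
            }
        }
        if (scheduled.isEmpty()) {
            return false; // not even one worker was scheduled: take the serial path
        }
        RuntimeException failure = null;
        for (Future<?> f : scheduled) {
            try {
                f.get();
            } catch (ExecutionException e) {
                if (failure == null) {
                    failure = new IllegalStateException("operation failed", e.getCause());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                if (failure == null) {
                    failure = new IllegalStateException("operation interrupted", e);
                }
            }
        }
        try {
            pool.shutdown();
        } catch (RuntimeException e) {
            // Ignored: cleanup problems do not change the outcome of the operation.
        }
        if (failure != null) {
            throw failure; // worker exceptions are reported as an operation failure
        }
        return true;
    }
}
```

A caller that receives false would process the files on its own thread, exactly as it does when threading is disabled.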

Attachments

HADOOP-13403-001.patch   21/Jul/16 23:58   40 kB   Subramanyam Pattipaka
HADOOP-13403-002.patch   28/Jul/16 07:43   58 kB   Subramanyam Pattipaka
HADOOP-13403-003.patch   30/Jul/16 07:44   66 kB   Subramanyam Pattipaka
HADOOP-13403-004.patch   02/Aug/16 08:24   67 kB   Subramanyam Pattipaka
HADOOP-13403-005.patch   05/Aug/16 20:32   67 kB   Subramanyam Pattipaka
HADOOP-13403-006.patch   08/Aug/16 18:33   67 kB   Subramanyam Pattipaka

Issue Links

breaks: HADOOP-13550 Azure threaded deleter logs too much at info (Patch Available)

requires: HADOOP-13459 hadoop-azure runs several test cases repeatedly, causing unnecessarily long running time. (Resolved)

Activity

People

Assignee: Subramanyam Pattipaka

Reporter: Subramanyam Pattipaka

Votes: 0

Watchers: 4

Dates

Created: 21/Jul/16 23:01

Updated: 03/Jan/17 11:15

Resolved: 08/Aug/16 19:34