Description
WASB Driver
The WASB driver was developed to support FNS (Flat Namespace) Azure Storage accounts. FNS accounts do not honor file-folder semantics, so the WASB driver mimics HDFS folder operations on the client side. As a result, folder operations such as rename and delete can generate a large number of IOPS, because the client has to enumerate the blobs under the folder and then rename or delete them one blob at a time. Other APIs suffer too: checking whether a path is a file or a folder requires multiple metadata calls. Together, these lead to degraded performance.
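To make this cost concrete, the toy model below counts storage calls for a client-side folder rename and a file-or-folder check on a flat namespace. It is a minimal sketch assuming a simplified in-memory store: `FlatBlobStore`, `rename_folder` and `path_kind` are hypothetical names, not WASB driver code, and real listings are paged and the driver's exact call sequence differs.

```python
from dataclasses import dataclass, field


@dataclass
class FlatBlobStore:
    """Hypothetical in-memory model of a flat-namespace (FNS) container.

    Blob names are plain strings; "folders" exist only as shared name prefixes.
    Every storage operation increments `calls` so the cost is visible.
    """
    blobs: dict[str, bytes] = field(default_factory=dict)
    calls: int = 0

    def exists(self, name: str) -> bool:
        self.calls += 1  # models one blob metadata (get-properties) call
        return name in self.blobs

    def list_prefix(self, prefix: str) -> list[str]:
        self.calls += 1  # models one listing call (paging ignored)
        return sorted(n for n in self.blobs if n.startswith(prefix))

    def copy(self, src: str, dst: str) -> None:
        self.calls += 1
        self.blobs[dst] = self.blobs[src]

    def delete(self, name: str) -> None:
        self.calls += 1
        del self.blobs[name]


def path_kind(store: FlatBlobStore, path: str) -> str:
    """File-or-folder check: needs a metadata call, then possibly a listing."""
    if store.exists(path):
        return "file"
    if store.list_prefix(path.rstrip("/") + "/"):
        return "folder"
    return "missing"


def rename_folder(store: FlatBlobStore, src: str, dst: str) -> None:
    """Client-side folder rename: enumerate, then copy and delete blob by blob."""
    src_prefix = src.rstrip("/") + "/"
    dst_prefix = dst.rstrip("/") + "/"
    # list_prefix returns a snapshot list, so mutating the store below is safe
    for name in store.list_prefix(src_prefix):
        store.copy(name, dst_prefix + name[len(src_prefix):])
        store.delete(name)


if __name__ == "__main__":
    store = FlatBlobStore({f"data/part-{i:04d}": b"x" for i in range(1000)})
    print(path_kind(store, "data"), store.calls)  # folder 2
    store.calls = 0
    rename_folder(store, "data", "archive")
    print(store.calls)  # 2001: 1 listing + 1000 copies + 1000 deletes
```

The call count grows linearly with the number of blobs under the folder, whereas a hierarchical namespace can rename a directory as a single server-side operation.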
To better serve analytics customers, Microsoft released ADLS Gen2, which uses HNS (Hierarchical Namespace) accounts, i.e. a file-folder aware store. The ABFS driver was designed to overcome the inherent deficiencies of WASB, and customers were advised to migrate to the ABFS driver.
Customers who still use the legacy WASB driver and the challenges they face
Some of our customers have not migrated to the ABFS driver yet and continue to use the legacy WASB driver with FNS accounts.
These customers face the following challenges:
- They cannot leverage the optimizations and benefits of the ABFS driver.
- They must deal with compatibility issues if files and folders are modified concurrently by the legacy WASB driver and the ABFS driver during a phased transition.
- The features the ABFS driver supports differ between FNS and HNS accounts.
- In certain cases, they must perform significant rework on their workloads to migrate to the ABFS driver, which is fully tested and supported only on HNS-enabled accounts.
Deprecation plans for WASB
We are introducing a new feature that will enable the ABFS driver to support FNS accounts over the Blob endpoint using the ABFS scheme. This feature will let customers use the ABFS driver to interact with data stored in GPv2 (General Purpose v2) storage accounts.
With this feature, customers who still use the legacy WASB driver will be able to migrate to the ABFS driver without significant rework on their workloads. They will, however, need to change their URIs from the WASB scheme to the ABFS scheme.
Once the ABFS driver supports FNS accounts well enough to migrate WASB customers, the WASB driver will be declared deprecated in the OSS documentation and marked for removal in the next major release. This removes any ambiguity for newly onboarding customers, since there will be only one Microsoft driver for Azure Storage, and migrating customers will get SLA-bound support for both the driver and the service, which was not guaranteed with WASB.
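The URI change can be scripted when migrating job configurations. The helper below is a minimal sketch: `wasb_to_abfs_uri` is a hypothetical name, and while the scheme mapping (`wasb` to `abfs`, `wasbs` to `abfss`) follows from the text, switching the host suffix from `blob.core.windows.net` to `dfs.core.windows.net` is an assumption borrowed from the existing ABFS URI convention for HNS accounts. It is exposed as a parameter so it can be changed if the FNS-over-Blob configuration expects a different host.

```python
from urllib.parse import urlsplit, urlunsplit

# Scheme mapping: plain -> plain, TLS -> TLS.
SCHEME_MAP = {"wasb": "abfs", "wasbs": "abfss"}


def wasb_to_abfs_uri(uri: str, target_endpoint: str = "dfs.core.windows.net") -> str:
    """Rewrite wasb[s]://<container>@<account>.blob.core.windows.net/<path>
    into abfs[s]://<container>@<account>.<target_endpoint>/<path>.

    Assumption: `target_endpoint` defaults to the ABFS host suffix used for
    HNS accounts; override it if your deployment expects another one.
    """
    parts = urlsplit(uri)
    scheme = SCHEME_MAP.get(parts.scheme.lower())
    if scheme is None:
        raise ValueError(f"not a WASB URI: {uri}")
    # The authority has the form <container>@<account>.blob.core.windows.net
    container, sep, host = parts.netloc.partition("@")
    if not sep:
        raise ValueError(f"missing <container>@<account> authority: {uri}")
    account, _, suffix = host.partition(".")
    if suffix != "blob.core.windows.net":
        raise ValueError(f"unexpected WASB host: {host}")
    netloc = f"{container}@{account}.{target_endpoint}"
    return urlunsplit((scheme, netloc, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(wasb_to_abfs_uri("wasbs://data@myaccount.blob.core.windows.net/raw/file.parquet"))
```

The path, query and fragment are carried over unchanged, so only the scheme and account host differ between the old and new URI.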
We anticipate that this feature will serve as a stepping stone for customers to move to HNS enabled accounts with the ABFS driver, which is our recommended stack for big data analytics on ADLS Gen2.
Any impact on existing customers who use ADLS Gen2 (HNS-enabled accounts) with the ABFS driver?
This feature does not impact existing customers who use ADLS Gen2 (HNS-enabled accounts) with the ABFS driver.
They do not need to make any changes to their workloads or configurations. They will still enjoy the benefits of HNS, such as atomic operations, fine-grained access control, scalability, and performance.
Official recommendation
Microsoft continues to recommend that all big data and analytics customers use Azure Data Lake Storage Gen2 (ADLS Gen2) with the ABFS driver, and will continue to optimize this scenario in the future. We believe this new option will help customers transition to a supported scenario immediately while they plan their eventual move to ADLS Gen2 (HNS-enabled accounts).
New authentication options for customers migrating from the WASB driver to the ABFS driver
The following authentication types that WASB provides will continue to work with the ABFS driver on FNS accounts, through configuration that accepts these types (similar to WASB):
- SharedKey
- Account SAS
- Service/Container SAS
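For Shared Key, the ABFS driver today reads per-account properties from the Hadoop configuration. The sketch below assumes FNS accounts reuse the same property names and the `dfs.core.windows.net` host suffix used for HNS accounts; that is an assumption, not something this issue confirms. The helper names are hypothetical, and account and service SAS are not shown because the text does not name their configuration properties.

```python
import xml.etree.ElementTree as ET

# Assumed host suffix in ABFS property names (the one used for HNS accounts).
ENDPOINT = "dfs.core.windows.net"


def shared_key_props(account: str, key: str) -> dict[str, str]:
    """Per-account Shared Key settings for the ABFS driver (hypothetical helper)."""
    host = f"{account}.{ENDPOINT}"
    return {
        f"fs.azure.account.auth.type.{host}": "SharedKey",
        f"fs.azure.account.key.{host}": key,
    }


def to_core_site_xml(props: dict[str, str]) -> str:
    """Render properties in Hadoop's core-site.xml <configuration> layout."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # The key value is a placeholder; never commit real account keys.
    print(to_core_site_xml(shared_key_props("myaccount", "<storage-account-key>")))
```

Keeping the account name in the property key lets one cluster hold credentials for several storage accounts side by side.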
The following authentication types, which the WASB driver did not support but the ABFS driver does, will also be available for FNS accounts with the ABFS driver:
- OAuth 2.0 Client Credentials
- OAuth 2.0 Refresh Token
- Azure Managed Identity
- Custom OAuth 2.0 Token Provider
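As an illustration of the OAuth options, the sketch below builds per-account properties for client credentials and for a managed identity. The property names and the `org.apache.hadoop.fs.azurebfs.oauth2` token provider classes are those documented for the ABFS driver on HNS accounts; reusing them unchanged for FNS accounts, and the `dfs.core.windows.net` host suffix, are assumptions. The helper names and all IDs are hypothetical placeholders. Refresh Token and Custom providers follow the same pattern with a different provider class.

```python
from typing import Optional

ENDPOINT = "dfs.core.windows.net"  # assumed account host suffix
PKG = "org.apache.hadoop.fs.azurebfs.oauth2"  # ABFS token provider package


def oauth_client_creds_props(account: str, tenant_id: str,
                             client_id: str, client_secret: str) -> dict[str, str]:
    """OAuth 2.0 client credentials (service principal) settings."""
    host = f"{account}.{ENDPOINT}"
    return {
        f"fs.azure.account.auth.type.{host}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{host}": f"{PKG}.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.endpoint.{host}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
        f"fs.azure.account.oauth2.client.id.{host}": client_id,
        f"fs.azure.account.oauth2.client.secret.{host}": client_secret,
    }


def managed_identity_props(account: str,
                           client_id: Optional[str] = None) -> dict[str, str]:
    """Azure Managed Identity settings; client_id selects a user-assigned identity."""
    host = f"{account}.{ENDPOINT}"
    props = {
        f"fs.azure.account.auth.type.{host}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{host}": f"{PKG}.MsiTokenProvider",
    }
    if client_id:
        props[f"fs.azure.account.oauth2.client.id.{host}"] = client_id
    return props


if __name__ == "__main__":
    for name, value in managed_identity_props("myaccount").items():
        print(f"{name}={value}")
```

Managed identity needs no secret in the configuration, which is why it is usually preferred on Azure-hosted clusters.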
The existing ABFS driver SAS token provider plugin for User Delegation SAS and Directory SAS will continue to work only for HNS accounts.
Issue Links
- is a parent of HADOOP-19179 ABFS: Support FNS Accounts over BlobEndpoint (Open)