[HUDI-5092] Querying Hudi table throws NoSuchMethodError in Databricks runtime - ASF JIRA

Details

Type: Bug
Status: Reopened
Priority: Blocker
Resolution: Unresolved
Affects Version/s: 0.12.0
Fix Version/s: 1.1.0
Component/s: spark
Labels: None

Description

Originally reported by a user:
https://github.com/apache/hudi/issues/6137

The crux of the issue is that the Databricks Runtime (DBR) diverges from OSS Spark; in particular, the `FileStatusCache` API clearly differs between the two.

There are a few approaches we can take:

1. Avoid relying on Spark's `FileStatusCache` implementation altogether and use our own.
2. Take a staggered approach: first try to use Spark's `FileStatusCache`, and if it doesn't match the expected API, fall back to our own implementation.

Approach #1 means we would not share the cache implementation with Spark, so in some cases we might keep two instances of the same cache. Approach #2 remedies that by falling back only when the API is incompatible.
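A minimal Java sketch of approach #2, under stated assumptions: `FileListingCache`, `LocalFileListingCache`, `FileListingCaches`, and the probed method names `getLeafFiles`/`putLeafFiles` are hypothetical stand-ins, not Hudi or Spark APIs, and paths are simplified to `String`s so the sketch runs without Spark or Hadoop. The idea is to check the engine's cache class by reflection once, up front, so that an API mismatch like the one on DBR selects the fallback instead of surfacing as a `NoSuchMethodError` in the middle of a query.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical, simplified cache contract (not a real Hudi or Spark API).
// Paths are plain Strings so the sketch runs without Spark/Hadoop on the classpath.
interface FileListingCache {
    Optional<List<String>> getLeafFiles(String path);
    void putLeafFiles(String path, List<String> files);
}

// Approach #1 / fallback: a cache owned entirely by our own code.
final class LocalFileListingCache implements FileListingCache {
    private final Map<String, List<String>> entries = new ConcurrentHashMap<>();

    @Override
    public Optional<List<String>> getLeafFiles(String path) {
        return Optional.ofNullable(entries.get(path));
    }

    @Override
    public void putLeafFiles(String path, List<String> files) {
        entries.put(path, List.copyOf(files)); // defensive, immutable copy
    }
}

final class FileListingCaches {
    private FileListingCaches() {}

    // True if the class exposes a public method with exactly this name and parameter types.
    // Probing once up front replaces a NoSuchMethodError deep inside a query with a clean decision.
    static boolean hasExpectedApi(Class<?> cacheClass, String methodName, Class<?>... paramTypes) {
        try {
            cacheClass.getMethod(methodName, paramTypes);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    // Approach #2: use the engine-backed adapter only when the engine's cache class has the
    // API we expect; otherwise fall back to our own implementation.
    // The adapter supplier is invoked only after the probe succeeds.
    static FileListingCache create(Class<?> engineCacheClass, Supplier<FileListingCache> engineAdapter) {
        boolean compatible =
                hasExpectedApi(engineCacheClass, "getLeafFiles", String.class)
                && hasExpectedApi(engineCacheClass, "putLeafFiles", String.class, List.class);
        return compatible ? engineAdapter.get() : new LocalFileListingCache();
    }
}
```

The local cache doubles as approach #1. A real integration would additionally need an adapter that actually delegates to Spark's cache; probing via reflection, rather than catching `NoSuchMethodError` at each call site, keeps the compatibility decision in one place.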

Attachments

image.png (83 kB), Ethan Guo, 24/Jan/23 20:02
image (1).png (87 kB), Ethan Guo, 24/Jan/23 20:02

Issue Links

is duplicated by:
HUDI-1368 Merge On Read Snapshot Reader not working for Databricks on ADLS Gen2 (Closed)

relates to:
HUDI-5609 Hudi table not queryable by SQL on Databricks Spark (Open)
HUDI-5104 Add feature flag to disable HoodieFileIndex and fall back to HoodieROTablePathFilter (Closed)

People

Assignee: Ethan Guo
Reporter: Ethan Guo
Votes: 0
Watchers: 2

Dates

Created: 25/Oct/22 16:56
Updated: 21/Dec/23 15:27